Open SunzeY opened 2 months ago
In our experiments, the CFG and top-k values significantly affect the resulting image. We recommend setting CFG above 3.0 and top-k between 2000 and 4000.
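For readers unfamiliar with these two settings, here is a minimal sketch of what they typically do to next-token logits. It assumes the standard classifier-free guidance formulation and plain top-k filtering; it is not taken from the Lumina-mGPT code, and `apply_cfg`, `top_k_filter`, and the toy logits are hypothetical names and values used only for illustration.

```python
import math

def apply_cfg(cond_logits, uncond_logits, cfg):
    """Standard CFG blend: uncond + cfg * (cond - uncond), per token."""
    return [u + cfg * (c - u) for c, u in zip(cond_logits, uncond_logits)]

def top_k_filter(logits, k):
    """Mask everything below the k-th largest logit to -inf.

    Ties at the threshold are kept, so slightly more than k entries may survive.
    """
    if k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]

def softmax(logits):
    """Numerically stable softmax; -inf logits get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 4-token vocabulary (hypothetical values).
cond = [2.0, 1.0, 0.5, -1.0]     # logits with the prompt/image condition
uncond = [1.0, 1.0, 1.0, 1.0]    # logits without the condition

guided_off = apply_cfg(cond, uncond, cfg=1.0)   # identical to cond
guided_on = apply_cfg(cond, uncond, cfg=3.0)    # pushed away from uncond
probs = softmax(top_k_filter(cond, k=2))        # only 2 tokens can be sampled

print(guided_off)
print(guided_on)
print(probs)
```

In this formulation, `cfg=1.0` leaves the conditional logits unchanged, so guidance has no effect, and a small top-k restricts sampling to only a few candidate tokens at each step.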
I tested with many images, but most of the outputs shift a lot compared to the original image... Is there anything wrong with my settings, such as temperature, cfg, or top-k?
    from inference_solver import FlexARInferenceSolver
    from PIL import Image

    inference_solver = FlexARInferenceSolver(
        model_path="Alpha-VLLM/Lumina-mGPT-7B-768-Omni",
        precision="bf16",
        target_size=768,
    )

    q1 = "No edit. <|image|>"
    images = [Image.open("input.png")]
    qas = [[q1, None]]

    generated = inference_solver.generate(
        images=images,
        qas=qas,
        max_gen_len=8192,
        temperature=1.0,
        logits_processor=inference_solver.create_logits_processor(cfg=1.0, image_top_k=200),
    )

    a1 = generated[0]
    new_image = generated[1][0]
Here are my input and output images.
Note that the "No edit." prompt is zero-shot, as it was not specifically used during training.
Does it mean that I have loaded the incorrect model?
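Before suspecting the checkpoint, it may be worth comparing the settings in the snippet against the recommendation quoted at the top of this thread. A minimal sketch of such a check follows; `check_sampling_settings` is a hypothetical helper, not part of `inference_solver`, and its thresholds are simply the quoted recommended values.

```python
# Recommended ranges quoted earlier in this thread (CFG > 3.0, top-k 2000-4000).
MIN_CFG = 3.0
TOP_K_RANGE = (2000, 4000)

def check_sampling_settings(cfg, image_top_k):
    """Return a list of warnings for settings outside the recommended ranges."""
    warnings = []
    if cfg <= MIN_CFG:
        warnings.append(f"cfg={cfg} is not above {MIN_CFG}")
    low, high = TOP_K_RANGE
    if not low <= image_top_k <= high:
        warnings.append(f"image_top_k={image_top_k} is outside {low}-{high}")
    return warnings

# Values taken from the snippet posted in this issue.
for w in check_sampling_settings(cfg=1.0, image_top_k=200):
    print("warning:", w)
```

The snippet's `cfg=1.0` and `image_top_k=200` both fall outside the recommended ranges, so trying values inside them would be a reasonable first experiment. This check cannot tell whether doing so removes the shift.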