Alpha-VLLM / Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
https://arxiv.org/abs/2408.02657

No editing Prompt doesn't help most image reconstruction? #25

Open SunzeY opened 2 months ago

SunzeY commented 2 months ago

I tested with many images, but most of them show a great shift compared to the original image... Is there anything wrong with settings like t, cfg, and topk?

```python
from PIL import Image

from inference_solver import FlexARInferenceSolver

inference_solver = FlexARInferenceSolver(
    model_path="Alpha-VLLM/Lumina-mGPT-7B-768-Omni",
    precision="bf16",
    target_size=768,
)

q1 = "No edit. <|image|>"
images = [Image.open("input.png")]
qas = [[q1, None]]

generated = inference_solver.generate(
    images=images,
    qas=qas,
    max_gen_len=8192,
    temperature=1.0,
    logits_processor=inference_solver.create_logits_processor(cfg=1.0, image_top_k=200),
)
a1 = generated[0]
new_image = generated[1][0]
```

Here are my input and output images. [image attachments]

zhaoshitian commented 2 months ago

In our experiments, the CFG and top-k values affect the resulting image significantly. We recommend setting the CFG value above 3.0 and the top-k value between 2000 and 4000.
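For intuition on why these two knobs matter, here is a minimal pure-Python sketch of how classifier-free guidance and top-k filtering are commonly combined in autoregressive samplers (this is an illustration of the general technique, not the repo's actual `create_logits_processor` implementation). With `cfg=1.0` the guidance term vanishes, so the conditioning image exerts no extra pull, and a tiny top-k can clip away the tokens needed for faithful reconstruction:

```python
def apply_cfg(cond_logits, uncond_logits, cfg):
    # Classifier-free guidance: extrapolate away from the unconditional
    # logits. At cfg=1.0 this returns the conditional logits unchanged,
    # i.e. no extra guidance toward the conditioning input.
    return [u + cfg * (c - u) for c, u in zip(cond_logits, uncond_logits)]

def top_k_filter(logits, k):
    # Keep only the k largest logits; mask the rest so they cannot be
    # sampled. A very small k may exclude tokens the reconstruction needs.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [l if l >= threshold else float("-inf") for l in logits]
```

Translated to the snippet from the question, the recommendation above would correspond to something like `create_logits_processor(cfg=4.0, image_top_k=2000)` instead of `cfg=1.0, image_top_k=200`.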

ChrisLiu6 commented 2 months ago

> I tested with many images, but most of them show a great shift compared to the original image... [code and images quoted from the original post]

Note that the "No edit." prompt is zero-shot, as it was not specifically used during training.

SunzeY commented 2 months ago

> Note that the "No edit." prompt is zero-shot, as it was not specifically used during training.

Does that mean I have loaded the wrong model?