dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Apache License 2.0
3.19k stars 278 forks source link

Inference problem about the demo. #118

Open ApolloRay opened 4 months ago

ApolloRay commented 4 months ago

I want to reproduce the same result as the demo. I download image and use the same question ("Explain why this meme is funny, and generate a picture when the weekend coming.") , model generate the wrong answer. 截屏2024-05-22 15 12 07

For evaluation, I use the model_vqa script. I rewrite the qs and image_file_path. I'm not sure where is the problem.

ApolloRay commented 4 months ago

I change the conv_mode from llava_v1 to chatml_direct. It works, but I can't get the same result as the official demo. 截屏2024-05-22 19 36 02