dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Apache License 2.0

batching giving weird outputs #41

Open mukundkhanna123 opened 7 months ago

mukundkhanna123 commented 7 months ago

Hi, I noticed that when doing batch inference with a static prompt like 'Describe the image', the model gives a wrong output such as 'in detail', as if it is just doing sentence completion. If I instead try a more descriptive prompt, where I tell Mini-Gemini that it is a 'prompt generator', it still drops into sentence-completion mode but gives me an okayish response.

However, I also have the original image descriptions, so I tried adding those to the prompt and then asking the model to describe the image given that information. This works perfectly fine when I use just one image at a time.

But when doing batch processing I get completely garbage outputs. To batch, I pad the prompts to the same length by changing line 44 in `MiniGemini/minigemini/mm_utils.py` to `tokenizer(chunk, padding='max_length', max_length=max_len).input_ids for chunk in prompt.split('<image>')`, but the results are garbage.
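For reference, here is a minimal sketch of the batched tokenization I am attempting, outside the repo's code path. It assumes a standard HuggingFace tokenizer; the left padding, `attention_mask` handling, and the tokenizer path are my own assumptions, not something from the MGM codebase.

```python
# Sketch: pad a batch of prompts to one length so input_ids can be stacked.
# Assumes a HuggingFace tokenizer; the path below is hypothetical.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/minigemini-tokenizer")  # hypothetical
tokenizer.padding_side = "left"          # decoder-only models usually need left padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.unk_token

prompts = [
    "Describe the image in detail.",
    "Given the caption above, describe the image.",
]

batch = tokenizer(
    prompts,
    padding="max_length",
    max_length=256,                      # plays the role of max_len in my mm_utils.py change
    truncation=True,
    return_tensors="pt",
)
input_ids = batch.input_ids              # shape: (batch_size, max_length)
attention_mask = batch.attention_mask    # 0 over the padded positions

# output_ids = model.generate(input_ids, attention_mask=attention_mask, images=image_tensor, ...)
```

My guess is that padding alone is not enough and the padded positions need to be masked out during generation, but I am not sure how that interacts with the `<image>` token handling in `tokenizer_image_token`.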

Could you give me any advice on how to do this effectively?