Hi, I noticed that when doing batch inference with a static prompt like ‘Describe the image’, the model gives wrong output like ‘in detail’, as if it is just doing sentence completion. Whereas if I try a more descriptive prompt, where I tell MiniGemini that it is a ‘prompt generator’, then it goes into sentence-completion mode and gives me an okay-ish response.
However, I also have the original image descriptions, so I tried adding those to the prompt and then asking the model to describe the image, given that information about it. This works perfectly fine when I use just one image at a time.
But when doing batch processing I get completely garbage outputs. To do batch processing, I pad the prompts so they all have the same length.
I do this by changing line 44 in MiniGemini/minigemini/mm_utils.py to
tokenizer(chunk, padding='max_length', max_length=max_len).input_ids for chunk in prompt.split('<image>')
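For reference, here is a minimal sketch of how I build the padded prompt right now. The function is my simplified version of the LLaVA-style tokenizer_image_token that MiniGemini uses, IMAGE_TOKEN_INDEX is the usual placeholder value from those repos, and max_len is just a number I compute beforehand as the longest prompt in my batch:

```python
import torch

IMAGE_TOKEN_INDEX = -200  # placeholder index for the image token, as in LLaVA-style repos

def tokenizer_image_token_padded(prompt, tokenizer, max_len, image_token_index=IMAGE_TOKEN_INDEX):
    # Pad every text chunk to max_len so all prompts in the batch end up the same length.
    # This is the change I made on line 44 of mm_utils.py.
    # Assumes tokenizer.pad_token is set; the LLaMA tokenizer has none by default.
    prompt_chunks = [
        tokenizer(chunk, padding='max_length', max_length=max_len).input_ids
        for chunk in prompt.split('<image>')
    ]
    # Splice the image token between the text chunks, dropping the duplicated
    # BOS token at the start of each later chunk.
    input_ids = list(prompt_chunks[0])
    for chunk in prompt_chunks[1:]:
        input_ids = input_ids + [image_token_index] + list(chunk[1:])
    return torch.tensor(input_ids, dtype=torch.long)
```

It is the input_ids built this way and stacked into a batch that produce the garbage generations.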
Could you give me any advice on how to do this effectively?