fansticOne opened 8 months ago
You can pass a list of images and place the same number of "<|image|>" placeholders in your prompt.
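Below is a minimal sketch of that suggestion. It assumes the single-turn template quoted in this thread ('USER: <|image|>...{}\nAnswer the question using a single word or phrase. ASSISTANT:'). The helpers `build_multi_image_prompt` and `check_image_count` are hypothetical and are not part of the owl's codebase. The sketch only builds the prompt string and checks that the number of placeholders matches the number of images before anything is sent to preprocessing or `model.generate()`.

```python
IMAGE_TOKEN = "<|image|>"  # placeholder string quoted in this thread


def build_multi_image_prompt(question: str, num_images: int) -> str:
    """Hypothetical helper: one "<|image|>" placeholder per image, then the question."""
    if num_images < 1:
        raise ValueError("need at least one image")
    placeholders = IMAGE_TOKEN * num_images
    return (
        f"USER: {placeholders}{question}\n"
        "Answer the question using a single word or phrase. ASSISTANT:"
    )


def check_image_count(prompt: str, images: list) -> None:
    """Hypothetical helper: fail early if placeholders and images do not match."""
    n_tokens = prompt.count(IMAGE_TOKEN)
    if n_tokens != len(images):
        raise ValueError(
            f"prompt has {n_tokens} {IMAGE_TOKEN} placeholders, "
            f"but {len(images)} images were given"
        )


if __name__ == "__main__":
    prompt = build_multi_image_prompt(
        "Does the dog in the first picture have same color with the dog "
        "in the second picture?",
        2,
    )
    # Hypothetical file names; only their count matters for this check.
    check_image_count(prompt, ["first_dog.jpg", "second_dog.jpg"])
    print(prompt)
```

Putting the count check before preprocessing makes a placeholder/image mismatch fail loudly instead of silently producing an odd answer.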
I passed a list of images (say, 2 images) and modified the prompt accordingly. After preprocessing, image_tensor has a batch size of 2, while input_ids has a batch size of 1. When I then run model.generate(), I do get a result, but the answer is wrong. Am I misunderstanding something?
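One way to see why a batch of 2 images next to a single input_ids sequence is not necessarily a mismatch is the library-free sketch below. It assumes the model uses LLaVA-style merging: each placeholder in the text is replaced by the next image's features, in order, with one running image index shared across the whole text batch. That assumption is not confirmed in this thread. The function `interleave` is hypothetical, and the strings stand in for image features.

```python
from typing import List, Sequence

IMAGE_TOKEN = "<|image|>"


def interleave(prompts: Sequence[str], image_features: Sequence[object]) -> List[list]:
    """Hypothetical sketch: replace each placeholder with the next image, in order.

    A single running index is shared by all prompts, so N images can all
    belong to one prompt (text batch size 1, image batch size N).
    """
    merged = []
    next_image = 0
    for prompt in prompts:
        pieces = prompt.split(IMAGE_TOKEN)
        sequence = []
        for i, text in enumerate(pieces):
            if text:
                sequence.append(text)
            if i < len(pieces) - 1:  # a placeholder followed this text chunk
                if next_image >= len(image_features):
                    raise ValueError("more placeholders than images")
                sequence.append(image_features[next_image])
                next_image += 1
        merged.append(sequence)
    if next_image != len(image_features):
        raise ValueError("more images than placeholders")
    return merged


if __name__ == "__main__":
    # One prompt, two images: both images are consumed by the same sequence.
    print(interleave(["USER: <|image|><|image|>Same color?"], ["img0", "img1"]))
```

Under this assumption, the shapes described above are consistent. A wrong answer would then more likely come from the model itself than from the input layout.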
Could you share an example along with the incorrect response generated by the owl? By the way, the owl has not been trained on SFT data that includes multiple images, so it is reasonable to expect it to fail in some cases.
Here are the two images I passed. The prompt is 'USER: <|image|><|image|>{}\nAnswer the question using a single word or phrase. ASSISTANT:'.format('Does the dog in the first picture have same color with the dog in the second picture?'), and the response generated by the owl is 'Yes'.
In a VQA task, I want to input two images and ask a question about both of them. How can I do this?