Hello! We added support for accepting interleaved image-text as input for model inference. You can check out the updated code.
For interleaved image-text input, we do not add a system message, but you may try other system messages that make sense and see if they work.
Thank you.
Hey,
I had a question regarding the specific prompt setting used for the multi-image results in Figure 4 (left side) of the paper. From a brief skim of your `inference.py` script and the Emu modeling code inside `models`, I think the modifications needed to make this work would look roughly like the sketch below. Please verify whether this is the prompt you used for the multi-image results in the paper, and let me know whether it would work.
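Here is my rough attempt, sketched in Python. The `[IMG]`/`[/IMG]`/`[USER]`/`[ASSISTANT]` tags, the image token string, and the number of visual tokens per image are my guesses modeled on the single-image instruction format, so please substitute the actual constants from the repo:

```python
# Rough sketch of the prompt construction I have in mind for the interleaved
# multi-image case. The tag strings, the image token, and the per-image token
# count below are guesses based on the single-image instruction format; the
# real constants should come from the repo's code.

IMAGE_TOKEN = "<image>"   # assumed per-position visual placeholder token
N_IMG_TOKENS = 32         # assumed number of visual tokens per image
IMAGE_PLACEHOLDER = "[IMG]" + IMAGE_TOKEN * N_IMG_TOKENS + "[/IMG]"


def build_interleaved_prompt(segments, question):
    """Pack interleaved text and image placeholders into a single [USER] turn.

    `segments` is an ordered list of ("text", str) or ("image", image) tuples;
    every image is replaced by the fixed placeholder string, while the pixel
    tensors would be passed to the model separately in the same order.
    """
    body = "".join(
        IMAGE_PLACEHOLDER if kind == "image" else content
        for kind, content in segments
    )
    return f"[USER]: {body}{question} [ASSISTANT]:"


# Example with two images, in the style of the Figure 4 (left) queries:
prompt = build_interleaved_prompt(
    [
        ("text", "This is the first image: "),
        ("image", None),
        ("text", " This is the second image: "),
        ("image", None),
        ("text", " "),
    ],
    "What are the differences between the two images?",
)
print(prompt)
```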
The other alternative would be to prepend the `[USER]` token before every new image-text sequence. However, since that would be out-of-distribution with respect to the instruction fine-tuning data format, I am not sure it would work.
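For concreteness, this is roughly what I mean by that alternative (the token names are the same guesses as in the sketch above):

```python
# Rough sketch of the alternative: a separate [USER] tag before every
# image-text pair, followed by a final [USER] turn with the question.
# Token names are assumptions mirroring the single-image instruction format.

IMAGE_PLACEHOLDER = "[IMG]" + "<image>" * 32 + "[/IMG]"  # assumed placeholder


def build_repeated_user_prompt(captions, question):
    """Give each image-text pair its own [USER] turn, then ask the question."""
    turns = "".join(f"[USER]: {IMAGE_PLACEHOLDER}{c} " for c in captions)
    return f"{turns}[USER]: {question} [ASSISTANT]:"


print(build_repeated_user_prompt(
    ["This is the first image.", "This is the second image."],
    "What are the differences between the two images?",
))
```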