Thanks for the great work (mPLUG-OWL3)! I was wondering if the following template is the right chatting format for multi-image inference cuz the readme didn't explicitly mention it. When using the following code, it seems that the model successfully took a lot of images as input but the performance is under my expectation. Please let me know if my template is incorrect (specifically the real_prompt and message formation).
Hi,
Thanks for the great work (mPLUG-OWL3)! I was wondering if the following template is the right chatting format for multi-image inference cuz the readme didn't explicitly mention it. When using the following code, it seems that the model successfully took a lot of images as input but the performance is under my expectation. Please let me know if my template is incorrect (specifically the real_prompt and message formation).
Looking forward to hearing from you. Thanks!