X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

Support VQA with Multiple Choices #147

Open WesleyHsieh0806 opened 1 year ago

WesleyHsieh0806 commented 1 year ago

Hi, thanks for your great work.

I am curious how mPLUG-Owl can be evaluated on multiple-choice VQA tasks. Specifically, I am looking for an API that takes an image, a prompt, and a list of choices (List[str]) and returns the choice with the highest probability.

For example:

prediction = model.predict_answer({'image': image,
                                   'prompt': prompt,
                                   'choices': ["answer1", "answer2", "answer3"]})
print(prediction)  # 0 (answer1)

Is there a good way to achieve the same functionality?

MAGAer13 commented 11 months ago

You can just add your options to the prompt and use the model in open-ended generation mode. We will release mPLUG-Owl-2 soon; it is a better foundation model and supports multiple-choice questions better.
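
A minimal sketch of the prompt-based approach suggested above, in case it helps. It is not part of mPLUG-Owl: `build_choice_prompt`, `parse_choice`, and `predict_answer` are hypothetical helpers, and `generate` stands in for whatever function wraps the model's open-ended generation (assumed here to take an image and a prompt and return a string). The prompt wording is also just an assumption and may need tuning.

```python
import re
import string
from typing import Any, Callable, List, Optional


def build_choice_prompt(question: str, choices: List[str]) -> str:
    """Append lettered options (A, B, C, ...) to the question.

    Assumes at most 26 choices.
    """
    letters = string.ascii_uppercase[: len(choices)]
    options = "\n".join(f"{l}. {c}" for l, c in zip(letters, choices))
    return (
        f"{question}\nOptions:\n{options}\n"
        "Answer with the letter of the correct option."
    )


def parse_choice(reply: str, choices: List[str]) -> Optional[int]:
    """Map a free-form reply to a choice index, or None if it is unclear."""
    letters = string.ascii_uppercase[: len(choices)]
    text = reply.strip()
    # Case 1: the reply starts with an option letter, e.g. "B", "B.", "(B)".
    match = re.match(rf"^\(?([{letters}])(?:[.):]|\s*$)", text)
    if match:
        return letters.index(match.group(1))
    # Case 2: exactly one choice text appears in the reply.
    hits = [i for i, c in enumerate(choices) if c.lower() in text.lower()]
    return hits[0] if len(hits) == 1 else None


def predict_answer(
    generate: Callable[[Any, str], str],  # hypothetical model wrapper
    image: Any,
    question: str,
    choices: List[str],
) -> Optional[int]:
    """Ask the model in open-generation style and parse the chosen index."""
    prompt = build_choice_prompt(question, choices)
    return parse_choice(generate(image, prompt), choices)
```

Note that this returns the option the model names in its generated text, not the option with the highest probability. Ranking the choices by likelihood would require access to the model's token log-probabilities, which this sketch does not use.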