Evaluation Script For The Other Benchmarks

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

https://www.modelscope.cn/studios/damo/mPLUG-Owl

MIT License


Evaluation Script For The Other Benchmarks #185

Open chancharikmitra opened 10 months ago

chancharikmitra commented 10 months ago

Hello! First of all, this is really fascinating work. Thanks for the contribution.

I wanted to ask whether you could share the evaluation scripts you used for mPLUG-Owl2 on the benchmarks shown in the main figure (e.g. SEED-Bench, Q-Bench). It would also be great if you could provide (or include in the scripts) any specific prompts you used for zero-shot evaluation on those datasets.

MAGAer13 commented 10 months ago

Hi, we treat SEED-Bench and Q-Bench as multiple-choice questions answered through open-ended generation. You can refer to the MMBench evaluation (MMBench is also a multiple-choice benchmark), especially for the prompt.
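
For readers looking for a starting point, here is a minimal sketch of evaluating a multiple-choice benchmark through open-ended generation. It is not the mPLUG-Owl2 evaluation script: the function names `build_mc_prompt` and `extract_choice`, the instruction sentence appended to the prompt, and the answer-parsing rules are all assumptions made for illustration. Check the repository's MMBench evaluation code for the prompt actually used.

```python
import re
import string


def build_mc_prompt(question, options, hint=None):
    """Format a question and its options as a lettered multiple-choice prompt.

    The trailing instruction sentence is an assumed template, not the
    confirmed mPLUG-Owl2 prompt.
    """
    lines = []
    if hint:
        lines.append(hint)
    lines.append(question)
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer with the option's letter from the given choices directly.")
    return "\n".join(lines)


def extract_choice(generated, options):
    """Map free-form generated text back to an option letter, or None."""
    letters = string.ascii_uppercase[: len(options)]
    text = generated.strip()

    # 1) Text that starts with a bare letter: "B", "B.", "(B) a dog", "B: ..."
    match = re.match(rf"^\(?([{letters}])(?:[).:,]|\s*$)", text)
    if match:
        return match.group(1)

    # 2) Phrases such as "The answer is B." or "Answer: C"
    match = re.search(rf"[Aa]nswer(?: is)?:?\s*\(?([{letters}])\b", text)
    if match:
        return match.group(1)

    # 3) Fall back to the option text, accepted only if exactly one option matches
    lowered = text.lower()
    hits = [l for l, o in zip(letters, options) if o.lower() in lowered]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    opts = ["a cat", "a dog"]
    print(build_mc_prompt("What is shown in the image?", opts))
    print(extract_choice("(B) a dog", opts))
```

Accuracy is then the fraction of samples where `extract_choice` returns the ground-truth letter. Outputs that cannot be parsed come back as `None`, so they can be counted as wrong or inspected separately.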