AILab-CVC / SEED-Bench

(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.

Question about evaluation input format #27

Open yellow-binary-tree opened 1 month ago

yellow-binary-tree commented 1 month ago

In SEED-Bench-2/model/InternLM_Xcomposer_VL_interface.py, for the InternLM_Xcomposer_VL model, all choices are added to the model input, and the choice letters ("A.", "B.", "C.", "D.") are used as the labels for computing the loss. For all other models (instructblip, qwen_vl, llava_v2), the interface code adds only the question to the model input, and the text of each choice is scored independently as the label for computing the loss.
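To make the two formats concrete, here is a minimal, text-only sketch of the two likelihood-scoring schemes, assuming a HuggingFace-style causal LM. The function names (`continuation_loss`, `predict_letter_style`, `predict_text_style`) are hypothetical and not taken from the SEED-Bench interface code, and image inputs are omitted for brevity:

```python
# Hypothetical sketch of the two PPL-style scoring schemes; not the
# actual SEED-Bench interface code. Image inputs are omitted.
import torch
import torch.nn.functional as F

@torch.no_grad()
def continuation_loss(model, tokenizer, prompt: str, continuation: str) -> float:
    """Average cross-entropy of `continuation` tokens given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, add_special_tokens=False,
                         return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, cont_ids], dim=1)
    logits = model(input_ids).logits
    # Keep only the positions that predict the continuation tokens
    # (shifted by one for next-token prediction).
    cont_logits = logits[:, prompt_ids.shape[1] - 1 : -1, :]
    return F.cross_entropy(cont_logits.reshape(-1, cont_logits.size(-1)),
                           cont_ids.reshape(-1)).item()

def predict_letter_style(model, tokenizer, question, choices):
    # InternLM_Xcomposer_VL style: all choices appear in the prompt,
    # and only the choice letter is scored as the label.
    letters = ["A", "B", "C", "D"][: len(choices)]
    prompt = question + "\n" + "\n".join(
        f"{l}. {c}" for l, c in zip(letters, choices)) + "\nAnswer:"
    losses = [continuation_loss(model, tokenizer, prompt, f" {l}")
              for l in letters]
    return letters[losses.index(min(losses))]

def predict_text_style(model, tokenizer, question, choices):
    # instructblip / qwen_vl / llava_v2 style in this repo: only the
    # question is in the prompt, and each full choice text is scored
    # independently; the lowest-loss choice wins.
    losses = [continuation_loss(model, tokenizer, question, " " + c)
              for c in choices]
    return "ABCD"[losses.index(min(losses))]
```

In both schemes the predicted answer is the option with the lowest average per-token loss; the difference is whether the model sees all options in the prompt and scores a letter, or sees only the question and scores each option's full text.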

I wonder why you use different input formats for different models. Will this have a large impact on accuracy?

Bohao-Lee commented 1 month ago

Thank you for your attention to our work. For qwen_vl, the official code evaluates with a PPL method over the A/B/C/D letters. For llava 1.5, the official code evaluates with the generate method, so I modified the corresponding code to use the PPL evaluation method. For InternLM_Xcomposer_VL, however, the official evaluation code already uses a PPL method, so I simply provided the InternLM_Xcomposer_VL evaluation code based on their original code.