AILab-CVC / SEED-Bench

(CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.

Evaluation Method for Closed-Source Models like GPT4V #26

Closed JUNJIE99 closed 2 months ago

JUNJIE99 commented 2 months ago

Thanks for your great work!

I understand that for open-source models, you compute the likelihood of an MLLM generating choice content based on a question.
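For context, here is a minimal sketch of what likelihood-based scoring can look like. This is not the repo's actual code: it assumes a HuggingFace causal LM, and the model name and helper names are placeholders.

```python
# Sketch of likelihood-based ranking of answer choices (not SEED-Bench's exact code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def choice_log_likelihood(question: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens conditioned on the question."""
    q_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits                     # (1, seq_len, vocab)
    # Log-prob of each token given its preceding context.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions belonging to the choice continuation.
    n_choice = full_ids.shape[1] - q_ids.shape[1]
    return token_ll[0, -n_choice:].sum().item()

def pick_answer(question: str, choices: dict) -> str:
    """Return the option letter whose content has the highest log-likelihood."""
    scores = {k: choice_log_likelihood(question, v) for k, v in choices.items()}
    return max(scores, key=scores.get)
```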

For closed-source models like GPT4V and Gemini Pro, do we still only evaluate correctness based on their output of options A, B, C, and D?

Thanks in advance for your clarification.

Bohao-Lee commented 2 months ago

Thank you for your interest in our work! For closed-source models, we just use the generate method to evaluate. For the generate evaluation method, please refer to Evaluation.md for details. For each model's evaluation method, you can refer to the Leaderboard.
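For illustration, a minimal sketch of generation-based evaluation for an API-only model is below. This is an assumption of how such a loop could look, not SEED-Bench's actual script: `call_model` is a hypothetical placeholder for a GPT4V or Gemini API call, and the regex simply extracts the first option letter from the model's reply.

```python
# Sketch of generation-based multiple-choice evaluation for a closed-source model.
import re

def build_prompt(question: str, choices: dict) -> str:
    """Format the question and its options into a single text prompt."""
    option_lines = "\n".join(f"{k}. {v}" for k, v in choices.items())
    return (f"{question}\n{option_lines}\n"
            "Answer with the option's letter from the given choices directly.")

def parse_choice(output: str):
    """Extract the first standalone option letter (A-D) from the model output."""
    match = re.search(r"\b([ABCD])\b", output.strip())
    return match.group(1) if match else None

def evaluate(samples, call_model) -> float:
    """samples: iterable of dicts with 'question', 'choices', and 'answer' (a letter)."""
    correct = 0
    for s in samples:
        prediction = parse_choice(call_model(build_prompt(s["question"], s["choices"])))
        correct += int(prediction == s["answer"])
    return correct / len(samples)
```

Accuracy is then simply the fraction of questions where the parsed letter matches the ground-truth option.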