Closed JUNJIE99 closed 6 months ago
Thank you for your interest in our work! For closed-source models, we simply use the generation-based method for evaluation. For details on the generation-based evaluation method, please refer to Evaluation.md. For the evaluation method used for each model, you can refer to the Leaderboard.
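For readers who want a rough picture of what generation-based scoring of multiple-choice answers can look like, here is a minimal sketch. It is not taken from Evaluation.md: the function names `extract_choice` and `accuracy` are hypothetical, and the regex-based letter extraction is only an assumption about how a free-form response might be mapped to an option.

```python
import re

def extract_choice(response, options="ABCD"):
    """Return the first standalone option letter found in the response, or None.

    Assumption: the model answers with an uppercase letter such as "B",
    "(B)" or "Answer: B". Matching is case-sensitive.
    """
    match = re.search(rf"\b([{options}])\b", response.strip())
    return match.group(1) if match else None

def accuracy(responses, answers):
    """Fraction of responses whose extracted letter equals the ground truth."""
    if not answers:
        return 0.0
    correct = sum(extract_choice(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)

# Hypothetical model outputs and ground-truth letters, for illustration only.
responses = ["(B) The dog", "Answer: C.", "I don't know"]
answers = ["B", "C", "A"]
print(accuracy(responses, answers))
```

A response with no recognizable letter counts as wrong here. Note that this simple pattern can misread a sentence that starts with the article "A" as option A, so a real evaluation script would likely need stricter prompting or parsing.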
Thanks for your great work!
I understand that for open-source models, you compute the likelihood of the MLLM generating each choice's content given the question.
For closed-source models such as GPT4V and Gemini Pro, do you evaluate correctness based only on which option (A, B, C, or D) they output?
Thanks in advance for your clarification.
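For context, the likelihood-based evaluation mentioned in the question above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function `pick_by_likelihood` is hypothetical, the per-token log-probabilities are assumed to come from an open-source model's forward pass over each choice's text, and whether the benchmark length-normalizes the scores is not stated in this thread.

```python
def pick_by_likelihood(choice_token_logprobs, normalize=True):
    """Pick the choice whose text the model finds most likely.

    choice_token_logprobs maps an option letter to the list of per-token
    log-probabilities of that choice's text, conditioned on the question.
    With normalize=True the mean log-probability is used, so longer
    choices are not penalized just for having more tokens (an assumption,
    not necessarily what the benchmark does).
    """
    scores = {}
    for letter, logprobs in choice_token_logprobs.items():
        if not logprobs:
            raise ValueError(f"no token log-probs for choice {letter}")
        total = sum(logprobs)
        scores[letter] = total / len(logprobs) if normalize else total
    best = max(scores, key=scores.get)
    return best, scores

# Made-up log-probs for two choices, for illustration only.
example = {"A": [-0.9], "B": [-0.5, -0.5, -0.5]}
print(pick_by_likelihood(example, normalize=True)[0])
print(pick_by_likelihood(example, normalize=False)[0])
```

The two calls can disagree, which is why the normalization choice matters. Closed-source APIs often do not expose the token log-probabilities this approach needs, so generation-based scoring is the natural fallback for them.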