AILab-CVC / SEED-Bench

(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.

Support for evaluation of other VLM models like MiniGPT-4, mPLUG-Owl, Llava, and VPGTrans #8

Open WesleyHsieh0806 opened 1 year ago

WesleyHsieh0806 commented 1 year ago

Hi, thanks for your great work.

I am a graduate researcher from CMU, and we are interested in analyzing specific VLM models with respect to particular question types. Would it be possible for you to share the source code or interface for all the models listed on the leaderboard? In particular, we are keen on understanding the behavior of the following models: MiniGPT-4, mPLUG-Owl, Llava, and VPGTrans.

geyuying commented 1 year ago

Hi, thank you for your interest in our benchmark.

We use the official implementations of all the models listed on the leaderboard; you can refer to their official repositories.

WesleyHsieh0806 commented 1 year ago

@geyuying As the APIs of these models differ considerably from InstructBLIP's, could you provide any instructions or examples on how to calculate the log-likelihood of each answer choice using these models?
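
For reference, a minimal text-only sketch of the choice-ranking computation might look like the following. This is an assumption about the general log-likelihood scoring approach, not the repo's actual code: the model name, prompt, and choices are placeholders, and for an actual VLM (MiniGPT-4, mPLUG-Owl, Llava, VPGTrans) the image features would additionally have to be injected through that model's own forward interface.

```python
# Hypothetical sketch: rank multiple-choice answers by the log-likelihood
# of each choice's tokens conditioned on the prompt, using a Hugging Face
# causal LM as a stand-in for a VLM's language backbone.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the VLM's language backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()


def choice_log_likelihood(prompt: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    choice_ids = tokenizer(choice, add_special_tokens=False,
                           return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, choice_ids], dim=1)

    with torch.no_grad():
        logits = model(input_ids).logits  # [1, seq_len, vocab_size]

    # The logit at position t predicts token t+1, so the choice tokens are
    # predicted by the logits starting at the last prompt position.
    log_probs = torch.log_softmax(logits, dim=-1)
    start = prompt_ids.shape[1] - 1
    end = input_ids.shape[1] - 1
    token_log_probs = log_probs[0, start:end, :].gather(
        1, choice_ids[0].unsqueeze(-1)
    )
    return token_log_probs.sum().item()


# Example usage with placeholder question and choices.
prompt = "Question: What is shown in the image? Answer:"
choices = [" a cat", " a dog", " a car", " a tree"]
scores = [choice_log_likelihood(prompt, c) for c in choices]
prediction = choices[scores.index(max(scores))]
```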