OpenGVLab / Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Model performance and evaluation metrics in the OmniMedVQA dataset #21

Open Lycus99 opened 3 months ago

Lycus99 commented 3 months ago

Thanks for your work! After reading the OmniMedVQA paper, I have two questions and sincerely look forward to your answers.

  1. According to the MedVInT and RadFM papers, the training dataset used for RadFM is much larger than that of MedVInT (16M vs. 1.64M samples). However, in your paper MedVInT outperforms RadFM. Did you analyze the prediction results of the two models any further?

  2. The QA scores and prefix-based scores are distributed differently across image modalities. Which metric is more reliable when selecting a model for a particular modality? (To check my understanding of prefix-based scoring, I've included a sketch below.)
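
For reference, here is how I currently understand prefix-based scoring: each candidate option is scored by the log-likelihood the model assigns to it as a continuation of the question prompt, and the highest-scoring option is taken as the prediction. The sketch below uses `gpt2` purely as a stand-in for an LVLM's language head (the real evaluation would also condition on image features), so please correct me if this differs from your implementation:

```python
# Minimal sketch of prefix-based scoring, assuming a plain causal LM stands in
# for the LVLM's language head. Not the repo's actual implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def option_log_likelihood(prompt: str, option: str) -> float:
    """Sum of log-probs the model assigns to the option tokens, given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # (1, seq_len, vocab)
    # Log-prob of each token conditioned on all preceding tokens.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lls = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the option tokens (everything after the prompt prefix).
    n_prompt = prompt_ids.shape[1]
    return token_lls[0, n_prompt - 1:].sum().item()

question = "Q: Which imaging modality is shown in the image? A:"
options = ["CT", "MRI", "X-ray", "Ultrasound"]
scores = {opt: option_log_likelihood(question, opt) for opt in options}
prediction = max(scores, key=scores.get)  # highest-likelihood option wins
```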