Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Model performance and evaluation metrics in the OmniMedVQA dataset #21
Thanks for your work!
After reading the OmniMedVQA paper, I have two questions and look forward to your answers.
According to the MedVInT and RadFM papers, RadFM is trained on a larger dataset than MedVInT (16M vs. 1.64M samples). However, in your paper MedVInT outperforms RadFM. Have you analyzed the prediction results of the two models further?
The QA score and the prefix-based score are distributed differently across image modalities. Which metric is more reliable when selecting a model for a given modality?
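For reference, here is a minimal sketch of how the two metrics are commonly computed for multiple-choice medical VQA: a QA-style score checks whether the free-form generation matches the ground-truth option, while a prefix-based score ranks the candidate options by their likelihood under the model. The model choice (`gpt2`), function names, and prompt format below are assumptions for illustration, not the repository's actual evaluation code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical stand-in model; the evaluated vision-language models differ.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def qa_score(generated_answer: str, gt_answer: str) -> float:
    """QA-style accuracy: the free-form generation must contain the ground-truth option."""
    return float(gt_answer.strip().lower() in generated_answer.strip().lower())


def prefix_score(question: str, options: list[str], gt_answer: str) -> float:
    """Prefix-based accuracy: pick the option whose tokens are most likely
    given the question as a prefix, then compare it with the ground truth."""
    losses = []
    for opt in options:
        prompt_ids = tokenizer(question, return_tensors="pt").input_ids
        full_ids = tokenizer(question + " " + opt, return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100  # score only the option tokens
        with torch.no_grad():
            loss = model(full_ids, labels=labels).loss.item()
        losses.append(loss)
    predicted = options[losses.index(min(losses))]
    return float(predicted == gt_answer)
```

Under this reading, the prefix-based score is insensitive to how verbose or well-formatted the generation is, whereas the QA-style score penalizes answers that are correct but phrased differently, which may explain why the two metrics diverge across modalities.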