MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
https://mmmu-benchmark.github.io/
Apache License 2.0

model evaluation #3

Closed mactavish91 closed 9 months ago

mactavish91 commented 9 months ago

Thank you for this great benchmark. We have recently adopted a training strategy similar to LLaVA's, which co-trains VQA and chat data, resulting in significant improvements. Could you re-evaluate our model? https://github.com/THUDM/CogVLM/
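
For illustration, here is a minimal sketch of the co-training idea described above (a hypothetical illustration, not CogVLM's or LLaVA's actual pipeline; `vqa_data` and `chat_data` are placeholder lists of training examples):

```python
import random

def mixed_batches(vqa_data, chat_data, batch_size=8, vqa_ratio=0.5):
    """Yield batches that interleave VQA and chat examples at a fixed ratio."""
    n_vqa = int(batch_size * vqa_ratio)
    while True:
        batch = random.sample(vqa_data, n_vqa)                  # VQA portion
        batch += random.sample(chat_data, batch_size - n_vqa)   # chat portion
        random.shuffle(batch)  # mix the two sources within each batch
        yield batch
```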

xiangyue9607 commented 9 months ago

Thanks! We are happy to re-evaluate your model. That said, it would be great if you could run the evaluation yourself and provide your validation and test set predictions; then we can update the leaderboard with a fair and accurate score.

Thanks, Xiang
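
For anyone preparing such a submission, a minimal sketch of producing a predictions file might look like the following. This is a sketch under assumptions, not the repo's official script: it assumes the dataset is hosted on Hugging Face as MMMU/MMMU with per-subject configs and an `id` field per example, and `answer_with_your_model` is a hypothetical wrapper around whatever model is being scored.

```python
import json
from datasets import load_dataset

def answer_with_your_model(example):
    # Hypothetical placeholder: run your model (e.g. CogVLM) on the
    # question, options, and image(s), returning an option letter like "A".
    raise NotImplementedError

predictions = {}
for subject in ["Accounting", "Art"]:  # extend to all MMMU subjects
    rows = load_dataset("MMMU/MMMU", subject, split="validation")
    for example in rows:
        predictions[example["id"]] = answer_with_your_model(example)

# A single JSON object mapping question id -> predicted answer.
with open("validation_predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)
```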


xiangyue9607 commented 9 months ago

You can submit your test set predictions on EvalAI: https://eval.ai/web/challenges/challenge-page/2179. We will update your results in our paper based on your submission.
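
Before uploading, a quick sanity check like the one below can catch a malformed file. The flat id-to-answer JSON shape is an assumption here, not official EvalAI documentation; check the challenge page for the exact expected format.

```python
import json

# Load the predictions file intended for submission.
with open("test_predictions.json") as f:
    predictions = json.load(f)

# Assumed shape: a flat JSON object {question_id: answer_string}.
assert isinstance(predictions, dict), "expected a {question_id: answer} object"
for qid, answer in predictions.items():
    assert isinstance(qid, str) and isinstance(answer, str), (qid, answer)
print(f"{len(predictions)} predictions look well-formed")
```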