Liuziyu77 / MIA-DPO

Official implementation of MIA-DPO
Apache License 2.0

problem in reproduce MMMU benchmark #2

Open lyklly opened 4 hours ago

lyklly commented 4 hours ago

I notice that LLaVA 1.5 7B reaches 35.1% on MMMU in the paper. The officially provided file "llava1.5_13b_val.json" does reach this accuracy, but LLaVA 1.5 7B only reaches 25.4%. Is there a problem here, or did I make a mistake?

Liuziyu77 commented 4 hours ago

What is this file, “llava1.5_13b_val.json”?

lyklly commented 4 hours ago

Here: https://github.com/MMMU-Benchmark/MMMU/blob/main/mmmu/example_outputs/llava1.5_13b_val.json

lyklly commented 4 hours ago

But I can't get the accuracy shown in the paper, whether I evaluate it myself or use VLMEvalKit.
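When evaluating by hand, one possible source of discrepancy is how predictions are matched against gold answers. As a sanity check, the core accuracy computation can be sketched as below; the question-ID keys and single-letter answer format are assumptions for illustration, not the exact MMMU output schema.

```python
# Hypothetical sketch: compare a predictions file (question id -> predicted
# option letter) against gold answers and report accuracy. Field names and
# the toy ids are made up for illustration.
import json


def mmmu_accuracy(predictions: dict, answers: dict) -> float:
    """Fraction of questions whose predicted option matches the gold answer.

    Comparison is case-insensitive and ignores surrounding whitespace, so
    "c" and "C " both count as option C. Missing predictions count as wrong.
    """
    if not answers:
        return 0.0
    correct = sum(
        1
        for qid, gold in answers.items()
        if predictions.get(qid, "").strip().upper() == gold.strip().upper()
    )
    return correct / len(answers)


if __name__ == "__main__":
    # Toy example with made-up question ids: 2 of 3 match.
    preds = {"val_q1": "A", "val_q2": "c", "val_q3": "B"}
    gold = {"val_q1": "A", "val_q2": "C", "val_q3": "D"}
    print(round(mmmu_accuracy(preds, gold), 3))
```

A mismatch in answer normalization (letter vs. full option text, casing, stray whitespace) between the prediction file and the scorer can easily shift the reported accuracy by several points.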

Liuziyu77 commented 4 hours ago

Do you mean that when testing the LLaVA 1.5 7B baseline, you were unable to reproduce the 35.1 result?

Liuziyu77 commented 4 hours ago

Maybe you can refer to the VLMEvalKit leaderboard: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
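For reference, a typical VLMEvalKit run for this comparison looks roughly like the following; the exact model alias (`llava_v1.5_7b`) and dataset name (`MMMU_DEV_VAL`) should be checked against the config in your installed VLMEvalKit version, as they are assumptions here.

```shell
# Sketch of a VLMEvalKit evaluation, assuming the repo is cloned and
# dependencies are installed; model/dataset names may differ by version.
python run.py --data MMMU_DEV_VAL --model llava_v1.5_7b --verbose
```

Comparing the score from this run against the leaderboard entry for the same model should show whether the gap comes from the evaluation setup or from the model itself.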