MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
https://mmmu-benchmark.github.io/
Apache License 2.0

Mismatch of the data label in Eval code #11

Closed XiongweiWu closed 9 months ago

XiongweiWu commented 9 months ago

Hi, I am checking your dataset, and the label of "validation_Literature_10" differs from the one in your eval code: https://github.com/MMMU-Benchmark/MMMU/blob/51ce7f3e829c16bb44bc5445782686b4c3508794/eval/answer_dict_val.json#L2206, where the ground truth (GT) is "D".

However, in the dataset this data point is: {'id': 'validation_Literature_10', 'question': 'Refer to the description <image 1>, which term refers to the choices a writer makes, including the combination of distinctive features such as mood, voice, or word choice?', 'options': "['Theme', 'Setting', 'Style', 'Tone']", 'explanation': '', 'image_1': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=340x291 at 0x7FFE9C906620>, 'image_2': None, 'image_3': None, 'image_4': None, 'image_5': None, 'image_6': None, 'image_7': None, 'img_type': "['Icons and Symbols']", 'answer': 'C', 'topic_difficulty': 'Easy', 'question_type': 'multiple-choice', 'subfield': 'Comparative Literature'}, where the GT is "C".
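
A cross-check along these lines can surface such mismatches (a minimal sketch only; it assumes the dataset is loaded from the Hugging Face hub as "MMMU/MMMU" with a per-subject config, and that answer_dict_val.json maps each question id to its ground-truth answer, either as a plain string or inside a dict):

```python
import json
from datasets import load_dataset

# Load the ground-truth answers used by the eval code.
# Assumption: the JSON maps ids like "validation_Literature_10" to either
# a string answer or a dict containing a "ground_truth" field.
with open("eval/answer_dict_val.json") as f:
    answer_dict = json.load(f)

# Assumption: the released dataset lives on the hub as "MMMU/MMMU"
# with one config per subject.
ds = load_dataset("MMMU/MMMU", "Literature", split="validation")

for example in ds:
    qid = example["id"]
    entry = answer_dict.get(qid)
    if entry is None:
        continue
    gt = entry["ground_truth"] if isinstance(entry, dict) else entry
    if gt != example["answer"]:
        print(f"{qid}: dataset says {example['answer']}, eval code says {gt}")
```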

Can you check it?

NipElement commented 9 months ago

The answer in the dataset is correct. Sorry, we gave the wrong answer in the example; we have now fixed it. Thank you for pointing this out!