MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
https://mmmu-benchmark.github.io/
Apache License 2.0

There's an error in one of the Correct Examples for genetics #22

Closed: eabase closed this issue 4 months ago

eabase commented 4 months ago

Looking through the Correct Examples I came across the genetics example, and it is wrong.

[Screenshot: 2024-0427_173712_MMMU_—_Mozilla_Firefox]

It would be interesting to know how you guys are vetting what is considered to be correct.

NipElement commented 4 months ago

Thank you for bringing this to our attention! The example you mentioned is not a good case, and we will remove it. However, it will still be counted as a correct answer based on our extraction logic.

eabase commented 4 months ago

@NipElement

"However, it will still be counted as a correct answer based on our extraction logic."

Sorry, I don't understand.
Why would you "count" something that is wrong as being correct?

drogozhang commented 4 months ago

Hi all, this is a parsing error in our answer-extraction code. It introduces an error of about 1% for some models.

To avoid such issues, people who are interested in evaluating their own models can extract the answer from the response themselves and submit the extracted answer directly, instead of the raw response.

In this case, such parsing errors will not be introduced.
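
For anyone who wants to try the self-extraction route described above, here is a minimal sketch of what it might look like for multiple-choice questions. It is not the repo's extraction code: the function name `extract_choice`, the patterns, and the rule of returning `None` when the response is ambiguous are all assumptions made for illustration. The submission format is not described in this thread, so the sketch stops at producing the extracted letter.

```python
import re
from typing import Optional, Sequence


def extract_choice(
    response: str, letters: Sequence[str] = ("A", "B", "C", "D")
) -> Optional[str]:
    """Return the option letter the response commits to, or None if unclear.

    Hypothetical helper for illustration; not the MMMU repo's parser.
    """
    group = "".join(re.escape(letter) for letter in letters)
    # Ordered from the most explicit cue to the weakest one.
    patterns = [
        # "the answer is (B)", "Answer: B"
        rf"(?i:answer)\s*(?i:is)?\s*:?\s*\(?([{group}])\)?(?![A-Za-z])",
        # a parenthesised letter such as "(B)"
        rf"\(([{group}])\)",
        # a line that contains only the letter, e.g. "B" or "B."
        rf"^\s*([{group}])\s*[.):]?\s*$",
    ]
    for pattern in patterns:
        found = set(re.findall(pattern, response, flags=re.MULTILINE))
        if len(found) == 1:
            return found.pop()
        if len(found) > 1:
            # Several different letters match: refuse to guess.
            return None
    return None


if __name__ == "__main__":
    for raw in ["The answer is (B).", "B", "It could be (A) or (C)."]:
        print(repr(raw), "->", extract_choice(raw))
```

Returning `None` on ambiguous responses, rather than picking one of the candidates, is a deliberate choice in this sketch: an unparsed response can be reviewed by hand, whereas a wrong guess would silently count as a right or wrong answer.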