MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI".
https://mmmu-benchmark.github.io/
Apache License 2.0

Why is every answer in Structural Engineering just "?" #7

Closed: mckinziebrandon closed this issue 9 months ago

mckinziebrandon commented 9 months ago

I was browsing the test set with the Dataset Viewer on HuggingFace (Link) and noticed that, for the Structural Engineering subset of Architecture_and_Engineering, literally every single answer and explanation is equal to "?". Surely this is a bug?


mckinziebrandon commented 9 months ago

Oh, never mind. I guess labels aren't provided for the test split, so users can't evaluate their own models on it locally.
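A quick way to tell withheld labels apart from a data bug is to check whether *every* row in a subfield carries the same placeholder, which is what the thread observed for Structural Engineering. The sketch below assumes each row is a dict with `subfield`, `answer`, and `explanation` keys and that withheld labels are the literal string `"?"`. The helper name `labels_withheld` is hypothetical, and the field names are an assumption about the dataset schema, not something confirmed in this thread.

```python
from typing import Iterable, Mapping

# Assumption: withheld test labels are stored as this literal placeholder.
PLACEHOLDER = "?"


def labels_withheld(
    records: Iterable[Mapping[str, str]],
    subfield: str,
    fields: tuple[str, ...] = ("answer", "explanation"),
) -> bool:
    """Return True if every record in `subfield` has only placeholders in `fields`.

    Hypothetical helper; the `subfield` key and the default `fields` are
    assumed column names. Returns False when no record matches the subfield.
    """
    matched = [r for r in records if r.get("subfield") == subfield]
    if not matched:
        return False
    # Every checked field of every matching row must equal the placeholder.
    return all(r.get(f) == PLACEHOLDER for r in matched for f in fields)


# Loading the real split would look roughly like this (assumed dataset ID,
# requires network access and the `datasets` package):
#   from datasets import load_dataset
#   rows = load_dataset("MMMU/MMMU", "Architecture_and_Engineering", split="test")
```

Usage with illustrative, made-up rows:

```python
rows = [
    {"subfield": "Structural Engineering", "answer": "?", "explanation": "?"},
    {"subfield": "Surveying and Mapping", "answer": "B", "explanation": ""},
]
labels_withheld(rows, "Structural Engineering")  # True
labels_withheld(rows, "Surveying and Mapping")   # False
```

If this returns True for every subfield of the test split, but False for the validation split, the placeholders point to deliberately withheld labels rather than a bug in one subset.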