Unreasonable data in AI2D dataset used for evaluation

EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

https://lmms-lab.github.io/


Unreasonable data in AI2D dataset used for evaluation #103

Open yqy2001 opened 2 weeks ago

yqy2001 commented 2 weeks ago

Hello! I examined the AI2D dataset used for evaluation and found that some of the samples look unreasonable, which suggests errors in how the option labels (A, B, ...) were replaced.

Could you fix this and share your replacement strategy?

Thank you.

Examples:

[Five attached images, not reproduced here]

kcz358 commented 2 weeks ago

Hi, I remember that the community raised an image replacement issue for AI2D during our development. I think in the end we followed the AI2D pipeline and performed text replacement.

Luodian commented 1 week ago

I took a deep look and got confused by AI2D's data.

Here's a check against another evaluation suite, one that is widely used in the development of InternVL-1.5.

https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#ai2d-test

I downloaded their AI2D_TEST data and found the image is the same as ours.
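
For anyone who wants to repeat this kind of comparison, here is a minimal sketch that checks whether same-named files in two local image folders are byte-identical. It is not necessarily how the check above was done: the function names and the directory layout are assumptions for illustration, and a byte-level hash will report re-encoded copies of the same picture as different.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_image_dirs(dir_a: Path, dir_b: Path) -> dict:
    """Compare same-named files in two directories by content hash.

    Hypothetical helper: assumes both dataset copies were exported to flat
    folders that use the same file names for the same samples.
    """
    files_a = {p.name: p for p in dir_a.iterdir() if p.is_file()}
    files_b = {p.name: p for p in dir_b.iterdir() if p.is_file()}
    shared = sorted(files_a.keys() & files_b.keys())

    # A file "differs" when the name matches but the bytes do not.
    differing = [
        name for name in shared
        if file_digest(files_a[name]) != file_digest(files_b[name])
    ]
    return {
        "only_in_a": sorted(files_a.keys() - files_b.keys()),
        "only_in_b": sorted(files_b.keys() - files_a.keys()),
        "identical": [name for name in shared if name not in differing],
        "differing": differing,
    }
```

Calling `compare_image_dirs(Path("ours"), Path("theirs"))` on two hypothetical export folders returns the file names grouped by status, so an empty `differing` list means every shared image matches byte for byte.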

yqy2001 commented 1 week ago

confusing +1