EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

Llava1.6 Mistral ScienceQA Performance #34

Open daniel-z-kaplan opened 3 months ago

daniel-z-kaplan commented 3 months ago

Hello,

If we view the chart provided, Mistral-7b achieves a score of 0.23/100 on ScienceQAFull. I am able to replicate this, but it is obviously very strange: the other comparison models get scores around 73.

kcz358 commented 3 months ago

Hi, @daniel-z-kaplan

Based on our logs, it is likely because mistral-7b tends to generate an empty space, causing the exact match to give a zero score. The same applies to gqa and ai2d.

In our next release this will be fixed and new results will be updated.
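The failure mode described above can be sketched with a toy scorer. This is a minimal, hypothetical example (not lmms-eval's actual metric code) showing why an output consisting only of whitespace always scores zero under strict exact match:

```python
def exact_match(prediction: str, target: str, strip: bool = False) -> float:
    """Return 1.0 if prediction equals target, else 0.0.

    Hypothetical helper for illustration; `strip` optionally removes
    surrounding whitespace before comparing.
    """
    if strip:
        prediction, target = prediction.strip(), target.strip()
    return 1.0 if prediction == target else 0.0


# A model that emits only a space never matches any gold answer:
print(exact_match(" ", "B"))                 # 0.0

# Stripping fixes spurious whitespace around a real answer...
print(exact_match(" B ", "B", strip=True))   # 1.0

# ...but cannot rescue a genuinely empty generation:
print(exact_match(" ", "B", strip=True))     # 0.0
```

If nearly every generation is an empty string or a lone space, the benchmark score collapses toward zero, which matches the 0.23/100 observed here.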

cjfcsjt commented 2 months ago

Any updates? I ran into the same problem: the model liuhaotian/llava-v1.6-vicuna-7b always generates an empty string.