yiyexy opened this issue 7 months ago
Hi, you can refer to #6
Thanks for replying!
However, as you can see, the number of samples matches the number of samples in the validation split, and I am sure I used the validation split to evaluate the model.
By the way, the ScienceQA dataset has the same issue.
Hi @yiyexy, there is another comment further down in #6 that addresses the real cause of the inconsistency: it depends on whether or not you use OCR.
For ScienceQA, may I ask what your score is? I think our results are quite close to the reported ones. You can check the full list of our results here
@kcz358 OK, I will try it later. I found that there is more than a one-point difference in the ScienceQA scores. Why? I got 72.83 using lmms-eval, but only 70.90 with the LLaVA script.
I used the lmms-eval repo to evaluate my model and got the following result:
Then I used the evaluation script in LLaVA and got the following result:
Why are they so different?
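For reference, the lmms-eval run was along these lines (a sketch only: the checkpoint path, task name, and suffix below are placeholders, and the exact flags should be checked against the lmms-eval README for your version):

```bash
# Sketch of an lmms-eval invocation for ScienceQA (image split).
# The pretrained path and --tasks value are placeholders; confirm the
# available task names with `python -m lmms_eval --tasks list`.
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks scienceqa_img \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_scienceqa \
    --output_path ./logs/
```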