yiyexy opened this issue 7 months ago
Hi, you can refer to #6
Thanks for replying!
However, as you can see, the number of samples matches the number of samples in the validation split, and I am sure I used the validation split to evaluate the model.
By the way, the ScienceQA dataset has the same issue.
Hi @yiyexy, there is another comment further down in #6 that addresses the real cause of the inconsistency: it depends on whether or not you use OCR.
For ScienceQA, may I ask what your score is? I think our results are quite close to the reported ones. You can check the full list of our results here
@kcz358 OK, I will try it later. I found that there is more than a one-point difference in the ScienceQA scores. Why? I got 72.83 using lmms-eval, but only 70.90 with the LLaVA script.
I used the lmms-eval repo to evaluate my model and got the following result:
Then I used the evaluation script in LLaVA and got the following result:
Why are they so different?
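For reference, the lmms-eval run was along these lines (a sketch only: the checkpoint path, task name, and suffix below are placeholders, and the exact flags should be checked against the lmms-eval README for your version):

```bash
# Sketch of an lmms-eval invocation for ScienceQA (image split).
# The pretrained path and --tasks value are placeholders; confirm the
# available task names with `python -m lmms_eval --tasks list`.
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks scienceqa_img \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_scienceqa \
    --output_path ./logs/
```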