Open daniel-z-kaplan opened 3 months ago
Hi, @daniel-z-kaplan
Based on our logs, this is likely because mistral-7b tends to generate a leading empty space, which causes the exact-match metric to give a zero score. The same applies to gqa and ai2d.
This will be fixed in our next release, and the results will be updated.
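To illustrate the failure mode described above, here is a minimal sketch, assuming the scorer compares raw strings without trimming whitespace. The function names `exact_match` and `exact_match_stripped` and the sample data are hypothetical and are not taken from the actual evaluation code.

```python
def exact_match(prediction: str, target: str) -> float:
    """Strict exact match: any extra whitespace makes the score 0."""
    return 1.0 if prediction == target else 0.0


def exact_match_stripped(prediction: str, target: str) -> float:
    """Exact match after trimming surrounding whitespace from both strings."""
    return 1.0 if prediction.strip() == target.strip() else 0.0


# Hypothetical model outputs that each start with a space.
predictions = [" A", " B", " C"]
targets = ["A", "B", "D"]

# Average each metric over the sample.
strict = sum(exact_match(p, t) for p, t in zip(predictions, targets)) / len(targets)
lenient = sum(exact_match_stripped(p, t) for p, t in zip(predictions, targets)) / len(targets)

print(f"strict exact match:   {strict:.2f}")
print(f"stripped exact match: {lenient:.2f}")
```

In this sketch the strict metric scores every answer as wrong, including the correct ones, while the stripped variant only penalizes the genuinely wrong answer. Whether the real fix strips whitespace in the scorer or in the generation post-processing is not stated in this thread.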
Any updates? I ran into the same problem: the model liuhaotian/llava-v1.6-vicuna-7b always generates an empty string.
Hello,
Looking at the chart provided, Mistral-7b achieves a score of 0.23/100 on ScienceQAFull. I am able to replicate this, but it is obviously very strange: the other comparison models get scores around 73.