clairez-cerebras opened this issue 3 months ago
Thank you for reporting the issue. I will try to look into this error later.
I encountered the same problem when reproducing the llava-1.6-mistral-7b results on ScienceQA. I found the reason may be the following lines in `models/llava.py`:
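(The exact snippet was embedded in the original comment and did not survive the copy; the block below is a paraphrased sketch of it, not the verbatim source. `conv_templates`, `question_input`, and `flattened_visuals` come from the surrounding `generate_until` code.)

```python
# Paraphrased sketch of the problematic block in models/llava.py (not verbatim).
# The preceding main loop iterates over zip(batched_visuals, contexts) and
# already appends one prompt_question per context, even for text-only inputs.
if len(flattened_visuals) == 0:  # intended to handle text-only batches
    for context in contexts:
        conv = conv_templates[self.conv_template].copy()
        conv.append_message(conv.roles[0], context)
        conv.append_message(conv.roles[1], None)
        # This appends a second prompt for the same context, so each
        # text-only question is generated twice and answers shift order.
        question_input.append(conv.get_prompt())
```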
Although the code comment says "The above for loop has bug" when the input has no visuals, the loop actually runs normally and appends a `prompt_question` to the `question_input` list, and then these lines append a `prompt_question` again. As a result, each input without visuals generates two answers, leading to an order mismatch between questions and answers.
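A minimal standalone illustration of the double-append (the prompt format here is hypothetical; only the counting matters):

```python
contexts = ["Q1 (text-only)", "Q2 (text-only)"]
flattened_visuals = []  # no images in the batch
question_input = []

# Main loop: runs for text-only inputs too, one prompt per context.
for context in contexts:
    question_input.append(f"USER: {context} ASSISTANT:")

# The extra block: appends every text-only context a second time.
if len(flattened_visuals) == 0:
    for context in contexts:
        question_input.append(f"USER: {context} ASSISTANT:")

print(len(question_input))  # 4 prompts for 2 questions -> misaligned answers
```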
After removing these lines, the scienceqa-full result changes from 36.3 to 76.8.
Hi @GoGoJoestar, I think your fix is correct. We previously used flattened visuals instead of batched visuals in the earlier loop, which caused errors when handling samples with no visuals. I will remove these lines.
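Roughly, the corrected flow would look like this (a sketch assuming the surrounding `generate_until` code, not the actual patch; `DEFAULT_IMAGE_TOKEN` is the existing llava constant):

```python
# Handle text-only samples inside the one loop over batched visuals,
# and delete the trailing `if len(flattened_visuals) == 0:` block.
for visual, context in zip(batched_visuals, contexts):
    if visual is None or len(visual) == 0:  # text-only sample
        question = context
    else:
        question = DEFAULT_IMAGE_TOKEN + "\n" + context
    conv = conv_templates[self.conv_template].copy()
    conv.append_message(conv.roles[0], question)
    conv.append_message(conv.roles[1], None)
    question_input.append(conv.get_prompt())  # exactly one prompt per context
```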
I was attempting to reproduce llava-1.5's results on ScienceQA but could not match the reported numbers. Command:
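(The original command did not survive the copy; for reference, a typical lmms-eval invocation for this task, following the project README, looks like the block below. The model path, process count, and log suffix are illustrative.)

```bash
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks scienceqa_full \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_sqa \
    --output_path ./logs/
```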
Config:
The results I got:
These are far from what's reported: for example, SQA-IMG is reported as 71.6 in the llava-1.5 paper, and SQA in general is around 70.4 in the Excel sheet. What could be wrong?