Same for me. Using the prompt "{question}\n{choice_txt}\nAnswer with the option's letter from the given choices directly." with LLaVA-v1.5-13b, I only get approximately 23% on the whole validation split. I hope you can help me figure out what the problem could be.
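For context, here is a minimal sketch (not the benchmark's official evaluation code) of how a prompt in that format could be assembled; the `build_mc_prompt` helper and the "(A)/(B)" option labeling are assumptions for illustration only.

```python
import string

def build_mc_prompt(question: str, options: list[str]) -> str:
    """Assemble a multiple-choice prompt of the form
    "{question}\n{choice_txt}\nAnswer with the option's letter ..."."""
    letters = string.ascii_uppercase
    # Label each option with a letter; the exact labeling style is an assumption.
    choice_txt = "\n".join(f"({letters[i]}) {opt}" for i, opt in enumerate(options))
    return (
        f"{question}\n{choice_txt}\n"
        "Answer with the option's letter from the given choices directly."
    )

# Example usage with a made-up question
print(build_mc_prompt(
    "Which account increases with a debit?",
    ["Revenue", "Accounts Payable", "Cash", "Common Stock"],
))
```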
@teasgen Thanks! See #9
Hi, thanks for your interest!
Please refer to #9 for the detailed prompt for LLaVA-1.5.
Thanks!
Hi @teasgen, quick update:
We uploaded the inference code and the sample results it produces, achieving 35.8 on the val set.
Let me know if you have any other questions.
Thank you again! But why does this code achieve only 35.8, while the paper reports about 36.4? Does that mean the dataset has high variance? Do you have any intuition about the range within which the score can vary?
@teasgen As I mentioned in the other repo, LLaVA is not deterministic, so its outputs can differ slightly across random seeds. I didn't try to find a particular seed that matches the performance from the previous code.
Also, for deterministic models such as BLIP-2, the results would be identical.
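If you want runs to be more repeatable, a common approach is to pin the random seeds before inference. This is a minimal sketch assuming a PyTorch/transformers-based LLaVA setup, not the authors' exact configuration; full bit-for-bit reproducibility is not guaranteed across hardware.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Seed Python, NumPy, and PyTorch (CPU and all CUDA devices).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)
# Alternatively, sampling can be disabled at generation time,
# e.g. model.generate(..., do_sample=False), which makes decoding greedy.
```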
Hi, I'm trying to reproduce LLaVA-1.5-13b on your benchmark using prompts like:
for multiple-choice (e.g. validation_Accounting_1)
Or for open questions:
(e.g. validation_Architecture_and_Engineering_14) But I'm getting only 32.5% on the validation split vs. 36.4% (I tried only questions with a single input image, 856/900). What could be the problem?
Originally posted by @teasgen in https://github.com/MMMU-Benchmark/MMMU/issues/5#issuecomment-1845122257