Thanks for your great work.
I have a question about the SEED-LLaMA evaluation settings. I tried to reproduce the VQA accuracy of the instruction-tuned SEED-LLaMA 8B on the VQAv2 dataset, but I cannot reproduce the result reported in the paper (66.2).
I ran the evaluation on 8× A100 80GB GPUs with batch size 1. This is the generation config I used.
And this is the result computed by the official evaluation server: `"test-dev": {"yes/no": 38.59, "number": 23.68, "other": 39.1, "overall": 37.14}`
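For context on how these numbers are produced: the official VQAv2 metric scores each predicted answer as min(#matching human answers / 3, 1), averaged over all leave-one-out subsets of the 10 annotator answers. A minimal sketch of that scoring (simplified — the official script additionally normalizes punctuation, articles, and contractions before matching):

```python
def vqa_accuracy(predicted, gt_answers):
    """VQAv2-style accuracy for a single question.

    predicted: the model's answer (string)
    gt_answers: the 10 human-annotated answers (list of strings)
    """
    predicted = predicted.strip().lower()
    gt = [a.strip().lower() for a in gt_answers]
    scores = []
    # Average over all subsets that leave one annotator out,
    # scoring min(matches / 3, 1) within each subset.
    for i in range(len(gt)):
        others = gt[:i] + gt[i + 1:]
        matches = sum(1 for a in others if a == predicted)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)
```

So an answer given by at least 3 of the remaining 9 annotators scores full credit; mismatches in answer formatting (e.g. "2" vs "two") can therefore noticeably depress the overall score, which may be relevant to the gap above.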
I would be grateful if you could share your evaluation settings or any advice.