Thanks for your great work.
I have a question about the SEED-LLaMA evaluation settings. I tried to reproduce the VQA accuracy of the instruction-tuned SEED-LLaMA 8B on the VQAv2 dataset, but I cannot reproduce the result reported in the paper (66.2).
I ran the evaluation on 8× A100 80GB GPUs with batch size 1. This is the generation config I used.
And this is the result computed by the official evaluation server: `"test-dev": {"yes/no": 38.59, "number": 23.68, "other": 39.1, "overall": 37.14}`
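For context on how these numbers are produced: the official VQAv2 metric scores each predicted answer as min(#matching human answers / 3, 1), averaged over all leave-one-out subsets of the 10 annotator answers. A minimal sketch of that scoring (simplified — the official script additionally normalizes punctuation, articles, and contractions before matching):

```python
def vqa_accuracy(predicted, gt_answers):
    """VQAv2-style accuracy for a single question.

    predicted: the model's answer (string)
    gt_answers: the 10 human-annotated answers (list of strings)
    """
    predicted = predicted.strip().lower()
    gt = [a.strip().lower() for a in gt_answers]
    scores = []
    # Average over all subsets that leave one annotator out,
    # scoring min(matches / 3, 1) within each subset.
    for i in range(len(gt)):
        others = gt[:i] + gt[i + 1:]
        matches = sum(1 for a in others if a == predicted)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)
```

So an answer given by at least 3 of the remaining 9 annotators scores full credit; mismatches in answer formatting (e.g. "2" vs "two") can therefore noticeably depress the overall score, which may be relevant to the gap above.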
I would be grateful if you could share your evaluation settings or any advice.