Reproducibility of MME evaluation

InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Apache License 2.0


Using the 4-bit quantized model from Hugging Face, I could not reproduce the reported MME performance in my environment.

Which of the following matters most for reproducing the reported performance?

Image padding (I simply resized the image to 490×490, but the paper pads the image on the left, right, top, and bottom)
Prompt (I used "answer the question using a single word or a phrase", but the paper used "answer the question briefly")
Generation hyperparameters (I used plain greedy search with num_beams=1, temperature=1, do_sample=False, but the paper used beam search with num_beams=5, temperature=1.0)
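
One way to find out which of the three differences matters is to vary them independently rather than all at once. The sketch below is only an illustration under stated assumptions: `pad_to_square` and `ablation_grid` are hypothetical helper names, the padding is assumed to be centered (the exact padding scheme of the paper is not given here), and `do_sample=False` for the beam-search setting is my assumption. The keys `num_beams` and `do_sample` follow the Hugging Face `generate` keyword names; `temperature` is left out because it only takes effect when sampling is enabled, so it should not explain a gap between the two settings.

```python
from itertools import product

TARGET = 490  # input resolution mentioned in the question


def pad_to_square(width, height):
    """Return (left, top, right, bottom) padding that centers an image
    of the given size on a square canvas (assumed centered padding)."""
    side = max(width, height)
    extra_w = side - width
    extra_h = side - height
    left = extra_w // 2
    top = extra_h // 2
    return left, top, extra_w - left, extra_h - top


# The three factors from the question: first value is my setup,
# second value is the setup attributed to the paper.
FACTORS = {
    "image": ["resize", "pad"],
    "prompt": [
        "answer the question using a single word or a phrase",
        "answer the question briefly",
    ],
    "generation": [
        {"num_beams": 1, "do_sample": False},  # greedy search
        {"num_beams": 5, "do_sample": False},  # beam search (do_sample assumed)
    ],
}


def ablation_grid():
    """Enumerate all 2 x 2 x 2 combinations of the three factors."""
    keys = list(FACTORS)
    return [dict(zip(keys, combo)) for combo in product(*(FACTORS[k] for k in keys))]


if __name__ == "__main__":
    for i, cfg in enumerate(ablation_grid()):
        print(i, cfg["image"], repr(cfg["prompt"]), cfg["generation"])
```

Running MME once per configuration (eight runs in total) would show whether the gap comes from one factor or from a combination. For the "pad" setting, the padded square image would then be resized to `TARGET` × `TARGET`, which preserves the aspect ratio that a direct resize distorts.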
