Sakshi113 / MMAU


release eval model prompt #3

Open UltraEval opened 6 days ago

UltraEval commented 6 days ago

The paper declares the model's best results with a prompt. Can you release them? We cannot reproduce the experimental results.

Sreyan88 commented 6 days ago

Hi @UltraEval ,

Which model are you trying to reproduce the results on?

Sreyan88 commented 6 days ago

Hi @UltraEval ,

Here is the prompt that gave us the best performance most of the time:

text_content = f"""{question} Select one option from the provided choices.\n{choices}."""
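For concreteness, here is a minimal sketch of how this template might be filled in; the `question` and `choices` values below are hypothetical placeholders, not taken from the MMAU benchmark:

```python
# Hypothetical example values; MMAU's actual question/choice formatting may differ.
question = "Which instrument is playing?"
choices = "(A) piano (B) guitar (C) drums (D) violin"

# The prompt template from the comment above (note \n is an escape inside
# the triple-quoted f-string, so it produces a real newline).
text_content = f"""{question} Select one option from the provided choices.\n{choices}."""

print(text_content)
```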

This allows the model to get better performance on our string-matching eval metric. Please let us know if you have any more doubts.
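To illustrate why prompting the model to emit one of the provided choices helps, here is a hedged sketch of what a string-matching metric could look like. This is an assumption for illustration, not the repo's actual implementation: it simply checks that exactly one choice string appears in the model's raw output and that it is the gold choice.

```python
# Sketch of a string-matching correctness check (hypothetical helper,
# not the official MMAU eval code).

def string_match_correct(model_output: str, gold_choice: str, choices: list[str]) -> bool:
    """Return True if the gold choice, and no other choice, appears in the output."""
    out = model_output.lower()
    hits = [c for c in choices if c.lower() in out]
    return len(hits) == 1 and hits[0].lower() == gold_choice.lower()

choices = ["piano", "guitar", "drums", "violin"]
print(string_match_correct("The answer is piano.", "piano", choices))   # True
print(string_match_correct("Maybe guitar or piano.", "piano", choices)) # False: ambiguous
```

Under a metric like this, free-form answers that paraphrase the correct option would score zero, which is why constraining the model to echo a choice verbatim raises measured accuracy.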

UltraEval commented 1 day ago

Thanks for your reply!

> Hi @UltraEval ,
>
> Which model are you trying to reproduce the results on?

with the prompt

text_content = f"""{question} Select one option from the provided choices.\n{choices}."""

The performance of Qwen-Audio-Chat, when we tested on Test-mini, is 36.9%, lower than the declared 43.1%. Conversely, Qwen2-Audio-Instruct reaches 54.0%, higher than the declared 49.2% 😂😂.

Here are the details: qwen_chat_mmau.json qwen2_mmau.json