Psycoy / MixEval

The official evaluation suite and dynamic data release for MixEval.
https://mixeval.github.io/

Default SYSTEM_MESSAGE for Llama 3 Instruct is "You are a pirate chatbot who always responds in pirate speak!" #4

Closed by lhl 2 months ago

lhl commented 3 months ago

It seems like this would generate bad benchmark results?

https://github.com/Psycoy/MixEval/blob/main/mix_eval/models/llama_3_8b_instruct.py#L18C1-L18C160

        self.SYSTEM_MESSAGE = {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"} # set to None if no system message

Note this also seems to be an issue in llama_3_70b_instruct.py and zephyr_7b_beta.py.
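To illustrate why the default matters, here is a minimal sketch of how a system message set this way would be prepended to every conversation. The `build_messages` helper and the `PIRATE_SYSTEM_MESSAGE` name are hypothetical and are not taken from the MixEval code; only the message content comes from the line quoted above.

```python
# Hypothetical helper, NOT MixEval's actual code: it only illustrates how a
# default system message ends up in every conversation sent to the model.
from typing import Optional

# Content copied from the line quoted in this issue; the constant name is made up.
PIRATE_SYSTEM_MESSAGE = {
    "role": "system",
    "content": "You are a pirate chatbot who always responds in pirate speak!",
}


def build_messages(user_prompt: str, system_message: Optional[dict] = None) -> list:
    """Return a chat message list, prepending the system message if one is set."""
    messages = []
    if system_message is not None:
        messages.append(system_message)
    messages.append({"role": "user", "content": user_prompt})
    return messages


# With the pirate default, every benchmark question carries the system turn.
with_pirate = build_messages("What is 2 + 2?", PIRATE_SYSTEM_MESSAGE)
# With None, the model only sees the user turn.
without_system = build_messages("What is 2 + 2?", None)

print([m["role"] for m in with_pirate])
print([m["role"] for m in without_system])
```

If the evaluation code follows a pattern like this, the only difference between the two settings is the extra system turn, so the two configurations can be compared directly.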

Is this what was run to generate the scores on https://mixeval.github.io/#leaderboard ?!?!

Psycoy commented 3 months ago

No, it's just for testing. We tested both system prompt settings (the one shown on the Llama 3 HF repo, and None).

jmercat commented 1 week ago

Could you set the default to None on main?

Psycoy commented 1 week ago

@jmercat Done

jmercat commented 3 days ago

@Psycoy It appears from your answer in #35 that the prompt used for the leaderboard results is actually the pirate-talk one. I asked for no prompt to be the default on main because I thought that was the setting needed to reproduce the leaderboard results. Since it is not, maybe main should be reverted to what it was (see code line).