Closed: Forence1999 closed this 7 months ago
Hi @Forence1999 , sorry for being late on this thread. May I know how much improvement you saw from this change, and on which model? Models these days can be a bit sensitive about the prompts. The colon was used in the original MMLU eval script. Our evaluation results of llama can match the llama paper, so we tend to keep this unless there is a notable difference.
Hi Yizhong, sorry for the late reply! I finetuned the LLaMA2-7B model on the SuperNI dataset and observed slightly over 1% improvement on MMLU. Intuitively, deleting the colon should be better, but following the regular practice is also acceptable :)
Closing since it seems that the performance diff isn't massive, and to keep consistent with the original MMLU implementation.
Replace
prompt += "The answer is:"
with
prompt += "The answer is "
. The colon causes a slight decrease and fluctuation in performance, since the model has not been trained on sentences of that form.
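To make the proposed one-character change concrete, here is a minimal sketch of an MMLU-style prompt builder with the suffix under discussion as a toggle. `format_example` is a hypothetical helper for illustration, not the repo's actual evaluation code.

```python
def format_example(question: str, choices: list[str], use_colon: bool = True) -> str:
    """Build a zero-shot MMLU-style prompt for one multiple-choice question."""
    letters = "ABCD"
    prompt = question.strip() + "\n"
    for letter, choice in zip(letters, choices):
        prompt += f"{letter}. {choice}\n"
    # The thread debates this suffix: ":" (as in the original MMLU eval
    # script) versus a trailing space, which this issue proposes instead.
    prompt += "The answer is:" if use_colon else "The answer is "
    return prompt

if __name__ == "__main__":
    q = "What is 2 + 2?"
    opts = ["3", "4", "5", "6"]
    print(format_example(q, opts, use_colon=True))
    print(format_example(q, opts, use_colon=False))
```

The model's next-token prediction is conditioned on the exact suffix characters, which is why even a single colon can shift which answer letter gets the highest probability.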