Closed: Forence1999 closed this 7 months ago
Hi @Forence1999 , sorry for being late on this thread. May I know how much improvement you saw from this change, and on which model? Models these days can be a bit sensitive about the prompts. The colon was used in the original MMLU eval script. Our evaluation results of llama can match the llama paper, so we tend to keep this unless there is a notable difference.
Hi Yizhong, sorry for the late reply! I finetuned the LLaMA2-7B model on the SuperNI dataset and observed slightly over 1% improvement on MMLU. Intuitively, deleting the colon should be better, but following the regular practice is also acceptable :)
Closing since it seems that the performance diff isn't massive, and to keep consistent with the original MMLU implementation.
Replace
prompt += "The answer is:"
with
prompt += "The answer is "
. The colon causes a slight decrease and fluctuation in performance, since the model has not been trained on sentences of that form.
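To make the proposed one-character change concrete, here is a minimal sketch of an MMLU-style prompt builder with the suffix under discussion as a toggle. `format_example` is a hypothetical helper for illustration, not the repo's actual evaluation code.

```python
def format_example(question: str, choices: list[str], use_colon: bool = True) -> str:
    """Build a zero-shot MMLU-style prompt for one multiple-choice question."""
    letters = "ABCD"
    prompt = question.strip() + "\n"
    for letter, choice in zip(letters, choices):
        prompt += f"{letter}. {choice}\n"
    # The thread debates this suffix: ":" (as in the original MMLU eval
    # script) versus a trailing space, which this issue proposes instead.
    prompt += "The answer is:" if use_colon else "The answer is "
    return prompt

if __name__ == "__main__":
    q = "What is 2 + 2?"
    opts = ["3", "4", "5", "6"]
    print(format_example(q, opts, use_colon=True))
    print(format_example(q, opts, use_colon=False))
```

The model's next-token prediction is conditioned on the exact suffix characters, which is why even a single colon can shift which answer letter gets the highest probability.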