allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Which MMLU validation setting is recommended? #714

Open mathfinder opened 2 months ago

mathfinder commented 2 months ago

❓ The question

I see that you provide many MMLU evaluation variants. Taking mmlu_stem as an example, there are mmlu_stem_test, mmlu_stem, mmlu_stem_var, and mmlu_stem_mc_5shot, and for other subjects mmlu_humanities_mc_5shot and mmlu_humanities_mc_5shot_test. Which one is recommended?

aman-17 commented 1 month ago

For initial testing, I recommend starting with the 5-shot variants (e.g., mmlu_stem_mc_5shot or mmlu_humanities_mc_5shot_test). These are useful for evaluating how well the model generalizes from a few in-context examples. For less capable models, however, I would not rely on the multiple-choice (MC) tasks right away, since such models may not perform well on them. Use the simpler tasks first to gauge the model's baseline performance, then move on to more complex evaluations such as MC.
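For anyone comparing the labels listed in the question, here is a minimal sketch that splits a label into its parts. It is not OLMo code: `parse_mmlu_label`, `MMLULabel`, and `KNOWN_SUFFIXES` are hypothetical names, and the meaning given to each suffix (`mc` as multiple choice, `5shot` as five in-context examples, `var` as a prompt variation, `test` as the test split) is an assumption read off the names, not something confirmed in this thread.

```python
from dataclasses import dataclass

# Assumed suffix tokens, inferred from the label names in the question.
KNOWN_SUFFIXES = {"mc", "5shot", "var", "test"}


@dataclass(frozen=True)
class MMLULabel:
    subject: str           # e.g. "stem" or "humanities"
    multiple_choice: bool  # label carries "mc"
    five_shot: bool        # label carries "5shot"
    prompt_variant: bool   # label carries "var"
    test_split: bool       # label carries "test"


def parse_mmlu_label(label: str) -> MMLULabel:
    """Split a label such as 'mmlu_stem_mc_5shot' into subject and flags."""
    tokens = label.split("_")
    if tokens[0] != "mmlu":
        raise ValueError(f"not an MMLU label: {label!r}")
    rest = tokens[1:]
    flags = set()
    # Peel known suffix tokens off the end; whatever remains is the subject.
    while rest and rest[-1] in KNOWN_SUFFIXES:
        flags.add(rest.pop())
    if not rest:
        raise ValueError(f"no subject found in label: {label!r}")
    return MMLULabel(
        subject="_".join(rest),
        multiple_choice="mc" in flags,
        five_shot="5shot" in flags,
        prompt_variant="var" in flags,
        test_split="test" in flags,
    )


if __name__ == "__main__":
    # Labels copied from the question above.
    for name in [
        "mmlu_stem_test",
        "mmlu_stem",
        "mmlu_stem_var",
        "mmlu_stem_mc_5shot",
        "mmlu_humanities_mc_5shot",
        "mmlu_humanities_mc_5shot_test",
    ]:
        print(name, parse_mmlu_label(name))
```

Grouping labels this way makes it easier to see which ones differ only by split and which differ by prompt format, which is the distinction the recommendation above turns on.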