TIGER-AI-Lab / MMLU-Pro

The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]

Regarding leaderboard submission #26

Closed by sorobedio 1 month ago

sorobedio commented 1 month ago

Hello, I have a set of pretrained models and I plan to evaluate them locally on the MMLU-Pro benchmark without any additional training, then select the best-performing model for submission. Is this approach valid, or could it be considered cheating?
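
For context, "evaluate locally" here just means scoring each candidate against the public MMLU-Pro questions before submitting. A minimal sketch of such a loop, assuming the Hugging Face `TIGER-Lab/MMLU-Pro` dataset exposes `question`, `options`, and `answer_index` fields; `predict_letter` is a hypothetical placeholder for whatever inference pipeline the candidate model uses, not part of this repo:

```python
# Minimal local-evaluation sketch for one candidate model.
from datasets import load_dataset

LETTERS = "ABCDEFGHIJ"  # MMLU-Pro questions have up to 10 options

def predict_letter(question: str, options: list[str]) -> str:
    """Hypothetical placeholder: run the candidate model and return its answer letter."""
    raise NotImplementedError

def evaluate(split: str = "test") -> float:
    ds = load_dataset("TIGER-Lab/MMLU-Pro", split=split)
    correct = sum(
        predict_letter(ex["question"], ex["options"]) == LETTERS[ex["answer_index"]]
        for ex in ds
    )
    return correct / len(ds)
```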

Wyyyb commented 1 month ago

Our leaderboard is designed for evaluating single models, so manually selecting the best-performing model from a set of pretrained models would be considered unfair. However, I think the approach is acceptable if you use techniques similar to Mixture of Experts (MoE) to combine the models and derive better results automatically.
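
To make the distinction concrete: the point is that the combination must be automatic per input, not a human picking the winner from test scores. One simple illustration in that spirit is a per-question majority vote across the candidates (an ensemble baseline, not true MoE routing); `models` holds hypothetical predict-letter callables like the placeholder sketched above:

```python
# Hypothetical illustration: combine candidates automatically per question
# via majority vote, instead of hand-picking one model by its test score.
from collections import Counter

def ensemble_letter(question: str, options: list[str], models) -> str:
    """`models` is a list of predict-letter callables, one per candidate."""
    votes = Counter(m(question, options) for m in models)
    return votes.most_common(1)[0][0]
```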