Closed — sorobedio closed this issue 1 month ago
Hello, I have a set of pretrained models and plan to evaluate them locally on the MMLU-Pro benchmark without any additional training, then submit only the best-performing one. Is this approach valid, or could it be considered cheating?

Our leaderboard is designed for evaluating single models, so manually cherry-picking the best-performing model from a set of pretrained models would be considered unfair. However, the approach is acceptable if you use a technique similar to Mixture of Experts (MoE) that combines the models and derives better results automatically.
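To illustrate the distinction, here is a minimal sketch of MoE-style automatic selection: instead of a human picking one model after seeing benchmark scores, a router picks an expert per query at inference time. The experts below are hypothetical stand-ins (simple callables returning an answer and a confidence score), not real pretrained models, and the confidence-based routing rule is only one possible gating strategy.

```python
def make_expert(name, known):
    """Build a toy expert that answers questions it 'knows' with high confidence."""
    def expert(question):
        if question in known:
            return known[question], 0.9  # confident answer from this expert's domain
        return "unknown", 0.1            # low-confidence fallback outside its domain
    return expert

def route(experts, question):
    """MoE-style gating: return the answer of the single most confident expert."""
    answers = [expert(question) for expert in experts]
    return max(answers, key=lambda pair: pair[1])[0]

# Two hypothetical domain experts standing in for pretrained models.
experts = [
    make_expert("math", {"2+2?": "4"}),
    make_expert("geo", {"Capital of France?": "Paris"}),
]

print(route(experts, "2+2?"))                # -> 4
print(route(experts, "Capital of France?"))  # -> Paris
```

The key point is that the routing decision is made automatically per input, so the submitted system is a single composite model rather than a post-hoc human choice among separate submissions.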