EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

New Task: `openai_mmmlu` — MMLU professionally translated by OpenAI as part of the o1 release #2305


giuliolovisotto commented 2 months ago

From the OpenAI o1 System Card:

"we translated MMLU’s[39] test set into 14 languages using professional human translators. This approach differs from the GPT-4 Paper where MMLU was machine translated with Azure Translate [14]. Relying on human translators for this evaluation increases confidence in the accuracy of the translations"

The datasets are included in OpenAI's simple-evals library: https://github.com/openai/simple-evals

Is anyone working on this? I'd be interested in adding these to lm-evaluation-harness. What's a good way to structure this new task so that it co-exists with the MMLU variants already present (kmmlu, cmmlu, arabicmmlu, ...)?
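
For context, here is a minimal sketch of how one might inspect the translated data before settling on per-language task names. It assumes the translated sets are mirrored on the Hugging Face Hub as `openai/MMMLU` (they are also shipped as CSVs in simple-evals); the subset and column names in the comments are guesses, not confirmed:

```python
# Hypothetical sketch: list the available language subsets and inspect the schema.
# "openai/MMMLU" and the subset/column names are assumptions to be verified.
from datasets import get_dataset_config_names, load_dataset

configs = get_dataset_config_names("openai/MMMLU")  # e.g. ["AR_XY", "FR_FR", ...]
print(configs)

# Load one language to check the columns (likely Question, A, B, C, D, Answer, Subject).
ds = load_dataset("openai/MMMLU", configs[0], split="test")
print(ds[0])
```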

Tagging @baberabb 😄

baberabb commented 2 months ago

Hi! This would be great! We should be able to use the same nomenclature they use, maybe prepending the task names with openai?

There's a script for generating the task boilerplate that should come in handy; see the sketch below, and let me know if I can help!
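
As a rough illustration (not the final layout), a generator in the spirit of the config-generator scripts already used for other multilingual MMLU variants could emit one YAML per language. The locale codes, task names, dataset path, and prompt fields below are assumptions to be checked against the actual data:

```python
# Hypothetical per-language task config generator; all names/fields are illustrative.
import yaml

# Locale codes as used in OpenAI's MMMLU release; adjust to match the actual data.
LOCALES = [
    "AR_XY", "BN_BD", "DE_DE", "ES_LA", "FR_FR", "HI_IN", "ID_ID",
    "IT_IT", "JA_JP", "KO_KR", "PT_BR", "SW_KE", "YO_NG", "ZH_CN",
]

for locale in LOCALES:
    task_name = f"openai_mmmlu_{locale.lower()}"
    config = {
        "task": task_name,
        "dataset_path": "openai/MMMLU",  # assumed Hugging Face mirror of the CSVs
        "dataset_name": locale,
        "test_split": "test",
        "output_type": "multiple_choice",
        # Assumed column names: Question, A, B, C, D, Answer
        "doc_to_text": "{{Question.strip()}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer:",
        "doc_to_choice": ["A", "B", "C", "D"],
        "doc_to_target": "Answer",
        "metric_list": [{"metric": "acc", "aggregation": "mean", "higher_is_better": True}],
    }
    with open(f"{task_name}.yaml", "w", encoding="utf-8") as f:
        yaml.dump(config, f, sort_keys=False, allow_unicode=True)
```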