Open giuliolovisotto opened 2 months ago
Hi! This would be great! We should be able to use the same nomenclature as they use here, maybe prepending it with openai?
This script to generate the task boilerplate should come in handy, and let me know if I can help!
From the OpenAI o1 System Card:
The datasets are included in their library here -> https://github.com/openai/simple-evals
Is anyone working on this? I'd be interested in adding these to lm-evaluation-harness. What's a good way to structure this new task so that it coexists with the mmlu versions already present (kmmlu, cmmlu, arabicmmlu, ...)?
Tagging @baberabb 😄