Stability-AI / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Automate registry registration for tasks #116

polm-stability closed 11 months ago

polm-stability commented 11 months ago

Tasks are managed in a registry. Historically, this registry has been a giant, manually updated dictionary. This PR adds code that automates adding functions to the registry, which reduces the potential for mistakes and removes redundant code.
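For illustration, automated registration of this kind is commonly done with a decorator that records each task under a name as its module is imported. The sketch below is a minimal example under that assumption; `TASK_REGISTRY`, `register_task`, and the task classes are hypothetical names and are not taken from this PR.

```python
# Hypothetical sketch of decorator-based registration.
# None of these names come from the harness; they are illustrative only.
TASK_REGISTRY = {}


def register_task(name):
    """Return a class decorator that stores the class in TASK_REGISTRY under `name`."""

    def decorator(cls):
        # Fail loudly on duplicates, a mistake that is easy to miss
        # in a hand-maintained dictionary.
        if name in TASK_REGISTRY:
            raise ValueError(f"Task name already registered: {name!r}")
        TASK_REGISTRY[name] = cls
        return cls

    return decorator


@register_task("example_task")
class ExampleTask:
    """Placeholder; a real task would implement the harness's task interface."""


@register_task("another_example_task")
class AnotherExampleTask:
    """Second placeholder task."""
```

With this pattern, adding a task means writing the class and its decorator in one place, with no separate dictionary entry to keep in sync.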

Currently this only handles Japanese tasks. Automating this for all existing tasks is hard because the relationship between the task name (the dictionary key) and the class or file names is very inconsistent. Updating the code to add manual specifications wouldn't be difficult, but there are so many tasks that it would take a while, so for now older tasks have been left as-is.
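To make the naming problem concrete, here is a rough sketch of what a "manual specification" could look like: a registration helper that derives the key from the class name by default but accepts an explicit name when the existing key does not follow the convention. Everything here (`REGISTRY`, `register`, `derive_task_name`, and the toy classes) is hypothetical and only assumes a snake_case naming convention; it is not the code in this PR.

```python
import re

# Hypothetical registry; not the harness's actual data structure.
REGISTRY = {}


def derive_task_name(cls):
    """Convert a CamelCase class name to snake_case, e.g. 'MyTask' -> 'my_task'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()


def register(cls=None, *, name=None):
    """Register a class under `name`, or under a name derived from the class."""

    def decorator(c):
        REGISTRY[name or derive_task_name(c)] = c
        return c

    # Support both bare `@register` and `@register(name="...")`.
    return decorator if cls is None else decorator(cls)


@register
class ToySummarization:
    """The derived key matches the convention, so no explicit name is needed."""


@register(name="toy_qa")
class ToyQuestionAnswering:
    """An older-style key that cannot be derived from the class name."""
```

In this sketch each inconsistent task needs one explicit `name=` argument, which is simple per task but adds up when many tasks are involved.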