OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

implemented flores200 #96

Closed jjbuschhoff closed 10 months ago

jjbuschhoff commented 11 months ago

Implemented the flores200 translation benchmark, based on the existing implementation of the wmt tasks.

jjbuschhoff commented 10 months ago

I agree with the changes here so far, and since the task is already used for evaluation, I think we should merge.