bigscience-workshop / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Corrected bug in Flores fewshot tasks and added flores task #145

Closed by rbawden 1 year ago

rbawden commented 2 years ago

Corrected bug: few-shot examples were using the same sentence as both source and target
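To illustrate the kind of bug this fixes, here is a minimal sketch of building a few-shot translation prompt. It is not the harness's actual code: the `src` and `tgt` field names, the helper names, and the prompt layout are all assumptions made for illustration.

```python
def format_demo(example, src_label, tgt_label):
    """Render one few-shot demonstration (hypothetical 'src'/'tgt' fields)."""
    # The target line must come from the target field. Reusing example["src"]
    # here would show the model the same sentence on both sides.
    return f"{src_label}: {example['src']}\n{tgt_label}: {example['tgt']}"


def build_prompt(fewshot_examples, query, src_label="Hindi", tgt_label="English"):
    """Join the demonstrations, then append the query with an open target slot."""
    demos = [format_demo(ex, src_label, tgt_label) for ex in fewshot_examples]
    demos.append(f"{src_label}: {query['src']}\n{tgt_label}:")
    return "\n\n".join(demos)
```

A quick check that a demonstration never repeats its source as the target catches this class of mistake early.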

Added task: wmt_hi2en, which takes few-shot examples from the WMT hi-en dev set
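As a rough sketch of drawing few-shot examples from a dev split, something like the following could work. The function name, the default seed, and the in-memory list of examples are assumptions for illustration, not the task's real implementation.

```python
import random


def sample_fewshot(dev_examples, num_fewshot, seed=1234):
    """Pick num_fewshot distinct demonstrations from the dev split."""
    # A dedicated, seeded generator keeps the selection reproducible across runs
    # without touching the global random state.
    rng = random.Random(seed)
    return rng.sample(dev_examples, num_fewshot)
```

Sampling only from the dev split keeps the demonstrations disjoint from the sentences being evaluated.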

jon-tow commented 2 years ago

@rbawden Good catch! This LGTM - I've just formatted the files and removed debugging statements. Just want to confirm there's nothing else you'd like to add; then we can merge this.