EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Add Big-Bench Lite to eval-harness #345

Closed: haileyschoelkopf closed this 1 year ago

haileyschoelkopf commented 1 year ago

We want to be able to evaluate our models on BIG-bench Lite. The BIG-bench codebase is not the most outsider-friendly, so I'll try to add the BIG-bench Lite tasks to the eval harness and test on GPT-2 for equivalence, to confirm that scores transfer back to the original implementation.
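The end goal would be an equivalence check along these lines, using the harness's `simple_evaluate` entry point (a rough sketch; the `bigbench_strategyqa` task name is hypothetical until the tasks actually exist in the harness):

```python
# Rough sketch of the planned equivalence check. The task name is
# hypothetical until BIG-bench Lite tasks actually land in the harness.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="gpt2",                    # harness model key; defaults to the HF GPT-2 checkpoint
    tasks=["bigbench_strategyqa"],   # hypothetical task name
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```

We'd then compare the per-task scores against the same model run through the original BIG-bench evaluation code.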

So far: I've looked through the BIG-bench HF code a bit.
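For data access, something like this should work against the Hub-hosted BIG-bench data (the dataset and config names here are my assumptions, and field names may differ per task):

```python
# Peek at the Hub-hosted BIG-bench data. Dataset/config names are
# assumptions; exact column names may also differ per task.
from datasets import load_dataset

ds = load_dataset("bigbench", "strategyqa", split="validation")
print(ds[0])  # expect fields like inputs / targets / multiple_choice_targets
```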

lintangsutawika commented 1 year ago

I remember testing T0 on BIG-bench with the eval harness last year. That could probably save us time and effort.

haileyschoelkopf commented 1 year ago

Oh, that would be awesome if so. Was this in mesh-tensorflow / t5x, or in Hugging Face?

haileyschoelkopf commented 1 year ago

https://colab.research.google.com/github/google/BIG-bench/blob/main/bigbench/bbseqio/docs/t5x_eval.ipynb This seems to be the notebook to use?
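If so, the flow would look roughly like this: importing `bigbench.bbseqio.tasks` registers the BIG-bench tasks with seqio, after which they can be listed and fed to the t5x eval script (the `bigbench:` name prefix is my guess from the docs):

```python
# Importing bigbench.bbseqio.tasks registers the BIG-bench tasks with
# seqio as a side effect; the "bigbench:" name prefix is an assumption.
import seqio
import bigbench.bbseqio.tasks  # noqa: F401

bb = [name for name in seqio.TaskRegistry.names() if name.startswith("bigbench:")]
print(len(bb), bb[:3])
```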