Upstreaming SWDE, FDA, and Squad-completion to Eval Harness

HazyResearch / based

Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"

Apache License 2.0


Upstreaming SWDE, FDA, and Squad-completion to Eval Harness #14

Closed: haileyschoelkopf closed this issue 4 months ago

haileyschoelkopf commented 5 months ago

Hi!

Congrats on the really great work. I'll definitely be trying Based out and referencing your work here in the future :)

I was really happy to see you found the Eval Harness useful! I wanted to ask whether you were interested in upstreaming the custom evals you created to the main harness, or needed any help doing so. It'd be great to make these more easily reproducible so that future work can compare against the evaluations you report. I'd be happy to help on this front.

simran-arora commented 5 months ago

Hi, thanks so much! It would be great to integrate the evaluations into the main harness. Please let us know how to proceed!

haileyschoelkopf commented 5 months ago

Assuming you didn't have to make any changes to the library internals to make the tasks work, then, if you've got the bandwidth, opening a PR that adds the folders lm_eval/tasks/fda, lm_eval/tasks/swde, and lm_eval/tasks/squad_completion should do the trick!
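Before opening such a PR, a quick local check can confirm that the three task folders are in place. The snippet below is a minimal sketch and is not part of the harness: `missing_task_configs` is a hypothetical helper, and it assumes each task folder should contain at least one YAML config file, since harness tasks are defined by YAML configs under lm_eval/tasks.

```python
from pathlib import Path

# Task folders named in this thread; each is expected under lm_eval/tasks/.
TASK_DIRS = ["fda", "swde", "squad_completion"]


def missing_task_configs(repo_root: Path) -> list[str]:
    """Return the task folders that are absent or contain no YAML config.

    Hypothetical check: a folder counts as present only if it holds at
    least one *.yaml file directly inside it.
    """
    tasks_root = repo_root / "lm_eval" / "tasks"
    missing = []
    for name in TASK_DIRS:
        folder = tasks_root / name
        if not folder.is_dir() or not any(folder.glob("*.yaml")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the root of an lm-evaluation-harness checkout.
    gaps = missing_task_configs(Path("."))
    print("missing:", gaps if gaps else "none")
```

An empty result means every folder exists and contains at least one YAML file; it does not validate the configs themselves.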

simran-arora commented 4 months ago

Hi Hailey, we have opened the PR here: https://github.com/EleutherAI/lm-evaluation-harness/pull/1728. Please let us know if there is anything else we should do.