Closed by haileyschoelkopf 4 months ago
Hi, thanks so much! It would be great to integrate the evaluations into the main harness. Please let us know how to proceed!
Assuming you didn't have to make any other changes to the library internals to make the tasks work, if you've got the bandwidth, opening a PR with the folders lm_eval/tasks/fda, lm_eval/tasks/swde, and lm_eval/tasks/squad_completion should do the trick!
Hi Hailey, we have added the PR here: https://github.com/EleutherAI/lm-evaluation-harness/pull/1728. Please let us know if there is anything else.
Hi!
Congrats on the really great work. I'll definitely be trying Based out and referencing your work here in future :)
Was really happy to see you found the Eval Harness useful! I wanted to see if you were interested in or needed any help upstreaming the custom evals you created to the main harness--it'd be great to have these more easily reproducible so future work can compare to the evaluations you report! I'd be happy to help on this front.