allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

Cleanup of auxiliary scripts #59

Closed by ljvmiranda921 3 months ago

ljvmiranda921 commented 4 months ago

Kinda big PR that cleans up all the auxiliary scripts and updates the pointers to the necessary HF repositories. It also tries to make some of the charts and visualizations a little prettier.

ljvmiranda921 commented 3 months ago

I'll probably merge this now since it's accumulating a lot of code. I'll open another PR if we need more changes.