allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
440 stars, 52 forks

Add models, refactor eval configs, fix beaver cost #78

Closed: natolambert closed this 8 months ago

natolambert commented 8 months ago

Closes #77, closes #75, and adds other models from Twitter. Refactors the eval configs into one file.

Before merging, fix the comments on the configs.

natolambert commented 8 months ago

Also closes #80 and closes #79 now :)