allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
375 stars 47 forks

Add docker image and script for submitting eval jobs #18

Closed. jacob-morrison closed this 7 months ago

jacob-morrison commented 7 months ago

This PR adds support for Beaker batch eval jobs. To do this, it includes: