allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
375 stars 47 forks

Output leaderboard scores when running `run_rm.py` #91

natolambert closed this issue 5 months ago

natolambert commented 6 months ago

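E.g., integrate the leaderboard section score computation described in the README, so that `run_rm.py` prints those scores at the end of a run: https://github.com/allenai/reward-bench?tab=readme-ov-file#getting-leaderboard-section-scores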
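
A rough sketch of the kind of aggregation this could involve. It assumes that a section score is the example-count-weighted mean of the accuracies of the subsets in that section, which this issue does not confirm. The function name `section_scores`, the subset and section names, and all counts and accuracies are hypothetical placeholders, not the actual RewardBench API or data.

```python
from typing import Dict, List


def section_scores(
    example_counts: Dict[str, int],
    subset_mapping: Dict[str, List[str]],
    metrics: Dict[str, float],
) -> Dict[str, float]:
    """Aggregate per-subset accuracies into per-section scores.

    Assumption: each section score is the mean of its subset accuracies,
    weighted by the number of examples in each subset. Subsets missing
    from `metrics` are skipped.
    """
    scores: Dict[str, float] = {}
    for section, subsets in subset_mapping.items():
        weighted_sum = 0.0
        total_examples = 0
        for subset in subsets:
            if subset not in metrics:
                continue
            weighted_sum += metrics[subset] * example_counts[subset]
            total_examples += example_counts[subset]
        # Report 0.0 for a section with no evaluated subsets.
        scores[section] = weighted_sum / total_examples if total_examples else 0.0
    return scores


# Hypothetical data, for illustration only.
example_counts = {"subset_a": 100, "subset_b": 300, "subset_c": 50}
subset_mapping = {"Section 1": ["subset_a", "subset_b"], "Section 2": ["subset_c"]}
metrics = {"subset_a": 0.9, "subset_b": 0.7, "subset_c": 0.5}

results = section_scores(example_counts, subset_mapping, metrics)
for section, score in results.items():
    print(f"{section}: {score:.4f}")
```

Something along these lines could be called at the end of `run_rm.py`, once the per-subset accuracies are available, to print the section scores alongside the existing output.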