allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
281 stars, 28 forks

DPO ref free sweep prep #96

Closed: natolambert closed this 3 months ago

natolambert commented 3 months ago

Closes #1 as long as sweep works :)

natolambert commented 3 months ago

Results: [image]