allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
301 stars 33 forks

Add Claude 3.5 Sonnet #153

Closed: natolambert closed this 2 weeks ago

natolambert commented 2 weeks ago

Started adding Gemma 2 here, but it turns out vLLM is kind of broken, so it was rough.