allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
440 stars 52 forks

BOS fix #166

Closed natolambert closed 3 months ago

natolambert commented 3 months ago

Closes #164. I tested with a Mistral RM and saw a minor improvement in performance: https://huggingface.co/datasets/allenai/reward-bench-results/commit/3345ba1c222764729a6041eefecc35325ec4feb7