allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
442 stars, 52 forks

Check beaver cost model #75

Closed. natolambert closed this issue 8 months ago

natolambert commented 8 months ago

Quoting an author (I think):

Great work, this is a long-overdue effort in this field. It's a bit unexpected, though, to see the beaver-cost model perform poorly on the safety-related datasets. Have you checked that you have the signs worked out? In our setting, a negative reward means safer, so that response should be chosen.
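
To make the sign question concrete, here is a minimal sketch of how a cost model's scores could be compared under both conventions. It is not the RewardBench implementation: the function `pairwise_accuracy`, the `lower_is_better` flag, and the example scores are all hypothetical and only illustrate the point that a cost model (lower means safer) needs its comparison flipped relative to a reward model (higher means better).

```python
from typing import Sequence


def pairwise_accuracy(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[float],
    lower_is_better: bool = False,
) -> float:
    """Fraction of pairs where the model prefers the chosen response.

    Hypothetical helper: `lower_is_better=True` treats the scores as costs,
    so the chosen response should receive the LOWER score.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen and rejected score lists must have equal length")
    if not chosen_scores:
        raise ValueError("need at least one pair")

    # Flip the sign for cost models so "higher is better" holds in both cases.
    sign = -1.0 if lower_is_better else 1.0
    correct = sum(
        1
        for c, r in zip(chosen_scores, rejected_scores)
        if sign * c > sign * r
    )
    return correct / len(chosen_scores)


# Made-up cost-style scores: the safe (chosen) response gets the lower value.
chosen = [-2.0, -0.5, 1.0]
rejected = [1.5, 0.3, 2.0]

acc_as_reward = pairwise_accuracy(chosen, rejected)                      # 0.0
acc_as_cost = pairwise_accuracy(chosen, rejected, lower_is_better=True)  # 1.0
```

With these made-up scores, reading a cost model as a reward model drops the accuracy from 1.0 to 0.0, which is the kind of unexpectedly poor result the quoted comment describes.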

mickel-liu commented 8 months ago

Thanks for the quick fix!! (cc: @XuehaiPan)