allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

Saving bug (non breaking) #89

Closed by natolambert 3 months ago

natolambert commented 3 months ago

We don't use our own sub_path correctly when saving results. It works, but it's confusing. See:

  1. https://github.com/allenai/reward-bench/blob/main/scripts/run_rm.py#L293
  2. https://github.com/allenai/reward-bench/blob/main/rewardbench/utils.py#L50

Tbh I'm surprised the code still works as expected (lol)
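
For illustration only, here is a minimal sketch of one way to keep sub_path handling in a single place so the caller and the saving helper cannot drift apart. This is not the code in run_rm.py or rewardbench/utils.py: the function names (build_results_path, save_results), the parameter names, and the results/sub_path/model.json layout are all hypothetical assumptions.

```python
import json
import tempfile
from pathlib import Path


def build_results_path(results_dir, sub_path, model_name):
    """Hypothetical helper: the only place where sub_path is joined in."""
    # Assumed layout: <results_dir>/<sub_path>/<model_name>.json
    return Path(results_dir) / sub_path / f"{model_name}.json"


def save_results(results, results_dir, sub_path, model_name):
    """Hypothetical saver: writes a results dict as JSON and returns the path."""
    path = build_results_path(results_dir, sub_path, model_name)
    # Model names such as "org/model" produce nested directories, so create them.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return path


if __name__ == "__main__":
    # Write into a throwaway directory so the sketch has no side effects.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_results({"accuracy": 0.5}, tmp, "eval-set", "org/model")
        print(saved.relative_to(tmp))
```

Because the caller only passes sub_path as an argument and never pre-joins it into results_dir, there is exactly one spot to read when checking where a file ends up.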