Fix score saving for PairRM and SteamSHP

allenai / reward-bench

RewardBench: the first evaluation tool for reward models.

Apache License 2.0

375 stars, 47 forks

Closed: natolambert closed this issue 7 months ago

natolambert commented 7 months ago

The logic for saving per-prompt scores breaks these two models; see the following Beaker logs:
https://beaker.org/ex/01HQ6VG7H3XPYRVP3S76ZXB126
https://beaker.org/ex/01HQ6VG7GMRN9TNWNZH2WTMKG0

natolambert commented 7 months ago

A simple fix should be to change the following lines in run_rm.py from

                scores_chosen.extend(None * len(results_sub))
                scores_rejected.extend(None * len(results_sub))

to

                scores_chosen.extend([0] * len(results_sub))
                scores_rejected.extend([0] * len(results_sub))
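
For context, here is a minimal standalone sketch of why the original lines fail and what the proposed change does. In Python, `None * n` raises a `TypeError`, whereas `[0] * n` builds a list of `n` placeholders that `extend` can append. The helper `pad_scores` and the sample `results_sub` are hypothetical names used only for illustration; they are not part of run_rm.py.

```python
def pad_scores(scores, n, placeholder=0):
    """Append n placeholder entries to scores in place and return the list."""
    scores.extend([placeholder] * n)
    return scores


# Hypothetical stand-in for one batch of pairwise results.
results_sub = [True, False, True]

# The original expression fails before extend() is even called,
# because None does not support multiplication by an int.
try:
    [].extend(None * len(results_sub))
    original_failed = False
except TypeError:
    original_failed = True

# The proposed change pads with one placeholder per result.
scores_chosen = pad_scores([], len(results_sub))
scores_rejected = pad_scores([], len(results_sub))
```

If a "no score" marker is preferred over a numeric 0, `[None] * len(results_sub)` would also run without error; whether downstream code accepts `None` entries is an assumption that would need checking.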

natolambert commented 7 months ago

This has been fixed in #31 (I think).