Closed — natolambert closed this 7 months ago
A simple fix would be to change the following lines in run_rm.py:

```python
scores_chosen.extend(None * len(results_sub))
scores_rejected.extend(None * len(results_sub))
```

to

```python
scores_chosen.extend([0] * len(results_sub))
scores_rejected.extend([0] * len(results_sub))
```
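To illustrate the failure mode, here is a minimal self-contained sketch (with a hypothetical `results_sub` standing in for the real per-prompt results): multiplying `None` by an integer raises a `TypeError`, while multiplying a list produces a list of placeholder values that `extend` accepts.

```python
results_sub = ["a", "b", "c"]  # hypothetical per-prompt results
scores_chosen = []

try:
    # Buggy form: None * int is not defined, so this never reaches extend()
    scores_chosen.extend(None * len(results_sub))
except TypeError as exc:
    print("buggy line fails:", exc)

# Fixed form: [0] * 3 == [0, 0, 0], a valid iterable of placeholder scores
scores_chosen.extend([0] * len(results_sub))
print(scores_chosen)  # [0, 0, 0]
```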
Has been fixed in #31 (I think)
The logic for saving per-prompt scores is breaking these two models; see the following Beaker logs:

https://beaker.org/ex/01HQ6VG7H3XPYRVP3S76ZXB126
https://beaker.org/ex/01HQ6VG7GMRN9TNWNZH2WTMKG0