Minor run_rm.py fixes

allenai / reward-bench

RewardBench: the first evaluation tool for reward models.

Apache License 2.0

440 stars, 52 forks

PavelCz closed 7 months ago

PavelCz commented 7 months ago

Looking forward to using this project!

Here are some minor changes I made to be able to run run_rm.py locally:

Disabled saving to the hub for the second set of scores when the corresponding command-line parameter is set.
Added an argument to disable saving the metrics.json file for Beaker. By default, the file is saved to /output/metric.json, which is not writable on most machines, so the script fails.
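
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the actual code in run_rm.py: the flag names `--skip_hub_upload` and `--skip_metrics_file` and the helper `save_metrics` are hypothetical, chosen only to show how a boolean argument can guard a write to a path that may not exist locally.

```python
import argparse
import json
import os
import tempfile


def parse_args(argv=None):
    """Parse the two opt-out flags (hypothetical names, not the script's real ones)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--skip_hub_upload",
        action="store_true",
        help="do not push any results to the hub",
    )
    parser.add_argument(
        "--skip_metrics_file",
        action="store_true",
        help="do not write the metrics JSON file used by the cluster job",
    )
    return parser.parse_args(argv)


def save_metrics(metrics, path, skip):
    """Write metrics as JSON unless skip is set. Returns True if a file was written."""
    if skip:
        # Nothing is opened, so an unwritable path cannot crash the run.
        return False
    with open(path, "w") as f:
        json.dump(metrics, f)
    return True


if __name__ == "__main__":
    args = parse_args(["--skip_metrics_file"])
    # A temporary directory stands in for the real output location.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "metrics.json")
        written = save_metrics({"accuracy": 0.5}, target, args.skip_metrics_file)
        print("metrics written:", written)
```

Guarding the write with a flag keeps the default behavior unchanged for the cluster workflow while letting local runs opt out.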

natolambert commented 7 months ago

Thx @PavelCz -- looks great. Any chance you can add to run_dpo.py too? Trying to keep those scripts in sync.

PavelCz commented 7 months ago

@natolambert, I added the same changes to run_dpo.py. I can run that file locally as well now.

natolambert commented 7 months ago

LGTM. Should be able to merge once the workflows run. I'll probably handle it! Thanks!