allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
Clean up / enhance DPO code #82 (Closed)
natolambert closed this 3 months ago
natolambert commented 6 months ago:
- Make it possible to run inference over individual text prompts (rather than only chosen + rejected pairs).
- Clean up the no_grad/detach usage (see https://twitter.com/shxf0072/status/1771220126655811610); the fix should be fairly obvious.
- Add per-model log-probability computation.
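For illustration, the per-model log-probability and single-prompt scoring items could look roughly like the sketch below. It is not RewardBench's code: the function names are hypothetical, it uses pure Python instead of PyTorch, and it assumes the score is the standard DPO implicit reward, beta * (log pi_policy(y|x) - log pi_ref(y|x)), summed over completion tokens only (prompt tokens masked out). In a PyTorch version these forward passes would typically run inside `torch.no_grad()`, since inference needs no gradients.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_logprob(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    mask: Sequence[bool],
) -> float:
    """Hypothetical helper: sum log p(labels[t]) over positions where mask[t] is True.

    Assumes logits[t] is already aligned to predict labels[t] (i.e. the usual
    one-position shift has been applied) and that mask keeps only completion tokens.
    """
    total = 0.0
    for row, label, keep in zip(logits, labels, mask):
        if keep:
            total += log_softmax(row)[label]
    return total


def dpo_implicit_reward(policy_logprob: float, ref_logprob: float, beta: float = 1.0) -> float:
    """Hypothetical helper: beta * (log pi_policy(y|x) - log pi_ref(y|x)) for ONE completion."""
    return beta * (policy_logprob - ref_logprob)


# Usage with a toy 2-token vocabulary: the first position is a prompt token (masked out).
policy_lp = sequence_logprob([[0.0, 0.0], [2.0, 0.0]], labels=[1, 0], mask=[False, True])
ref_lp = sequence_logprob([[0.0, 0.0], [0.0, 0.0]], labels=[1, 0], mask=[False, True])
score = dpo_implicit_reward(policy_lp, ref_lp)
```

Scoring one completion at a time means the prompt-dependent term beta * log Z(x), which cancels when chosen and rejected are compared, no longer cancels, so such scores are only directly comparable across completions of the same prompt.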
natolambert commented 3 months ago:
Closing because stale. Feel free to reopen.