allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
Clean up / enhance DPO code #82
Closed (natolambert closed 4 months ago)
natolambert commented 8 months ago
Make it so you can run inference over individual text prompts (rather than chosen + rejected)
Clean up nograd/detach (see https://twitter.com/shxf0072/status/1771220126655811610), but should be pretty obvious
Add per-model log prob computation.
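For anyone picking this up, here is a rough sketch of what per-model log prob computation and single-text scoring could look like. It is not the RewardBench implementation: the function names `sequence_logprob` and `dpo_implicit_reward`, the mask convention, and the numbers are all hypothetical, and it uses plain Python lists instead of tensors so it runs without any dependencies. It assumes the usual DPO implicit reward, beta * (log pi(y|x) - log pi_ref(y|x)), computed from each model's summed completion log-probs.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float], completion_mask: Sequence[int]) -> float:
    """Sum per-token log-probs over completion tokens only (mask == 1).

    Hypothetical helper: one call per model (policy or reference) per text.
    """
    if len(token_logprobs) != len(completion_mask):
        raise ValueError("token_logprobs and completion_mask must have equal length")
    return sum(lp for lp, m in zip(token_logprobs, completion_mask) if m)


def dpo_implicit_reward(policy_logprob: float, reference_logprob: float, beta: float = 0.1) -> float:
    """Implicit DPO reward for a single text: beta * (log pi - log pi_ref)."""
    return beta * (policy_logprob - reference_logprob)


# Made-up per-token log-probs for one prompt + completion:
# 2 prompt tokens (masked out) followed by 3 completion tokens.
mask = [0, 0, 1, 1, 1]
policy_token_logprobs = [-2.0, -1.5, -0.5, -0.25, -0.25]
reference_token_logprobs = [-2.0, -1.5, -1.0, -0.5, -0.5]

# Per-model log probs are kept separately so they can be logged on their own.
policy_lp = sequence_logprob(policy_token_logprobs, mask)
reference_lp = sequence_logprob(reference_token_logprobs, mask)
reward = dpo_implicit_reward(policy_lp, reference_lp, beta=0.5)

print(policy_lp, reference_lp, reward)
```

Because this scores one text at a time, the chosen/rejected comparison becomes two independent calls. Note that single-text scores are only meaningful relative to other completions of the same prompt, since the prompt-dependent normalizer in the DPO reward is dropped. In a PyTorch version, the scoring path would presumably be wrapped once in `torch.no_grad()` rather than calling `.detach()` on individual outputs, which is what the nograd/detach item is about.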
natolambert commented 4 months ago
Closing because stale. Feel free to reopen.