Clean up / enhance DPO code

allenai / reward-bench

RewardBench: the first evaluation tool for reward models.

Apache License 2.0

Closed by natolambert 4 months ago

natolambert commented 8 months ago

Make it possible to run inference over individual text prompts (rather than only chosen + rejected pairs)
Clean up the no_grad/detach usage (see https://twitter.com/shxf0072/status/1771220126655811610); the fix should be fairly obvious
Add per-model log prob computation.
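
For reference, here is a rough sketch of what the single-prompt and per-model log prob items could look like. It is framework-free Python so the arithmetic stays visible, not the repo's actual code. The names `log_softmax`, `sequence_logprob`, and `dpo_reward` are hypothetical, and the logits are assumed to come from one forward pass per model with gradients disabled (in PyTorch, typically inside `torch.no_grad()`). The reward is the standard DPO implicit reward, beta times the policy log prob minus the reference log prob.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_logprob(
    logits: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    response_start: int,
) -> float:
    """Sum the log probs of the tokens at positions >= response_start.

    Assumes the usual causal-LM layout: logits[t] scores the token at
    position t + 1, so position 0 never receives a score.
    """
    total = 0.0
    for pos in range(max(response_start, 1), len(token_ids)):
        total += log_softmax(logits[pos - 1])[token_ids[pos]]
    return total


def dpo_reward(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """DPO implicit reward for one text: beta * (log pi - log pi_ref)."""
    return beta * (policy_logp - ref_logp)


# Hypothetical single-text usage: one prompt token followed by two response tokens,
# with made-up logits over a 4-token vocabulary for each model.
token_ids = [0, 1, 2]
policy_logits = [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
ref_logits = [[0.0, 0.0, 0.0, 0.0]] * 3

policy_logp = sequence_logprob(policy_logits, token_ids, response_start=1)
ref_logp = sequence_logprob(ref_logits, token_ids, response_start=1)
reward = dpo_reward(policy_logp, ref_logp, beta=0.1)
```

Keeping the per-model log prob separate from the reward means the same function can score a single text or each side of a chosen/rejected pair.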

natolambert commented 4 months ago

Closing because stale. Feel free to reopen.