Experiment request: DPO with different betas

allenai / reward-bench

RewardBench: the first evaluation tool for reward models.

Apache License 2.0

442 stars, 52 forks

Closed: natolambert closed this 7 months ago

natolambert commented 8 months ago

TL;DR: How much does beta, which controls how far (in KL divergence) a DPO model can drift from its reference model, affect the model's accuracy?
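Below is a minimal sketch of where beta enters the standard DPO objective. It assumes the summed sequence log-probabilities from the policy and the frozen reference model are already computed. The `PairLogProbs` container, the helper names, and the log-prob values in `EXAMPLE_PAIRS` are hypothetical and only for illustration. The sketch also shows why such an experiment has to vary beta at training time. For fixed log-probabilities, a positive beta rescales both implicit rewards equally, so it changes the loss but not which response wins a pair.

```python
import math
from dataclasses import dataclass


@dataclass
class PairLogProbs:
    """Hypothetical container: summed log-probs for one (chosen, rejected) pair."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    # DPO implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x))
    return beta * (policy_logp - ref_logp)


def _softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(pair: PairLogProbs, beta: float) -> float:
    # Margin between the chosen and rejected implicit rewards
    margin = implicit_reward(pair.policy_chosen, pair.ref_chosen, beta) - implicit_reward(
        pair.policy_rejected, pair.ref_rejected, beta
    )
    # -log(sigmoid(margin)) == softplus(-margin)
    return _softplus(-margin)


def pairwise_accuracy(pairs: list[PairLogProbs], beta: float) -> float:
    # Fraction of pairs where the chosen response gets the higher implicit reward
    wins = sum(
        implicit_reward(p.policy_chosen, p.ref_chosen, beta)
        > implicit_reward(p.policy_rejected, p.ref_rejected, beta)
        for p in pairs
    )
    return wins / len(pairs)


# Made-up log-probs, for illustration only
EXAMPLE_PAIRS = [
    PairLogProbs(policy_chosen=-10.0, policy_rejected=-12.0, ref_chosen=-11.0, ref_rejected=-11.5),
    PairLogProbs(policy_chosen=-20.0, policy_rejected=-18.0, ref_chosen=-19.0, ref_rejected=-18.5),
]

if __name__ == "__main__":
    for beta in (0.01, 0.1, 0.5):
        mean_loss = sum(dpo_loss(p, beta) for p in EXAMPLE_PAIRS) / len(EXAMPLE_PAIRS)
        # Accuracy stays 0.5 for every beta > 0; only the loss changes
        print(f"beta={beta}: accuracy={pairwise_accuracy(EXAMPLE_PAIRS, beta)}, loss={mean_loss:.4f}")
```

Because beta only scales the margin at evaluation time, any accuracy differences across betas would have to come from training separate models with each beta value.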

natolambert commented 7 months ago

Closing. No longer logging experiments here.