chufanchen / read-paper-and-code

CoRR 2023 | Policy Optimization in RLHF: The Impact of Out-of-preference Data #58

Open chufanchen opened 7 months ago

chufanchen commented 7 months ago

https://arxiv.org/abs/2312.10584

https://github.com/liziniu/policy_optimization