RockeyCoss / SPO

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
https://arxiv.org/abs/2406.04314
137 stars, 3 forks