RockeyCoss / SPO

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
https://arxiv.org/abs/2406.04314
137 stars, 3 forks

Good job!! Can you provide the preference model training code? #16

Open: moclimb opened this issue 1 month ago

moclimb commented 1 month ago

I'm very interested in this work and would like to get the preference model training code, just for testing.