Good job!! Can you provide the preference model training code?

RockeyCoss / SPO

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

https://arxiv.org/abs/2406.04314

137 stars, 3 forks

Status: Open (moclimb opened this issue 1 month ago)

moclimb commented 1 month ago

I am very interested in this work and would like to get the preference model training code, just for testing.