SalesforceAIResearch / DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"
https://arxiv.org/abs/2311.12908
Apache License 2.0
220 stars, 22 forks

The learning rates in the paper and the code are inconsistent #12

Open kjzju opened 3 months ago

kjzju commented 3 months ago

[image] In your paper, you use the learning rate shown above with β = 5000, which means the learning rate is less than 1e-8. However, your code contains the following: [image] So, with the "scale_lr" parameter enabled, the learning rate in your code is about 2e-5. Is something wrong here?
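To make the arithmetic behind the question concrete, here is a minimal sketch of how a "scale_lr" option is commonly applied in Hugging Face diffusers-style training scripts: the base learning rate is multiplied by the per-device batch size, the gradient accumulation steps, and the number of processes. Whether this repository scales in exactly this way is an assumption, and the base rate and batch settings below are hypothetical values chosen only to show how a base rate near 1e-8 can become roughly 2e-5.

```python
def scaled_learning_rate(base_lr, train_batch_size,
                         gradient_accumulation_steps, num_processes):
    """Scale a base learning rate by the effective batch size.

    Mirrors the rule commonly used by diffusers-style training scripts
    when a scale_lr flag is set (assumed here, not confirmed for this repo).
    """
    effective_batch_size = (
        train_batch_size * gradient_accumulation_steps * num_processes
    )
    return base_lr * effective_batch_size


# Hypothetical settings: effective batch size = 1 * 128 * 16 = 2048.
base_lr = 1e-8
effective_lr = scaled_learning_rate(
    base_lr,
    train_batch_size=1,
    gradient_accumulation_steps=128,
    num_processes=16,
)
print(f"base lr = {base_lr:.3e}, scaled lr = {effective_lr:.3e}")
```

Under these assumed settings the scaled rate is about three orders of magnitude larger than the base rate, which is the size of the gap described above.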