The learning rate in the paper and the code are inconsistent

SalesforceAIResearch / DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

https://arxiv.org/abs/2311.12908

Apache License 2.0


The learning rate in the paper and the code are inconsistent #12

Open kjzju opened 5 months ago

kjzju commented 5 months ago

In your paper, the stated learning rate with β = 5000 works out to less than 1e-8. However, in your code, with the "scale_lr" parameter enabled, the learning rate is about 2e-5. Is there anything wrong?
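For reference, here is a minimal sketch of how a "scale_lr"-style flag usually changes the effective learning rate. It assumes the convention common in Hugging Face diffusers example training scripts: the base rate is multiplied by the gradient accumulation steps, the per-device batch size, and the number of processes. The function name and every numeric value below are hypothetical, chosen only so the result lands near the 2e-5 figure mentioned in this issue. They are not taken from this repository's launch configuration.

```python
def scaled_learning_rate(base_lr, grad_accum_steps, per_device_batch, num_processes):
    """Scale a base learning rate by the effective batch size.

    Assumed convention (diffusers-style scale_lr), not confirmed for this repo.
    """
    return base_lr * grad_accum_steps * per_device_batch * num_processes


# Hypothetical settings giving an effective batch size of 128 * 1 * 16 = 2048.
base_lr = 1e-8
grad_accum_steps = 128
per_device_batch = 1
num_processes = 16

effective_lr = scaled_learning_rate(
    base_lr, grad_accum_steps, per_device_batch, num_processes
)

# Upper bound on the paper's learning rate as described in this issue.
paper_bound = 1e-8

print(f"effective lr after scaling: {effective_lr:.3e}")
print(f"ratio to the paper's bound: {effective_lr / paper_bound:.0f}x")
```

Under these assumed values, the scaled rate is roughly three orders of magnitude above the bound quoted from the paper, which is the size of the gap this issue is asking about.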