changlin31 / DNA

(CVPR 2020) Block-wisely Supervised Neural Architecture Search with Knowledge Distillation

About the learning rate #23

Closed nbl97 closed 3 years ago

nbl97 commented 3 years ago

I found that no matter what `--lr` is set to, the lr printed during training in the first epoch (when, in principle, no decay should have happened yet) is always 1e-4. After debugging line by line, I found that once the scheduler is created, the optimizer's lr becomes 1e-4, and the learning rate we set instead shows up as the optimizer's `initial_lr` parameter. What is going on here? Thanks.

changlin31 commented 3 years ago

In retraining, the learning rate warms up starting from 1e-4 for the first 3 epochs by default (5 in our setting), which is why the printed lr starts at 1e-4 regardless of `--lr`. You can disable warmup by setting `--warmup-epochs 0`.
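
For context, here is a minimal sketch (not the repo's actual code) of how a typical linear-warmup scheduler behaves; `apply_warmup` and its parameter names are hypothetical, chosen for illustration. It shows why the printed lr starts at 1e-4 during warmup while the value passed via `--lr` survives separately as `initial_lr` in each param group, mirroring what PyTorch's built-in schedulers do:

```python
import torch

def apply_warmup(optimizer, epoch, base_lr, warmup_epochs=3, warmup_lr=1e-4):
    # Hypothetical helper: ramp the lr linearly from warmup_lr (1e-4)
    # up to base_lr (the value passed via --lr). The target lr is kept
    # aside as `initial_lr` while group["lr"] is overwritten, which is
    # why the printed lr looks "stuck" at 1e-4 early in training.
    if warmup_epochs > 0 and epoch < warmup_epochs:
        lr = warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    else:
        lr = base_lr  # a decay schedule would take over from here
    for group in optimizer.param_groups:
        group.setdefault("initial_lr", base_lr)  # as PyTorch schedulers record it
        group["lr"] = lr
    return lr

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # i.e. --lr 0.1
for epoch in range(5):
    print(epoch, apply_warmup(opt, epoch, base_lr=0.1))
    # prints ~1e-4, 0.0334, 0.0667, then 0.1 once warmup ends
```

With `warmup_epochs=0`, the `if` branch is skipped and the optimizer runs at `base_lr` from epoch 0, matching the `--warmup-epochs 0` behavior described above.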

nbl97 commented 3 years ago

Thank you very much for your answer!