changlin31 / DNA

(CVPR 2020) Block-wisely Supervised Neural Architecture Search with Knowledge Distillation

About the learning rate #23

Closed nbl97 closed 3 years ago

nbl97 commented 3 years ago

I found that no matter what `--lr` is set to, the lr printed during training in the first epoch (when, in principle, no decay should have happened yet) is always 1e-4. After debugging line by line, I found that once the scheduler is created, the optimizer's lr becomes 1e-4, and the learning rate we set instead shows up as the optimizer's `initial_lr` parameter. What is going on here? Thanks.

changlin31 commented 3 years ago

In retraining, the learning rate warms up starting from 1e-4 for the first 3 epochs by default (5 in our setting), which is why the printed lr starts at 1e-4 regardless of `--lr`. You can disable warmup by setting `--warmup-epochs 0`.
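
For context, here is a minimal sketch (not the repo's actual code) of how a typical linear-warmup scheduler behaves; `apply_warmup` and its parameter names are hypothetical, chosen for illustration. It shows why the printed lr starts at 1e-4 during warmup while the value passed via `--lr` survives separately as `initial_lr` in each param group, mirroring what PyTorch's built-in schedulers do:

```python
import torch

def apply_warmup(optimizer, epoch, base_lr, warmup_epochs=3, warmup_lr=1e-4):
    # Hypothetical helper: ramp the lr linearly from warmup_lr (1e-4)
    # up to base_lr (the value passed via --lr). The target lr is kept
    # aside as `initial_lr` while group["lr"] is overwritten, which is
    # why the printed lr looks "stuck" at 1e-4 early in training.
    if warmup_epochs > 0 and epoch < warmup_epochs:
        lr = warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    else:
        lr = base_lr  # a decay schedule would take over from here
    for group in optimizer.param_groups:
        group.setdefault("initial_lr", base_lr)  # as PyTorch schedulers record it
        group["lr"] = lr
    return lr

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # i.e. --lr 0.1
for epoch in range(5):
    print(epoch, apply_warmup(opt, epoch, base_lr=0.1))
    # prints ~1e-4, 0.0334, 0.0667, then 0.1 once warmup ends
```

With `warmup_epochs=0`, the `if` branch is skipped and the optimizer runs at `base_lr` from epoch 0, matching the `--warmup-epochs 0` behavior described above.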

nbl97 commented 3 years ago

Thank you very much for your answer!