chenxin061 / pdarts

Code for our paper "Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation"

In the architecture search process, why use SGD for the operation weights and Adam for arch_params? #18

Closed — JarveeLee closed this issue 5 years ago

chenxin061 commented 5 years ago
  1. Adam provides an adaptive learning rate, which suits the architecture parameters.
  2. We follow previous NAS work (e.g., DARTS) in using the Adam optimizer to tune arch_params.
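The two-optimizer setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: `TinySearchModel`, its shapes, and the candidate-op count are hypothetical, while the hyperparameters mirror the DARTS-style convention (SGD with momentum for operation weights, Adam for architecture parameters).

```python
import torch
import torch.nn as nn

class TinySearchModel(nn.Module):
    """Hypothetical toy model: candidate ops mixed by architecture params."""
    def __init__(self, dim=8, n_ops=3):
        super().__init__()
        # Operation weights: one candidate op per edge (here, plain Linear layers).
        self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_ops)])
        # Architecture parameters (arch_params), softmax-relaxed into mixing weights.
        self.alpha = nn.Parameter(torch.zeros(n_ops))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        # Weighted sum of candidate operations (continuous relaxation).
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

model = TinySearchModel()

# SGD with momentum for operation weights; Adam for arch_params.
# Values below follow common DARTS-style settings, not necessarily PDARTS's exact ones.
w_optimizer = torch.optim.SGD(model.ops.parameters(), lr=0.025,
                              momentum=0.9, weight_decay=3e-4)
a_optimizer = torch.optim.Adam([model.alpha], lr=3e-4,
                               betas=(0.5, 0.999), weight_decay=1e-3)

# Alternating updates: in the real search, the arch step uses validation data
# and the weight step uses training data; both use the same batch here for brevity.
x = torch.randn(4, 8)

a_optimizer.zero_grad()
model(x).sum().backward()
a_optimizer.step()          # Adam updates alpha only

w_optimizer.zero_grad()
model(x).sum().backward()
w_optimizer.step()          # SGD updates operation weights only
```

Keeping the two parameter groups in separate optimizers is what lets each set get the update rule suited to it: the smooth, adaptive steps of Adam for the few arch_params, and plain momentum SGD (with its better-understood generalization behavior) for the many operation weights.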