chenxin061 / pdarts

Code for our paper "Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation"

In the architecture search process, why use SGD for the operation weights and Adam for arch_params? #18

Closed: JarveeLee closed this issue 4 years ago

chenxin061 commented 4 years ago
  1. Adam provides an adaptive per-parameter learning rate.
  2. We follow previous NAS work in using the Adam optimizer to tune arch_params.
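The two-optimizer setup can be sketched as follows. This is a minimal illustration in PyTorch, not the repo's actual code: the `TinySupernet` module, its hyperparameters, and the alternating update loop are all simplified assumptions; the idea is only that ordinary weights get SGD with momentum while the architecture logits get Adam.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy supernet: ordinary operation weights plus one
# learnable architecture logit per candidate operation.
class TinySupernet(nn.Module):
    def __init__(self, num_ops=3):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(4, 4) for _ in range(num_ops)])
        self.arch_params = nn.Parameter(torch.zeros(num_ops))

    def forward(self, x):
        # Mix candidate operations by softmax over the architecture logits.
        weights = torch.softmax(self.arch_params, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

model = TinySupernet()

# SGD with momentum and weight decay for the operation weights ...
weight_optim = torch.optim.SGD(
    (p for n, p in model.named_parameters() if n != "arch_params"),
    lr=0.025, momentum=0.9, weight_decay=3e-4)

# ... and Adam, with its adaptive learning rate, for arch_params only.
arch_optim = torch.optim.Adam(
    [model.arch_params], lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)

# One alternating step of each (training data for the weights,
# validation data for the architecture, in DARTS-style bi-level search).
loss = model(torch.randn(8, 4)).pow(2).mean()
weight_optim.zero_grad()
loss.backward()
weight_optim.step()

loss = model(torch.randn(8, 4)).pow(2).mean()
arch_optim.zero_grad()
loss.backward()
arch_optim.step()
```

Keeping the two parameter groups in separate optimizers also makes it easy to give each its own schedule, e.g. cosine-annealing the SGD learning rate while holding the Adam rate fixed.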