Linfengscat opened this issue 5 years ago
I believe the code already works this way. The model's optimizer only contains the weight parameters, and the architecture optimizer only contains alpha and beta. Please correct me if that isn't right.
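A minimal sketch of that separation, using a toy stand-in module (`TinySearchNet` and the parameter split below are illustrative assumptions, not this repo's actual code): the network weights w go into one optimizer and the architecture parameters alpha/beta into another, so each `step()` only ever touches its own group.

```python
import torch
import torch.nn as nn

class TinySearchNet(nn.Module):
    """Toy stand-in for the searchable network: fc holds the network
    weights w, while alpha/beta are the architecture parameters."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)                     # network weights w
        self.alpha = nn.Parameter(torch.zeros(4, 3))  # cell-level architecture params
        self.beta = nn.Parameter(torch.zeros(4))      # network-level architecture params

    def forward(self, x):
        # Simplified mixing: scale outputs by softmaxed architecture weights.
        return self.fc(x) * torch.softmax(self.beta, dim=0)

model = TinySearchNet()

# Split the parameters: everything except alpha/beta goes to the weight optimizer.
arch_params = [model.alpha, model.beta]
weight_params = [p for n, p in model.named_parameters() if n not in ("alpha", "beta")]

optimizer_w = torch.optim.SGD(weight_params, lr=0.025, momentum=0.9)         # updates w only
optimizer_arch = torch.optim.Adam(arch_params, lr=3e-4, betas=(0.5, 0.999))  # updates alpha, beta only
```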
@HankKung Sorry, I was careless. Thanks!
I think it would be better to train the network weights and the architecture weights separately: to be exact, freeze the gradients of α and β when updating w, and freeze the gradients of w when updating α and β.
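Continuing the sketch above (same imports and objects), one way to realize this alternation is to toggle `requires_grad` so that alpha/beta are frozen while w is updated on a training batch, and w is frozen while alpha/beta are updated on a validation batch. The training/validation split and the specific loop below are assumptions in the spirit of the usual DARTS-style setup, not this repo's exact procedure.

```python
def set_requires_grad(params, flag):
    for p in params:
        p.requires_grad_(flag)

# Dummy data standing in for the training and validation batches.
x_train, y_train = torch.randn(16, 8), torch.randn(16, 4)
x_val, y_val = torch.randn(16, 8), torch.randn(16, 4)
loss_fn = nn.MSELoss()

for step in range(10):
    # Step 1: update w on the training batch, with alpha/beta frozen.
    set_requires_grad(arch_params, False)
    set_requires_grad(weight_params, True)
    optimizer_w.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer_w.step()

    # Step 2: update alpha/beta on the validation batch, with w frozen.
    set_requires_grad(arch_params, True)
    set_requires_grad(weight_params, False)
    optimizer_arch.zero_grad()
    loss_fn(model(x_val), y_val).backward()
    optimizer_arch.step()
```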
By the definition of: