What is the difference between arch and model? Why should we have them both?

VITA-Group / FasterSeg

[ICLR 2020] "FasterSeg: Searching for Faster Real-time Semantic Segmentation" by Wuyang Chen, Xinyu Gong, Xianming Liu, Qian Zhang, Yuan Li, Zhangyang Wang

MIT License


What is the difference between arch and model? Why should we have them both? #27

NoOneUST closed this issue 4 years ago

NoOneUST commented 4 years ago

Another question: when I try to use multi-process evaluation with CUDA_VISIBLE_DEVICES=3,5,6,7, only GPU 0 is ever occupied, no matter whether I pass device=[3,5,6,7] or device=[0,1,2,3]. How should I solve this?

chenwydj commented 4 years ago

Hi @NoOneUST!

Thank you for your interest in our work!

Differentiable NAS formulates the search space in terms of two sets of parameters: the model (e.g., convolution weights) and the architecture (which determines the operator used in each cell). During the search, the model parameters and the architecture parameters are optimized alternately. Thus, we need both of them.
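The alternating scheme can be illustrated with a toy, pure-Python sketch in the style of first-order DARTS. Everything in it is a hypothetical assumption chosen for illustration, not FasterSeg's code: a single scalar weight `w` plays the role of the model, two logits `alpha` play the role of the architecture over two made-up candidate operators (identity and absolute value), and the data follow y = 2x. Each iteration first updates the weight on a training split with the architecture frozen, then updates the architecture on a separate validation split with the weight frozen.

```python
import math
import random

# Hypothetical candidate operators on one edge: (function, derivative).
OPS = [
    (lambda h: h, lambda h: 1.0),                           # identity
    (lambda h: abs(h), lambda h: 1.0 if h >= 0 else -1.0),  # absolute value
]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grads(w, alpha, data):
    """MSE of the mixed op and its analytic gradients w.r.t. w and alpha."""
    p = softmax(alpha)
    loss, gw, ga = 0.0, 0.0, [0.0] * len(alpha)
    for x, y in data:
        h = w * x  # "model" part: scale the input by the weight
        outs = [f(h) for f, _ in OPS]
        out = sum(pi * oi for pi, oi in zip(p, outs))  # softmax-weighted mixture
        err = out - y
        loss += err * err
        # d(out)/dw = sum_i p_i * op_i'(h) * x
        gw += 2 * err * sum(pi * df(h) for pi, (_, df) in zip(p, OPS)) * x
        # Softmax Jacobian gives d(out)/d(alpha_j) = p_j * (op_j(h) - out)
        for j in range(len(alpha)):
            ga[j] += 2 * err * p[j] * (outs[j] - out)
    n = len(data)
    return loss / n, gw / n, [g / n for g in ga]


def search(steps=2000, lr_w=0.05, lr_alpha=0.05, seed=0):
    rng = random.Random(seed)

    def make_split():
        return [(x, 2.0 * x) for x in (rng.uniform(-1, 1) for _ in range(32))]

    train, val = make_split(), make_split()
    w, alpha = 0.5, [0.0, 0.0]
    for _ in range(steps):
        # Step 1: update the model weight on the training split (architecture frozen).
        _, gw, _ = loss_and_grads(w, alpha, train)
        w -= lr_w * gw
        # Step 2: update the architecture logits on the validation split (weight frozen).
        _, _, ga = loss_and_grads(w, alpha, val)
        alpha = [a - lr_alpha * g for a, g in zip(alpha, ga)]
    return w, alpha


w, alpha = search()
print(w, alpha)
```

Because the absolute-value operator cannot fit negative inputs, the validation step should push `alpha[0]` (identity) above `alpha[1]`, while `w` should settle near 2. A real search would normally rely on autograd and keep the two parameter groups in separate optimizers instead of hand-derived gradients.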

To enable multi-GPU training, you have to at least wrap the nn.Module in nn.DataParallel (which I did not implement here).
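A minimal sketch of that wrapping, assuming PyTorch; `TinyNet`, `logical_device_ids`, and `wrap_for_multi_gpu` are hypothetical names, not part of FasterSeg. One detail relevant to the question: with CUDA_VISIBLE_DEVICES=3,5,6,7, the process only sees those four GPUs, renumbered as cuda:0 to cuda:3, so `device_ids` must be `[0, 1, 2, 3]`; indices such as 3, 5, 6, 7 refer to devices that do not exist inside the process. The sketch falls back to a single GPU or to the CPU when fewer devices are visible.

```python
import torch
import torch.nn as nn


def logical_device_ids():
    """GPU indices as seen inside this process (already remapped by CUDA_VISIBLE_DEVICES)."""
    return list(range(torch.cuda.device_count()))


class TinyNet(nn.Module):  # hypothetical stand-in for the real network
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


def wrap_for_multi_gpu(model):
    ids = logical_device_ids()
    if len(ids) > 1:
        # DataParallel expects the module on device_ids[0]; it splits each batch along
        # dim 0 across all ids and gathers the outputs back on device_ids[0].
        return nn.DataParallel(model.cuda(ids[0]), device_ids=ids)
    if len(ids) == 1:
        return model.cuda(ids[0])
    return model  # CPU fallback


model = wrap_for_multi_gpu(TinyNet()).eval()
device = next(model.parameters()).device
with torch.no_grad():  # evaluation only, no autograd graph
    out = model(torch.randn(4, 3, 16, 16, device=device))
print(tuple(out.shape))  # (4, 8, 16, 16)
```

PyTorch's documentation recommends DistributedDataParallel over DataParallel for multi-GPU work, even on a single machine; DataParallel is used here only because it is the minimal change the reply describes.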