MXNet implementation of Single Path One-Shot NAS with a full training and search pipeline. Supports both Block and Channel Selection. Searched models that outperform those in the original paper are provided.
https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS/blob/e1928e5bbf071ce76ddf7d9774ca11d07d8ab269/train_imagenet.py#L533-L534

It should be:

    if epoch >= opt.epoch_start_cs:
        opt.use_all_channels = False

Because of this, the supernet, which was trained from epoch 0 to 70 and then resumed from epoch 70 to the end, was actually trained mainly with Block Selection alone. Only epochs 60 to 70 were trained with Block Selection + Channel Selection, and during that period the validation accuracy dropped to 1/1000. The same phenomenon was reported in https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS/issues/4#issuecomment-536694433

In contrast, this Block-Selection-only supernet works well with the random/genetic search, which explored Channels as well as Blocks (even though the channels were not randomly sampled during training, as the original paper describes). This raises the possibility that randomly sampling (decoupling) the Channels during training is not strongly related to obtaining a representative supernet.

Further experiments are required to reach a more concrete conclusion.
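
To illustrate why a `>=` check matters when training is resumed, here is a minimal simulation. It is only a sketch under stated assumptions and not the repository's actual code: it assumes the earlier check behaved like an equality test (`epoch == opt.epoch_start_cs`), that `opt.use_all_channels` falls back to `True` on every launch, that `epoch_start_cs` is 60 (inferred from the epochs mentioned above), and that training runs for a hypothetical 120 epochs. The helper `channel_selection_epochs` is an illustrative name.

```python
from argparse import Namespace


def channel_selection_epochs(check, runs, epoch_start_cs=60):
    """Return the epochs that would train with channel selection enabled.

    check: function (epoch, epoch_start_cs) -> bool deciding when to switch.
    runs:  list of (start_epoch, end_epoch) ranges, one per launch or resume.
    """
    cs_epochs = []
    for start, end in runs:
        # Assumption: each launch starts from the default (all channels used).
        opt = Namespace(use_all_channels=True, epoch_start_cs=epoch_start_cs)
        for epoch in range(start, end):
            if check(epoch, opt.epoch_start_cs):
                opt.use_all_channels = False
            if not opt.use_all_channels:
                cs_epochs.append(epoch)
    return cs_epochs


# First launch covers epochs 0-69, the resumed launch covers 70-119 (assumed).
runs = [(0, 70), (70, 120)]

# Hypothetical equality check: the switch is missed after resuming at epoch 70.
equality = channel_selection_epochs(lambda e, s: e == s, runs)

# The `>=` check from the fix above: the switch is re-applied after resuming.
threshold = channel_selection_epochs(lambda e, s: e >= s, runs)

print(f"equality check:  {len(equality)} epochs with channel selection")
print(f"threshold check: {len(threshold)} epochs with channel selection")
```

Under these assumptions the equality check enables channel selection only for epochs 60 to 69, which matches the schedule described above, while the `>=` check keeps it enabled from epoch 60 through the end of the resumed run.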