CanyonWind / Single-Path-One-Shot-NAS-MXNet

Single Path One-Shot NAS implementation in MXNet with a full training and searching pipeline. Supports both block and channel selection. Searched models that outperform those reported in the original paper are provided.
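Block and channel selection can be pictured as sampling one path through the supernet: for every choice layer, pick one candidate block and one channel scale. The following is a minimal sketch of uniform path sampling, not the repository's code. The layer count (20), the number of block choices (4), and the scale list (0.2 to 2.0 in steps of 0.2) are assumptions, although 10 channel scales is consistent with the upper bound of 10 in the `randrange()` error in the traceback below.

```python
import random
from typing import List, Tuple

# Assumed search-space sizes (illustrative only, not read from the repository):
NUM_LAYERS = 20         # number of choice layers in the supernet
NUM_BLOCK_CHOICES = 4   # candidate blocks per layer
CANDIDATE_SCALES = [round(0.2 * i, 1) for i in range(1, 11)]  # 10 channel scales, 0.2 .. 2.0


def sample_single_path(rng: random.Random) -> Tuple[List[int], List[int]]:
    """Uniformly pick one block index and one channel-scale index per layer."""
    block_choices = [rng.randrange(NUM_BLOCK_CHOICES) for _ in range(NUM_LAYERS)]
    channel_choices = [rng.randrange(len(CANDIDATE_SCALES)) for _ in range(NUM_LAYERS)]
    return block_choices, channel_choices


if __name__ == "__main__":
    blocks, channels = sample_single_path(random.Random(0))
    print(blocks)
    print([CANDIDATE_SCALES[c] for c in channels])
```

In single-path one-shot training, a fresh path is sampled uniformly at each step and only the weights on that path are updated, so every candidate block and scale receives training.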

Supernet Training with Constraints #16

Open betterhalfwzm opened 4 years ago

betterhalfwzm commented 4 years ago

Thanks for your excellent work! When I train the supernet with constraints using the following script, I get an error during validation.

export MXNET_SAFE_ACCUMULATION=1

python train_imagenet.py \
    --rec-train /data3/wangzhaoming/mxnet_imagenet/rec/train.rec --rec-train-idx /data3/wangzhaoming/mxnet_imagenet/rec/train.idx \
    --rec-val /data3/wangzhaoming/mxnet_imagenet/rec/val.rec --rec-val-idx /data3/wangzhaoming/mxnet_imagenet/rec/val.idx \
    --mode imperative --lr 1.3 --wd 0.00004 --lr-mode cosine --dtype float16 \
    --num-epochs 120 --batch-size 128 --num-gpus 8 -j 48 \
    --label-smoothing --no-wd --warmup-epochs 5 --use-rec \
    --model ShuffleNas \
    --epoch-start-cs 60 --cs-warm-up --channels-layout OneShot \
    --save-dir params_shufflenas_supernet --logging-file ./logs/shufflenas_supernet.log \
    --train-upper-constraints flops-160-params-2.5 --train-bottom-constraints flops-90-params-1.4 \
    --train-constraint-method evolution
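The two constraint flags take strings such as `flops-160-params-2.5`. Below is a minimal sketch of how such bounds could be parsed and enforced, assuming the format is alternating `name-value` pairs (with FLOPs and parameters presumably in millions). The helper names are hypothetical, and the sampler uses plain rejection sampling as a stand-in; the command above selects `--train-constraint-method evolution`, which is not reproduced here.

```python
import random
from typing import Any, Callable, Dict


def parse_constraint(spec: str) -> Dict[str, float]:
    """Parse e.g. 'flops-160-params-2.5' into {'flops': 160.0, 'params': 2.5}."""
    tokens = spec.split("-")
    if len(tokens) % 2:
        raise ValueError(f"expected name-value pairs, got {spec!r}")
    return {tokens[i]: float(tokens[i + 1]) for i in range(0, len(tokens), 2)}


def within(cost: Dict[str, float], lower: Dict[str, float], upper: Dict[str, float]) -> bool:
    """True if every constrained metric lies in [lower, upper]."""
    return all(lower[k] <= cost[k] <= upper[k] for k in upper)


def sample_within_constraints(
    sample: Callable[[random.Random], Any],
    estimate_cost: Callable[[Any], Dict[str, float]],
    lower: Dict[str, float],
    upper: Dict[str, float],
    rng: random.Random,
    max_tries: int = 1000,
) -> Any:
    """Rejection sampling: draw candidates until one satisfies both bounds."""
    for _ in range(max_tries):
        candidate = sample(rng)
        if within(estimate_cost(candidate), lower, upper):
            return candidate
    raise RuntimeError("no candidate satisfied the constraints")
```

Here `sample` and `estimate_cost` stand in for a real path sampler and a FLOPs/parameter counter; rejection sampling is simple but can become slow when the allowed band covers only a small fraction of the search space.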

Epoch[0] Batch [49] Speed: 322.095226 samples/sec accuracy=0.000605 lr=0.010393
Epoch[0] Batch [99] Speed: 492.513575 samples/sec accuracy=0.000791 lr=0.020787
Epoch[0] Batch [149] Speed: 457.981573 samples/sec accuracy=0.000937 lr=0.031180
Epoch[0] Batch [199] Speed: 688.650089 samples/sec accuracy=0.000903 lr=0.041573
Epoch[0] Batch [249] Speed: 465.918790 samples/sec accuracy=0.000957 lr=0.051967
Epoch[0] Batch [299] Speed: 490.846376 samples/sec accuracy=0.000957 lr=0.062360
Epoch[0] Batch [349] Speed: 606.910845 samples/sec accuracy=0.000977 lr=0.072753
Epoch[0] Batch [399] Speed: 567.445527 samples/sec accuracy=0.000986 lr=0.083147
Epoch[0] Batch [449] Speed: 618.184875 samples/sec accuracy=0.000990 lr=0.093540
Epoch[0] Batch [499] Speed: 593.677446 samples/sec accuracy=0.000982 lr=0.103933
Epoch[0] Batch [549] Speed: 631.991306 samples/sec accuracy=0.000978 lr=0.114327
Epoch[0] Batch [599] Speed: 614.757373 samples/sec accuracy=0.000985 lr=0.124720
Epoch[0] Batch [649] Speed: 568.749700 samples/sec accuracy=0.000975 lr=0.135114
Epoch[0] Batch [699] Speed: 610.768222 samples/sec accuracy=0.000961 lr=0.145507
Epoch[0] Batch [749] Speed: 659.102106 samples/sec accuracy=0.000961 lr=0.155900
Epoch[0] Batch [799] Speed: 563.044769 samples/sec accuracy=0.000964 lr=0.166294
Epoch[0] Batch [849] Speed: 572.482835 samples/sec accuracy=0.000959 lr=0.176687
Epoch[0] Batch [899] Speed: 611.510812 samples/sec accuracy=0.000969 lr=0.187080
Epoch[0] Batch [949] Speed: 585.310555 samples/sec accuracy=0.000970 lr=0.197474
Epoch[0] Batch [999] Speed: 586.269362 samples/sec accuracy=0.000970 lr=0.207867
Epoch[0] Batch [1049] Speed: 584.871140 samples/sec accuracy=0.000973 lr=0.218260
Epoch[0] Batch [1099] Speed: 580.345403 samples/sec accuracy=0.000976 lr=0.228654
Epoch[0] Batch [1149] Speed: 604.746532 samples/sec accuracy=0.000979 lr=0.239047
Epoch[0] Batch [1199] Speed: 425.625182 samples/sec accuracy=0.000976 lr=0.249440
Epoch[0] Batch [1249] Speed: 673.577257 samples/sec accuracy=0.000977 lr=0.259834

Traceback (most recent call last):
  File "train_imagenet.py", line 738, in <module>
    main()
  File "train_imagenet.py", line 734, in main
    train(context)
  File "train_imagenet.py", line 710, in train
    err_top1_val, err_top5_val = test(ctx, val_data, epoch)
  File "train_imagenet.py", line 439, in test
    ignore_first_two_cs=opt.ignore_first_two_cs)
  File "/data3/wangzhaoming/Single-Path-One-Shot-NAS-MXNet/oneshot_nas_network.py", line 248, in random_channel_mask
    channel_choice = random.randint(channel_scale_start, len(self.candidate_scales) - 1)
  File "/data3/wangzhaoming/anconda3/lib/python3.7/random.py", line 222, in randint
    return self.randrange(a, b+1)
  File "/data3/wangzhaoming/anconda3/lib/python3.7/random.py", line 200, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (68,10, -58)
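The traceback shows `random.randint(channel_scale_start, len(self.candidate_scales) - 1)` being called with a start index of 68 while the upper bound is 9 (`randint(a, b)` forwards to `randrange(a, b+1)`, hence `(68,10, -58)`), so the range is empty. The failure happens in `test()` at the end of epoch 0, while `--epoch-start-cs 60` presumably means channel selection has not started yet. Below is a minimal sketch of a defensive guard, assuming that clamping the start index is acceptable; `random_channel_choice` is a hypothetical helper, and the actual fix in the repository may differ (for example, not sampling channels at all before channel selection begins).

```python
import random


def random_channel_choice(rng: random.Random, channel_scale_start: int, num_scales: int) -> int:
    """Hypothetical guarded replacement for the failing randint call.

    random.randint(a, b) raises ValueError when a > b, which is what happens
    in the traceback: randint(68, 9) -> randrange(68, 10).
    """
    # Clamp the start index into the valid range [0, num_scales - 1].
    start = min(max(channel_scale_start, 0), num_scales - 1)
    return rng.randint(start, num_scales - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    try:
        random.randint(68, 9)  # reproduces the reported failure
    except ValueError as err:
        print("unguarded:", err)
    print("guarded:", random_channel_choice(rng, 68, 10))  # only index 9 is possible
```

Clamping to the last index restricts sampling to the largest candidate scale, which avoids the crash but may hide an upstream bug in how `channel_scale_start` is computed, so it is worth checking that value as well.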