betterhalfwzm opened 4 years ago
Thanks for your excellent work! When I train the supernet with constraints using the following script, I get an error during validation.
export MXNET_SAFE_ACCUMULATION=1
python train_imagenet.py \
    --rec-train /data3/wangzhaoming/mxnet_imagenet/rec/train.rec --rec-train-idx /data3/wangzhaoming/mxnet_imagenet/rec/train.idx \
    --rec-val /data3/wangzhaoming/mxnet_imagenet/rec/val.rec --rec-val-idx /data3/wangzhaoming/mxnet_imagenet/rec/val.idx \
    --mode imperative --lr 1.3 --wd 0.00004 --lr-mode cosine --dtype float16 \
    --num-epochs 120 --batch-size 128 --num-gpus 8 -j 48 \
    --label-smoothing --no-wd --warmup-epochs 5 --use-rec \
    --model ShuffleNas \
    --epoch-start-cs 60 --cs-warm-up --channels-layout OneShot \
    --save-dir params_shufflenas_supernet --logging-file ./logs/shufflenas_supernet.log \
    --train-upper-constraints flops-160-params-2.5 --train-bottom-constraints flops-90-params-1.4 \
    --train-constraint-method evolution
Epoch[0] Batch [49] Speed: 322.095226 samples/sec accuracy=0.000605 lr=0.010393
Epoch[0] Batch [99] Speed: 492.513575 samples/sec accuracy=0.000791 lr=0.020787
Epoch[0] Batch [149] Speed: 457.981573 samples/sec accuracy=0.000937 lr=0.031180
Epoch[0] Batch [199] Speed: 688.650089 samples/sec accuracy=0.000903 lr=0.041573
Epoch[0] Batch [249] Speed: 465.918790 samples/sec accuracy=0.000957 lr=0.051967
Epoch[0] Batch [299] Speed: 490.846376 samples/sec accuracy=0.000957 lr=0.062360
Epoch[0] Batch [349] Speed: 606.910845 samples/sec accuracy=0.000977 lr=0.072753
Epoch[0] Batch [399] Speed: 567.445527 samples/sec accuracy=0.000986 lr=0.083147
Epoch[0] Batch [449] Speed: 618.184875 samples/sec accuracy=0.000990 lr=0.093540
Epoch[0] Batch [499] Speed: 593.677446 samples/sec accuracy=0.000982 lr=0.103933
Epoch[0] Batch [549] Speed: 631.991306 samples/sec accuracy=0.000978 lr=0.114327
Epoch[0] Batch [599] Speed: 614.757373 samples/sec accuracy=0.000985 lr=0.124720
Epoch[0] Batch [649] Speed: 568.749700 samples/sec accuracy=0.000975 lr=0.135114
Epoch[0] Batch [699] Speed: 610.768222 samples/sec accuracy=0.000961 lr=0.145507
Epoch[0] Batch [749] Speed: 659.102106 samples/sec accuracy=0.000961 lr=0.155900
Epoch[0] Batch [799] Speed: 563.044769 samples/sec accuracy=0.000964 lr=0.166294
Epoch[0] Batch [849] Speed: 572.482835 samples/sec accuracy=0.000959 lr=0.176687
Epoch[0] Batch [899] Speed: 611.510812 samples/sec accuracy=0.000969 lr=0.187080
Epoch[0] Batch [949] Speed: 585.310555 samples/sec accuracy=0.000970 lr=0.197474
Epoch[0] Batch [999] Speed: 586.269362 samples/sec accuracy=0.000970 lr=0.207867
Epoch[0] Batch [1049] Speed: 584.871140 samples/sec accuracy=0.000973 lr=0.218260
Epoch[0] Batch [1099] Speed: 580.345403 samples/sec accuracy=0.000976 lr=0.228654
Epoch[0] Batch [1149] Speed: 604.746532 samples/sec accuracy=0.000979 lr=0.239047
Epoch[0] Batch [1199] Speed: 425.625182 samples/sec accuracy=0.000976 lr=0.249440
Epoch[0] Batch [1249] Speed: 673.577257 samples/sec accuracy=0.000977 lr=0.259834
Traceback (most recent call last):
  File "train_imagenet.py", line 738, in <module>
    main()
  File "train_imagenet.py", line 734, in main
    train(context)
  File "train_imagenet.py", line 710, in train
    err_top1_val, err_top5_val = test(ctx, val_data, epoch)
  File "train_imagenet.py", line 439, in test
    ignore_first_two_cs=opt.ignore_first_two_cs)
  File "/data3/wangzhaoming/Single-Path-One-Shot-NAS-MXNet/oneshot_nas_network.py", line 248, in random_channel_mask
    channel_choice = random.randint(channel_scale_start, len(self.candidate_scales) - 1)
  File "/data3/wangzhaoming/anconda3/lib/python3.7/random.py", line 222, in randint
    return self.randrange(a, b+1)
  File "/data3/wangzhaoming/anconda3/lib/python3.7/random.py", line 200, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (68,10, -58)
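For reference, the failing call can be reproduced in isolation. As the traceback shows, `randint(a, b)` delegates to `randrange(a, b+1)`, so the reported `(68,10, -58)` means `channel_scale_start` was 68 while `len(self.candidate_scales) - 1` was 9: the start index lies past the end of the 10 candidate scales, and the range is empty. The sketch below is a minimal standalone reproduction; `reproduce` and `safe_channel_choice` are hypothetical helpers, not functions from the repository, and the clamp in `safe_channel_choice` only masks the symptom rather than explaining why the start index reached 68.

```python
import random


def reproduce(channel_scale_start: int, num_scales: int) -> int:
    """Hypothetical helper mirroring the failing line in random_channel_mask:
    random.randint(channel_scale_start, len(candidate_scales) - 1)."""
    return random.randint(channel_scale_start, num_scales - 1)


def safe_channel_choice(channel_scale_start: int, num_scales: int) -> int:
    """Hypothetical guard (an assumption, not the repository's code):
    clamp the start index into [0, num_scales - 1] before sampling,
    so randint never receives a > b."""
    start = max(0, min(channel_scale_start, num_scales - 1))
    return random.randint(start, num_scales - 1)


if __name__ == "__main__":
    try:
        # Values implied by the traceback: start = 68, 10 candidate scales.
        reproduce(68, 10)
    except ValueError as err:
        # The exact message wording differs between Python versions.
        print("ValueError:", err)
    # With the clamp, the start becomes 9, the only index left, so this prints 9.
    print(safe_channel_choice(68, 10))
```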