cavalleria closed this issue 4 years ago.
Hi, thanks for your interest. Could you please give --train-constraint-method random a try? I used to find that using evolution constraints from the beginning makes it hard to converge. What I did before was to train the supernet without constraints / with random constraints for the first 30 / 60 epochs, then use evolution constraints for the rest. Please feel free to let me know whether it helps.
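For concreteness, below is a minimal sketch of the progressive schedule described above (no constraints for the first 30 epochs, random constraints until epoch 60, evolution afterwards). The helper name and the way it would be consumed are illustrative assumptions, not the actual train_imagenet.py interface.

# Sketch only: epoch boundaries follow the comment above; names are assumed.
def constraint_method(epoch):
    if epoch < 30:
        return 'none'        # sample block/channel choices freely
    if epoch < 60:
        return 'random'      # resample until FLOPs/params fall inside the window
    return 'evolution'       # evolve candidates that satisfy the FLOPs/params bounds

for epoch in range(120):
    method = constraint_method(epoch)
    # sample (block_choices, channel_choices) under `method`, then train one epoch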
I tried the evolution constraints for 2 epochs; please refer to the log below.
Namespace(batch_norm=False, batch_size=64, block_choices='0, 0, 3, 1, 1, 1, 0, 0, 2, 0, 2, 1, 1, 0, 2, 0, 2, 1, 3, 2', channel_choices='6, 5, 3, 5, 2, 6, 3, 4, 2, 5, 7, 5, 4, 6, 7, 4, 4, 5, 4, 3', channels_layout='OneShot', crop_ratio=0.875, cs_warm_up=False, data_dir='~/.mxnet/datasets/imagenet', dtype='float16', epoch_start_cs=0, flop_param_method='lookup_table', hard_weight=0.5, ignore_first_two_cs=False, input_size=224, label_smoothing=True, last_conv_after_pooling=True, last_gamma=False, log_interval=50, logging_file='./logs/shufflenas_supernet+_wc.log', lr=0.65, lr_decay=0.1, lr_decay_epoch='40,60', lr_decay_period=0, lr_mode='cosine', mixup=False, mixup_alpha=0.2, mixup_off_epoch=0, mode='imperative', model='ShuffleNas', momentum=0.9, no_wd=True, num_epochs=120, num_gpus=1, num_workers=16, rec_train='/home/alex/imagenet/rec/train.rec', rec_train_idx='/home/alex/imagenet/rec/train.idx', rec_val='/home/alex/imagenet/rec/val.rec', rec_val_idx='/home/alex/imagenet/rec/val.idx', reduced_dataset_scale=1, resume_epoch=0, resume_params='', resume_states='', save_dir='params_shufflenas_supernet+_wc', save_frequency=10, teacher=None, temperature=20, train_bottom_constraints='flops-190-params-2.8', train_constraint_method='evolution', train_upper_constraints='flops-330-params-5.0', use_all_blocks=False, use_all_channels=False, use_gn=False, use_pretrained=False, use_rec=True, use_se=True, warmup_epochs=5, warmup_lr=0.0, wd=4e-05)
Epoch[0] Batch [49] Speed: 267.692524 samples/sec accuracy=0.000937 lr=0.000325
Epoch[0] Batch [99] Speed: 420.260783 samples/sec accuracy=0.000625 lr=0.000649
Epoch[0] Batch [149] Speed: 434.132753 samples/sec accuracy=0.000625 lr=0.000974
Epoch[0] Batch [199] Speed: 455.834203 samples/sec accuracy=0.000625 lr=0.001299
Epoch[0] Batch [249] Speed: 444.076933 samples/sec accuracy=0.000812 lr=0.001624
Epoch[0] Batch [299] Speed: 446.958638 samples/sec accuracy=0.000937 lr=0.001948
Epoch[0] Batch [349] Speed: 440.735658 samples/sec accuracy=0.000937 lr=0.002273
Epoch[0] Batch [399] Speed: 442.374003 samples/sec accuracy=0.000937 lr=0.002598
Epoch[0] Batch [449] Speed: 435.325226 samples/sec accuracy=0.001007 lr=0.002922
Epoch[0] Batch [499] Speed: 439.740531 samples/sec accuracy=0.000969 lr=0.003247
Epoch[0] Batch [549] Speed: 449.363078 samples/sec accuracy=0.000966 lr=0.003572
Epoch[0] Batch [599] Speed: 427.282463 samples/sec accuracy=0.000964 lr=0.003897
Epoch[0] Batch [649] Speed: 439.999006 samples/sec accuracy=0.000937 lr=0.004221
Epoch[0] Batch [699] Speed: 454.338982 samples/sec accuracy=0.000915 lr=0.004546
Epoch[0] Batch [749] Speed: 442.066367 samples/sec accuracy=0.000854 lr=0.004871
Epoch[0] Batch [799] Speed: 447.217162 samples/sec accuracy=0.000879 lr=0.005195
Epoch[0] Batch [849] Speed: 418.756385 samples/sec accuracy=0.000864 lr=0.005520
Epoch[0] Batch [899] Speed: 430.115587 samples/sec accuracy=0.000868 lr=0.005845
Epoch[0] Batch [949] Speed: 422.384265 samples/sec accuracy=0.000872 lr=0.006170
Epoch[0] Batch [999] Speed: 442.137708 samples/sec accuracy=0.000937 lr=0.006494
...
Epoch[0] Batch [19799] Speed: 434.382863 samples/sec accuracy=0.010476 lr=0.128586
Epoch[0] Batch [19849] Speed: 442.456485 samples/sec accuracy=0.010524 lr=0.128910
Epoch[0] Batch [19899] Speed: 431.092918 samples/sec accuracy=0.010570 lr=0.129235
Epoch[0] Batch [19949] Speed: 445.330133 samples/sec accuracy=0.010624 lr=0.129560
Epoch[0] Batch [19999] Speed: 444.129423 samples/sec accuracy=0.010666 lr=0.129884
[Epoch 0] training: accuracy=0.010680
[Epoch 0] speed: 437 samples/sec time cost: 3014.399720
[Epoch 0] validation: err-top1=0.966212 err-top5=0.888407
Epoch[1] Batch [49] Speed: 441.229930 samples/sec accuracy=0.030937 lr=0.130326
Epoch[1] Batch [99] Speed: 431.210921 samples/sec accuracy=0.029844 lr=0.130651
Epoch[1] Batch [149] Speed: 451.693710 samples/sec accuracy=0.028542 lr=0.130975
Epoch[1] Batch [199] Speed: 453.126118 samples/sec accuracy=0.027344 lr=0.131300
Epoch[1] Batch [249] Speed: 439.301388 samples/sec accuracy=0.027250 lr=0.131625
Epoch[1] Batch [299] Speed: 452.420660 samples/sec accuracy=0.028021 lr=0.131950
Epoch[1] Batch [349] Speed: 456.589121 samples/sec accuracy=0.028705 lr=0.132274
Epoch[1] Batch [399] Speed: 441.290773 samples/sec accuracy=0.028555 lr=0.132599
Epoch[1] Batch [449] Speed: 443.353213 samples/sec accuracy=0.028889 lr=0.132924
Epoch[1] Batch [499] Speed: 455.609001 samples/sec accuracy=0.029063 lr=0.133248
Epoch[1] Batch [549] Speed: 435.873114 samples/sec accuracy=0.029261 lr=0.133573
Epoch[1] Batch [599] Speed: 435.406145 samples/sec accuracy=0.028958 lr=0.133898
Epoch[1] Batch [649] Speed: 432.422730 samples/sec accuracy=0.028990 lr=0.134223
Epoch[1] Batch [699] Speed: 445.527597 samples/sec accuracy=0.028795 lr=0.134547
Epoch[1] Batch [749] Speed: 445.781965 samples/sec accuracy=0.028958 lr=0.134872
Epoch[1] Batch [799] Speed: 437.717070 samples/sec accuracy=0.029004 lr=0.135197
Epoch[1] Batch [849] Speed: 450.319020 samples/sec accuracy=0.028732 lr=0.135521
Epoch[1] Batch [899] Speed: 446.804164 samples/sec accuracy=0.028750 lr=0.135846
Epoch[1] Batch [949] Speed: 448.955765 samples/sec accuracy=0.028766 lr=0.136171
Epoch[1] Batch [999] Speed: 429.807388 samples/sec accuracy=0.028875 lr=0.136496
...
Epoch[1] Batch [19799] Speed: 445.965960 samples/sec accuracy=0.054782 lr=0.258587
Epoch[1] Batch [19849] Speed: 439.394236 samples/sec accuracy=0.054872 lr=0.258912
Epoch[1] Batch [19899] Speed: 431.452251 samples/sec accuracy=0.054946 lr=0.259236
Epoch[1] Batch [19949] Speed: 445.569749 samples/sec accuracy=0.054993 lr=0.259561
Epoch[1] Batch [19999] Speed: 430.832503 samples/sec accuracy=0.055055 lr=0.259886
[Epoch 1] training: accuracy=0.055072
[Epoch 1] speed: 442 samples/sec time cost: 2977.685636
[Epoch 1] validation: err-top1=0.901088 err-top5=0.748339
Epoch[2] Batch [49] Speed: 435.165452 samples/sec accuracy=0.089375 lr=0.260327
Epoch[2] Batch [99] Speed: 438.497906 samples/sec accuracy=0.090313 lr=0.260652
Epoch[2] Batch [149] Speed: 442.370125 samples/sec accuracy=0.088438 lr=0.260977
Epoch[2] Batch [199] Speed: 449.992227 samples/sec accuracy=0.084687 lr=0.261301
Epoch[2] Batch [249] Speed: 451.044595 samples/sec accuracy=0.084187 lr=0.261626
Epoch[2] Batch [299] Speed: 435.895423 samples/sec accuracy=0.083646 lr=0.261951
Epoch[2] Batch [349] Speed: 442.705869 samples/sec accuracy=0.083571 lr=0.262276
Epoch[2] Batch [399] Speed: 431.949651 samples/sec accuracy=0.083086 lr=0.262600
Epoch[2] Batch [449] Speed: 448.379354 samples/sec accuracy=0.083403 lr=0.262925
Epoch[2] Batch [499] Speed: 439.455696 samples/sec accuracy=0.083531 lr=0.263250
Epoch[2] Batch [549] Speed: 419.410924 samples/sec accuracy=0.082812 lr=0.263574
Epoch[2] Batch [599] Speed: 435.331664 samples/sec accuracy=0.082474 lr=0.263899
Epoch[2] Batch [649] Speed: 430.067405 samples/sec accuracy=0.082187 lr=0.264224
Epoch[2] Batch [699] Speed: 456.241039 samples/sec accuracy=0.082388 lr=0.264549
Epoch[2] Batch [749] Speed: 452.860384 samples/sec accuracy=0.081917 lr=0.264873
Epoch[2] Batch [799] Speed: 432.486923 samples/sec accuracy=0.081738 lr=0.265198
Epoch[2] Batch [849] Speed: 450.029449 samples/sec accuracy=0.081801 lr=0.265523
Epoch[2] Batch [899] Speed: 445.616156 samples/sec accuracy=0.081233 lr=0.265847
Epoch[2] Batch [949] Speed: 430.188969 samples/sec accuracy=0.081299 lr=0.266172
Epoch[2] Batch [999] Speed: 430.283522 samples/sec accuracy=0.081641 lr=0.266497
...
Thanks for your quick reply. I noticed that you changed cs-warm-up to false and epoch-start-cs to 0. I modified my training script according to your training log and ran 3 epochs; the accuracy and validation top-1 error look normal. Then I have some questions:
1. The README describes the supernet training details as follows:
The reason why we did this in the supernet training is that during our experiments we found, for supernet without SE, doing Block Selection from beginning works well, nevertheless doing Channel Selection from the beginning will cause the network not converging at all. The Channel Selection range needs to be gradually enlarged otherwise it will crash with free-fall drop accuracy. And the range can only be allowed for (0.6 ~ 2.0). Smaller channel scales will make the network crashing too. For supernet with SE, Channel Selection with the full choices (0.2 ~ 2.0) can be used from the beginning and it converges. However, doing this seems like harming accuracy. Compared to the same se-supernet with Channel Selection warm-up, the Channel Selection from scratch model has been always left behind 10% training accuracy during the whole procedure.
My understanding is that, if use-se = true, channel selection can be used from the beginning and the supernet converges (epoch-start-cs = 0, cs-warm-up = false), but it stays about 10% behind in training accuracy compared to the same se-supernet trained with channel selection warm-up (epoch-start-cs = 0, cs-warm-up = true). Is that right?
2. If I train the supernet with use-se = true, epoch-start-cs = 0, and cs-warm-up = true, but it can't converge, should I follow --train-constraint-method none / random / evolution (epochs 0~30 / 30~60 / 60~120) to train the supernet progressively?
3. When I use 8 Titan X GPUs, should the learning rate be increased 8 times (8 * 0.65)? (See the linear-scaling sketch after these questions.) Also, I find the GPUs are often idle in the multi-GPU run.
Thanks!
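For reference on question 3, here is a minimal sketch of the linear scaling rule that is commonly used when the effective batch size grows with the number of GPUs. It assumes the base lr of 0.65 was tuned for a single GPU and that --batch-size 64 is the per-GPU batch size, which may not hold for this repo.

# Sketch of the linear scaling rule (Goyal et al., 2017); base values assumed.
base_lr, base_gpus = 0.65, 1
num_gpus = 8
scaled_lr = base_lr * num_gpus / base_gpus   # 5.2 for 8 GPUs
# Keep --warmup-epochs 5 so the enlarged learning rate ramps up gradually.
print(scaled_lr)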
--train-constraint-method none / random / evolution (epochs 0~30 / 30~60 / 60~120) controls how the evolution constraint is applied; it is not for the channel selection warm-up. Nevertheless, you are welcome to give it a try. Closing the issue due to no further response; please feel free to reopen if necessary.
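To make the distinction concrete, below is a hedged sketch of what a channel selection warm-up could look like: gradually enlarging the sampled channel-scale range, as described in the README excerpt quoted above. The candidate scale list and the widening step are assumptions for illustration, not the repo's exact schedule.

# Sketch only: start from the largest channel scales and gradually allow smaller ones.
# The 0.2 ~ 2.0 range comes from the README excerpt; the widening step is assumed.
CANDIDATE_SCALES = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

def allowed_scales(epoch, epoch_start_cs=60, widen_every=2):
    if epoch < epoch_start_cs:
        return CANDIDATE_SCALES[-1:]          # no real channel selection yet
    # after CS starts, lower the smallest allowed scale every `widen_every` epochs
    steps = (epoch - epoch_start_cs) // widen_every + 1
    lowest = max(0, len(CANDIDATE_SCALES) - 1 - steps)
    return CANDIDATE_SCALES[lowest:]

# e.g. allowed_scales(60) -> [1.8, 2.0]; allowed_scales(76) -> the full 0.2 ~ 2.0 list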
Hi, thanks for your excellent work! I am preparing to reproduce your work, but when training the supernet the loss doesn't converge and the validation top-1 error doesn't decline. My training script is:
python train_imagenet.py \
    --rec-train ~/facedata.mxnet.hot/rec2/train.rec --rec-train-idx ~/facedata.mxnet.hot/rec2/train.idx \
    --rec-val ~/facedata.mxnet.hot/rec2/val.rec --rec-val-idx ~/facedata.mxnet.hot/rec2/val.idx \
    --mode imperative --lr 0.65 --wd 0.00004 --lr-mode cosine --dtype float16 \
    --num-epochs 120 --batch-size 64 --num-gpus 1 -j 16 \
    --label-smoothing --no-wd --warmup-epochs 5 --use-rec \
    --model ShuffleNas \
    --epoch-start-cs 60 --cs-warm-up --use-se --last-conv-after-pooling --channels-layout OneShot \
    --save-dir params_shufflenas_supernet+ --logging-file ./logs/shufflenas_supernet+.log \
    --train-upper-constraints flops-330-params-5.0 --train-bottom-constraints flops-190-params-2.8 \
    --train-constraint-method evolution
Also, when I run the test it reports an error, so I changed select_all_channels=True in lines 435 and 440 of train_imagenet.py.