danczs / NetworkAdjustment


About the performance #1


vtddggg commented 4 years ago

I ran the experiment on CIFAR-100 using a single GPU:

channel_search_distributed.py --dataset=cifar100 --dataset_dir=$DATA_DIR$ --gpu=0 --batch_size=128 --learning_rate=0.15 --arch=resnet_cifar --depth=20 --drop_rate=0.05 --base_drop_rate=0.05

But the result I got is

2020-05-28 20:49:53,457 epoch 19 lr 1.000000e-03
2020-05-28 20:49:53,457 drop rates:
2020-05-28 20:49:53,457 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
2020-05-28 20:49:53,999 train 000 9.374524e-01 72.656250 94.531250
2020-05-28 20:49:56,114 train 100 1.043171e+00 69.477104 92.844988
2020-05-28 20:49:58,226 train 200 1.062567e+00 69.212531 92.502332
2020-05-28 20:50:00,341 train 300 1.066392e+00 69.235361 92.366591
2020-05-28 20:50:01,448 train acc 69.302222
2020-05-28 20:50:01,976 valid 000 1.466250e+00 55.468750 85.937500
2020-05-28 20:50:02,282 valid acc 59.540000
2020-05-28 20:50:02,834 valid 000 1.335375e+00 59.375000 87.500000
2020-05-28 20:50:03,379 test acc 60.660000
2020-05-28 20:50:03,936 valid 000 1.663021e+00 54.687500 84.375000

The result reported in the paper is 71.57% after search. Can 71.57% be reached by using more epochs and more search iterations?

Another question: after searching the channels, the code trains the model from scratch. Can we fine-tune the searched model instead of training it from scratch?

Thanks for your help.

danczs commented 4 years ago

We got the reported results by training the searched model for 200 epochs on the full training set. Note that during search we sample a validation set from the training set with "train_portion=0.9". The detailed validation code for the searched model will be released soon.
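To illustrate, here is a minimal sketch of how such a split can be drawn. This is not the repository's exact code; the function name and the fixed seed are assumptions, and only the ratio train_portion=0.9 comes from the comment above:

```python
# Hedged sketch: sampling a validation set from the training set with
# train_portion = 0.9 (CIFAR-100 has 50,000 training images).
import random

def split_indices(num_train, train_portion=0.9, seed=0):
    """Split dataset indices into a search-training part and a held-out validation part."""
    indices = list(range(num_train))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for reproducibility
    split = int(num_train * train_portion)
    return indices[:split], indices[split:]

train_idx, valid_idx = split_indices(50000)
print(len(train_idx), len(valid_idx))  # 45000 5000
```

In a PyTorch pipeline these index lists would typically be passed to samplers (e.g. SubsetRandomSampler) for two DataLoaders over the same training dataset.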

Yes, fine-tuning may be faster and could improve the models. But our goal is to search for better architectures, and we think training from scratch is a more straightforward evaluation criterion. Thanks for your questions.