JiahuiYu / slimmable_networks

Slimmable Networks, AutoSlim, and Beyond, ICLR 2019, and ICCV 2019

reproducing CIFAR10 results for AutoSlim #40

Open RudyChin opened 4 years ago

RudyChin commented 4 years ago

Hi Jiahui,

Thanks for the great work. I'm trying to reproduce AutoSlim for CIFAR-10 (Table 2). Could you please provide the detailed hyperparameters you used for it?

I'm able to train the baseline MobileNetV2 1.0x to 7.9% Top-1 error using the following hyperparameters:

  • 0.1 initial learning rate
  • linear learning rate decay
  • 128 batch size
  • 300 epochs of training
  • 5e-4 weight decay
  • 0.9 Nesterov momentum
  • no label smoothing
  • no weight decay for bias and gamma (see the sketch after this list)

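For the last point (no weight decay for bias and gamma), here is a minimal sketch of how I set up the optimizer parameter groups in my own training script; build_sgd and its defaults are my code, not anything from this repo:

import torch

def build_sgd(model, lr=0.1, momentum=0.9, weight_decay=5e-4):
    # Exclude biases and BatchNorm gamma/beta (all 1-D parameters) from weight
    # decay; conv and linear weights keep the 5e-4 decay.
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if param.ndim == 1 or name.endswith('.bias'):
            no_decay.append(param)
        else:
            decay.append(param)
    return torch.optim.SGD(
        [{'params': decay, 'weight_decay': weight_decay},
         {'params': no_decay, 'weight_decay': 0.0}],
        lr=lr, momentum=momentum, nesterov=True)
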
To train AutoSlim, I use MobileNetV2 1.5x with the exact same hyperparameters, but train it for only 50 epochs on a training subset (80% of the real training set). Then, during greedy slimming, I use the remaining 20% of the training set as a validation set to decide channel counts. For greedy slimming, I shrink each layer in steps of 10%, which gives the 10 groups mentioned in the paper (roughly as in the sketch below).
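
Concretely, the search loop I implemented looks like this rough sketch (count_flops and val_acc are my own helpers for FLOPs counting and held-out accuracy with BN recalibration, not functions from this repo):

def greedy_slim(channels, step, flops_target, count_flops, val_acc):
    # channels[i]: current output-channel count of layer i
    # step[i]: one 10% group of layer i's full (1.5x) width
    while count_flops(channels) > flops_target:
        best_acc, best_layer = None, None
        for i in range(len(channels)):
            if channels[i] <= step[i]:
                continue                      # cannot shrink this layer further
            channels[i] -= step[i]            # try removing one 10% group
            acc = val_acc(channels)           # accuracy on the held-out 20% split
            channels[i] += step[i]            # undo the trial
            if best_acc is None or acc > best_acc:
                best_acc, best_layer = acc, i
        if best_layer is None:
            break                             # nothing left to shrink
        channels[best_layer] -= step[best_layer]  # keep the least harmful shrink
    return channels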

The final architecture is trained with the same hyperparameters listed above, but I failed to obtain the 6.8% Top-1 error reported in the paper; I'm getting around 7.8%.

Could you please share the final architecture of AutoSlim-MobileNetV2 for CIFAR-10 at 88 MFLOPs? It would also be great if you could let me know the hyperparameters you used for the CIFAR experiments.

Thanks, Rudy

dada-thu commented 4 years ago

Hi Rudy, when I ran greedy slimming on the network, I found that the output_channels of SlimmableConv2d didn't change. Did you encounter the same problem?

RudyChin commented 4 years ago

Hi dada,

I've actually implemented AutoSlim myself and cross-referenced this code.

I could be wrong, but I noticed some lines of code that I believe to be bugs:

dada-thu commented 4 years ago

Hi Rudy, thank you for your reply!

I did run into some problems with the code at v3.0.0. I run

python -m torch.distributed.launch train.py app:apps/autoslim_resnet_train_val.yml

with autoslim: True set in autoslim_resnet_train_val.yml.

But SlimmableConv2d has no us attribute, so in the function get_conv_layers the length of layers is zero.

As a result, it prints "Totally 0 layers to slim".
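
My guess (I have not checked the repo's actual get_conv_layers, so this is only an assumption) is that the slimming step only collects layers whose output width is universally slimmable, i.e. modules carrying a us flag like USConv2d, roughly:

def collect_slimmable_out_layers(model):
    # Hypothetical illustration of my guess, not the repo's code: keep only
    # modules whose `us` flag says the output width is slimmable. A plain
    # SlimmableConv2d has no `us` attribute, so it would be skipped here,
    # which would explain "Totally 0 layers to slim".
    layers = []
    for m in model.modules():
        us = getattr(m, 'us', (False, False))
        if us[1]:
            layers.append(m)
    return layers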

Do I need to replace SlimmableConv2d with USConv2d in the network?

zhengyujie commented 4 years ago

Hi Rudy, can you show me your code for MobileNetV2 on CIFAR-10?

JiahuiYu commented 4 years ago

Hi All,

Sorry for the late reply. While I fully understand that ImageNet requires more compute than some researchers have access to, CIFAR results are usually misleading for neural architecture search, especially for efficient neural networks. That's part of the reason I didn't include the CIFAR config in this code. But I can post the configs here for your reference:

num_hosts_per_job: 1  # number of hosts each job need
num_cpus_per_host: 36  # number of cpus each job need
memory_per_host: 380  # memory requirement each job need
gpu_type: 'nvidia-tesla-p100'

app:
  # data
  dataset: cifar10
  dataset_id: 0
  dataset_dir: /home/jiahuiyu/.git/mobile/data
  data_transforms: cifar10_basic
  data_loader: cifar10_basic
  data_loader_workers: 36
  drop_last: False

  # info
  num_classes: 10
  test_resize_image_size: 32
  image_size: 32
  topk: [1]
  num_epochs: 100

  # optimizer
  optimizer: sgd
  momentum: 0.9
  weight_decay: 0.0001
  nesterov: True

  # lr
  lr: 0.1
  lr_scheduler: multistep
  multistep_lr_milestones: [30, 60, 90]
  multistep_lr_gamma: 0.1

  # model profiling
  profiling: [gpu]

  # pretrain, resume, test_only
  test_only: False

  # seed
  random_seed: 1995

  # model
  reset_parameters: True

  # app defaults
  optimizer: mobile_sgd
  num_gpus_per_host: 8
  batch_size_per_gpu: 128
  distributed: True
  distributed_all_reduce: True
  num_epochs: 250
  slimmable_training: True
  calibrate_bn: True
  inplace_distill: True
  cumulative_bn_stats: True
  bn_cal_batch_num: 32  # effective batch num is batch_num/gpu_num
  num_sample_training: 4
  lr: 0.5
  lr_scheduler: linear_decaying
  lr_warmup: True
  lr_warmup_epochs: 5

run:
  shell_command: "'python -m torch.distributed.launch --nproc_per_node={} --nnodes={} --node_rank={} --master_addr={} --master_port=2234 train.py'.format(nproc_per_node, nnodes, rank, master_addr)"
  jobs:
    # - name: mobilenet_v1_0.2_1.1_nonuniform_50epochs_dynamic_divisor12
      # app_override:
        # model: models.us_mobilenet_v1
        # width_mult_list_test: [0.2, 1.1]
        # width_mult_range: [0.2, 1.1]
        # universally_slimmable_training: True
        # nonuniform: True
        # num_epochs: 50
        # dataset: cifar10_val5k
        # inplace_distill: True
        # dynamic_divisor: 12
        # nonuniform_diff_seed: True
        # # lr: 1.5
        # # batch_size_per_gpu: 48
        # # num_hosts_per_job: 8
        # lr: 0.125
        # batch_size_per_gpu: 32
        # num_hosts_per_job: 1
        # data_loader_workers: 4
        # # num_gpus_per_host: 1

    # - name: mobilenet_v2_0.15_1.5_nonuniform_50epochs_dynamic_divisor12
      # app_override:
        # model: models.us_mobilenet_v2
        # width_mult_list_test: [0.15, 1.5]
        # width_mult_range: [0.15, 1.5]
        # universally_slimmable_training: True
        # nonuniform: True
        # num_epochs: 50
        # dataset: cifar10_val5k
        # inplace_distill: True
        # dynamic_divisor: 12
        # nonuniform_diff_seed: True
        # lr: 0.5
        # batch_size_per_gpu: 128
        # num_hosts_per_job: 1
        # data_loader_workers: 4

    # - name: mnasnet_0.15_1.5_nonuniform_50epochs_dynamic_divisor12_ngc
      # app_override:
        # model: models.us_mnasnet
        # width_mult_list_test: [0.15, 1.5]
        # width_mult_range: [0.15, 1.5]
        # universally_slimmable_training: True
        # nonuniform: True
        # batch_size_per_gpu: 32
        # num_epochs: 50
        # dataset: imagenet1k_val50k_lmdb
        # inplace_distill: True
        # dynamic_divisor: 12
        # nonuniform_diff_seed: True
        # # lr: 2.0
        # # batch_size_per_gpu: 64
        # # lr: 1.
        # # num_hosts_per_job: 8
        # lr: 0.125
        # num_hosts_per_job: 1
        # dataset_dir: /data/imagenet
        # data_loader_workers: 4
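
For reference, with num_hosts_per_job: 1 and num_gpus_per_host: 8, the shell_command above expands to something like this (the master address is whatever your launcher fills in; 127.0.0.1 works on a single host):

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=2234 train.py
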
JiahuiYu commented 4 years ago

Please also note that the latest version is released under the v3.0.0 branch, not the master branch.

JiahuiYu commented 4 years ago

(I am keeping this issue open and marking it as good first issue)

tfwang08 commented 3 years ago

Hi, I also encountered the same "Totally 0 layers to slim" problem. How did you solve it?