alexrame / mixmo-pytorch

Official Pytorch implementation of MixMo framework

Some problems about baseline #5

Closed ZzBros closed 3 years ago

ZzBros commented 3 years ago

While training the baseline network (using exp_cifar10_wrn2810_1net_standard_bar1.yaml), I would like to check whether I understand the following correctly:

  1. mixmo uses random sampling (each sample is seen once per epoch)
  2. mixmo uses DADataset
  3. mixmo uses Wide ResNet-28-10
  4. mixmo warms up the lr during the first epoch, then multiplies it by 0.1 at epochs [101, 201, 226]
  5. mixmo applies L2 regularization to the network parameters
  6. mixmo uses the specified initialization

I look forward to your comments, especially if I have missed anything! Thanks!

alexrame commented 3 years ago

Yes, all of this is correct! Note that config/cifar10/exp_cifar10_wrn2810_1net_standard_bar1.yaml is the configuration file for a standard baseline network with neither data augmentation nor ensembling, so it should be called "vanilla" rather than "mixmo". In detail, the relevant code sections for each point are:

  1. vanilla uses random sampling (each sample is seen once per epoch) https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/mixmo/loaders/abstract_loader.py#L62

  2. vanilla uses MSDADataset with msda_mix_method == None, which is strictly equivalent to DADataset https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/mixmo/loaders/abstract_loader.py#L47

  3. vanilla uses Wide ResNet-28-10 https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/mixmo/networks/wrn.py#L54

  4. vanilla warms up the lr during the first epoch (= 782 steps), then multiplies it by 0.1 at epochs [101, 201, 226] https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/config/cifar10/exp_cifar10_wrn2810_1net_standard_bar1.yaml#L69 https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/mixmo/learners/model_wrapper.py#L80

  5. vanilla applies L2 regularization to the network parameters https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/config/cifar10/exp_cifar10_wrn2810_1net_standard_bar1.yaml#L52 https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/mixmo/core/loss.py#L176

  6. vanilla uses the specified initialization https://github.com/alexrame/mixmo-pytorch/blob/5a2a1a090b7805212aaad8ebc36ef11ade75e8d2/mixmo/networks/resnet.py#L167
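
To make point 4 concrete, here is a minimal sketch of such a schedule in plain Python. It is not the repository's code: the function name `lr_at_step`, the base learning rate of 0.1, the linear shape of the warmup, and the 1-indexed epoch convention are all assumptions chosen for illustration. Only the 782 warmup steps and the 0.1 factor at epochs [101, 201, 226] come from the points above.

```python
import math


def lr_at_step(step, base_lr=0.1, steps_per_epoch=782,
               warmup_epochs=1, milestones=(101, 201, 226), gamma=0.1):
    """Return the learning rate for a 0-indexed global training step.

    Hypothetical helper: base_lr, the linear warmup shape and the
    1-indexed epochs are assumptions, not values read from the repo.
    """
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        # Linear warmup (assumed): ramp up to base_lr over the first epoch.
        return base_lr * (step + 1) / warmup_steps
    # 1-indexed epoch that this step belongs to.
    epoch = step // steps_per_epoch + 1
    # Apply the decay factor once for every milestone already reached.
    n_decays = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** n_decays


# 782 steps per epoch is consistent with the 50,000 CIFAR-10 training
# images and a batch size of 64 (assumed), keeping the last partial batch.
assert math.ceil(50_000 / 64) == 782
```

Under these assumptions, the rate reaches `base_lr` at the end of epoch 1, stays there through epoch 100, and is scaled by 0.1 at the start of epochs 101, 201 and 226.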

If you have any questions, please let me know. Best regards