cvtower / SeesawNet_pytorch

PyTorch (0.4.1/1.0 verified) code and pre-trained models for the paper: Seesaw-Net: Convolution Neural Network With Uneven Group Convolution.
https://arxiv.org/abs/1905.03672

issue of cifartest #2

Open thinkInJava33 opened 4 years ago

thinkInJava33 commented 4 years ago

There are some problems when I train both IGCV3 and your SeesawNet using your model code and checkpoints.

IGCV3: I was able to resume from the checkpoint you provided, but the result is totally wrong. Did you change some training settings on CIFAR-100?

SeesawNet: I cannot train your SeesawNet on CIFAR-100 (RuntimeError: Given input size: (1280x2x2). Calculated output size: (1280x0x0). Output size is too small). I think the code you provided may not be the correct version.

cvtower commented 4 years ago

Hi @thinkInJava33, the model configs for the ImageNet and CIFAR datasets are different; I got the correct CIFAR config from the authors of IGCV3. I uploaded the model file and pre-trained model for CIFAR just yesterday, and I will upload the training code for CIFAR later if needed.
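
For reference, that RuntimeError is the typical symptom of running an ImageNet config on 32x32 inputs: the feature map shrinks to 2x2 before the final pooling layer, which expects the 7x7 map produced by 224x224 inputs. A minimal illustration (the 7x7 pooling kernel size is an assumption, not taken from the repo):

    import torch
    import torch.nn as nn

    # Illustration only: a 2x2 feature map fed into a pooling layer sized
    # for ImageNet inputs (kernel 7 assumed) reproduces the reported error.
    features = torch.randn(1, 1280, 2, 2)
    pool = nn.AvgPool2d(kernel_size=7)
    pool(features)
    # RuntimeError: Given input size: (1280x2x2).
    # Calculated output size: (1280x0x0). Output size is too small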

thinkInJava33 commented 4 years ago

@cvtower That would help a lot. I trained the IGCV3 model using the code you provided and still cannot reproduce the result on CIFAR-100; I only get an accuracy under 77%.

cvtower commented 4 years ago

> @cvtower That would help a lot. I trained the IGCV3 model using the code you provided and still cannot reproduce the result on CIFAR-100; I only get an accuracy under 77%.

Hi @thinkInJava33,

Sorry for the delayed reply. I just downloaded the source version from https://github.com/xxradon/IGCV3-pytorch and compared it with my implementation, and I found some differences. My goal is to reproduce the IGCV3 result from the original paper with the fewest modifications, so I only changed the following in @xxradon's code:

  1. line 22 in cifar100data.py:

    transforms.ColorJitter(0.3, 0.3, 0.3),

    It seems that @xxradon added color jitter to CIFAR-100 during training, but I think this should not be enabled since the authors did not mention it in the paper, so I removed this operation (see the transform sketch after this list).

  2. line 57 in igcv3.py (see the projection sketch after this list):

    nn.ReLU6(inplace=True),#remove according to paper
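
A minimal sketch of the CIFAR-100 training transform with the color jitter removed (the remaining transforms and normalization statistics are assumptions for illustration, not copied from cifar100data.py):

    import torchvision.transforms as transforms

    # Training-time augmentation without ColorJitter; the crop/flip and the
    # normalization statistics below are illustrative assumptions.
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        # transforms.ColorJitter(0.3, 0.3, 0.3),  # removed: not mentioned in the paper
        transforms.ToTensor(),
        transforms.Normalize((0.5071, 0.4865, 0.4409),
                             (0.2673, 0.2564, 0.2762)),
    ])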
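
A minimal sketch of the linear-bottleneck style this removal restores: the 1x1 projection ends with batch norm only, with no activation after it (the helper name and exact layer layout are assumptions, not copied from igcv3.py):

    import torch.nn as nn

    # Sketch of a linear-bottleneck projection: 1x1 conv + BN with no
    # activation afterwards; the trailing ReLU6 is the line that gets removed.
    def linear_projection(in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            # nn.ReLU6(inplace=True),  # removed according to the paper
        )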

The model-related differences below reflect the correct configuration for the CIFAR dataset that I got from the authors of IGCV3:

  1. line 85 in igcv3.py: elif args.downsampling == 4: (not 2 as in @xxradon's repo)
  2. line 105 in igcv3.py:

    # setting of inverted residual blocks

    self.interverted_residual_setting = [
        # t, c, n, s
        [1, 16, 1, 1],
        [6, 24, 4, s2],
        [6, 32, 6, 2],
        [6, 64, 8, 2],
        [6, 96, 6, 1],
        [6, 160, 6, 1],
        [6, 320, 1, 1],
    ]

    and the corresponding setting in @xxradon's repo is wrong:

    # setting of inverted residual blocks

    self.interverted_residual_setting = [
        # t, c, n, s
        [1, 16, 1, 1],
        [6, 24, 4, s2],
        [6, 32, 6, 2],
        [6, 64, 8, 2],
        [6, 96, 6, 1],
        [6, 160, 6, 2],
        [6, 320, 1, 1],
    ]

    I guess you will be able to reproduce the result from the original paper with these changes (a quick sanity check of the two stage settings follows below). If I remember correctly, I ran into this issue right when @xxradon posted the repo, and I made the modifications according to the original paper and the authors. I will upload the whole training code if needed, but I have renamed all the files and models in my private version since I only use IGCV3 as a reference model.
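
A quick sanity check of how the two stage settings above change the final feature-map size for a 32x32 CIFAR input (s2 is assumed to be 2 here and the stem-conv stride is ignored, so the numbers are only illustrative):

    # Only the first block of each stage applies its stride s.
    def final_size(input_size, settings):
        size = input_size
        for t, c, n, s in settings:
            size //= s
        return size

    correct = [[1, 16, 1, 1], [6, 24, 4, 2], [6, 32, 6, 2], [6, 64, 8, 2],
               [6, 96, 6, 1], [6, 160, 6, 1], [6, 320, 1, 1]]
    wrong   = [[1, 16, 1, 1], [6, 24, 4, 2], [6, 32, 6, 2], [6, 64, 8, 2],
               [6, 96, 6, 1], [6, 160, 6, 2], [6, 320, 1, 1]]
    print(final_size(32, correct))  # 4 -> 4x4 feature map before the classifier
    print(final_size(32, wrong))    # 2 -> one extra 2x downsampling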

cvtower commented 4 years ago

Please use this version: https://github.com/cvtower/SeesawNet_pytorch/tree/master/cifar_test/baseline_version and ignore the other config files for the shufflenetv2/mnasnet/half versions (I trained all the available models hundreds of times on CIFAR before the ImageNet training).