facebookresearch / mixup-cifar10

mixup: Beyond Empirical Risk Minimization

some problem while using batch normalization #10

Closed ZoengMingWong closed 5 years ago

ZoengMingWong commented 5 years ago

[Two attached accuracy-curve images: resnet18_10_25_1 and resnet_10_29_1]

Thank you for your contribution. However, I have a problem when reimplementing the project in TensorFlow and applying the BN layers with tf.layers.batch_normalization(input, training). Just as in PyTorch, where we call model.train() and model.eval() to switch BN between the two stages, I set the `training` argument to True during training and to False during validation.
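For concreteness, this is roughly how I wire the flag (a minimal sketch rather than my actual model: the network is cut down to a single conv/BN/dense stack, and names such as `images`, `labels`, and `is_training` are just placeholders for illustration):

```python
import tensorflow as tf  # sketch assumes TF 1.x graph mode

# Placeholders; `is_training` is the flag I feed: True while training,
# False while validating.
images = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name="images")
labels = tf.placeholder(tf.int64, shape=[None], name="labels")
is_training = tf.placeholder(tf.bool, shape=[], name="is_training")

# One conv -> BN -> ReLU block; the BN mode is controlled by the flag.
x = tf.layers.conv2d(images, filters=64, kernel_size=3, padding="same", use_bias=False)
x = tf.layers.batch_normalization(x, training=is_training)
x = tf.nn.relu(x)
logits = tf.layers.dense(tf.reduce_mean(x, axis=[1, 2]), units=10)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
accuracy = tf.reduce_mean(
    tf.cast(tf.equal(tf.argmax(logits, axis=1), labels), tf.float32))

# The tf.layers.batch_normalization docs say the moving mean/variance are only
# updated through the ops in tf.GraphKeys.UPDATE_OPS, so the train op must
# depend on them; otherwise the statistics used when training=False stay stale.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.MomentumOptimizer(0.1, 0.9).minimize(loss)
```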

Look at the two images above. The second one is the result when I distinguish the training and validation stages in the BN layers; you can see that the validation accuracy is quite unstable and not very good. So I then set the training flag of all BN layers to True, so that the batch mean and variance are used at test time instead of the moving averages, and the result is shown in the first image. You can see that the curve is much smoother, although the accuracy (about 92%) is lower than in the paper. I have already tried reducing the learning rate. The instability also reappears when I remove the mixup module. I am not sure whether I am applying the BN layers incorrectly, but I did it according to the documentation. Could the data augmentation have altered the distribution between the training data and the testing data?
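And this is roughly how I compare the two settings at test time (again only a sketch continuing the snippet above; the training loop is omitted, and `x_test` / `y_test` are hypothetical stand-ins for the CIFAR-10 test arrays):

```python
import numpy as np

# Stand-ins for the real CIFAR-10 test set, just so the sketch runs.
x_test = np.zeros([100, 32, 32, 3], dtype=np.float32)
y_test = np.zeros([100], dtype=np.int64)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop feeding {is_training: True} omitted ...

    # Second image: proper inference mode, i.e. moving mean/variance.
    acc_moving = sess.run(accuracy, feed_dict={
        images: x_test, labels: y_test, is_training: False})

    # First image: BN kept in "training" mode on the test set, so the
    # statistics of the current test batch are used instead of the moving ones.
    acc_batch = sess.run(accuracy, feed_dict={
        images: x_test, labels: y_test, is_training: True})
```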

Hoping for your answer! Thanks!