clovaai / CutMix-PyTorch

Official Pytorch implementation of CutMix regularizer

About the probability of applying CutMix #30

Closed · JiyueWang closed this issue 3 years ago

JiyueWang commented 4 years ago

It seems that 'cutmix_prob' is an important hyper-parameter, yet it is not mentioned in the paper. May I ask which value you used to obtain the results reported in the paper?

hellbell commented 4 years ago

@JiyueWang For the best performance, cutmix_prob=0.5 is used for the CIFAR experiments; otherwise we set cutmix_prob=1.0, including for the ImageNet experiments.
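For reference, the gate is just one random draw per training batch. Below is a condensed sketch of that logic in the spirit of this repo's train.py; the function names and defaults here are illustrative, not a drop-in replacement for the actual script.

```python
import numpy as np
import torch

def rand_bbox(size, lam):
    # Sample a box whose area is roughly (1 - lam) of the image.
    W, H = size[2], size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = np.random.randint(W), np.random.randint(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2

def cutmix_step(model, criterion, input, target, beta=1.0, cutmix_prob=0.5):
    # CutMix is applied only with probability `cutmix_prob`;
    # otherwise the batch is trained on unmodified.
    if beta > 0 and np.random.rand(1) < cutmix_prob:
        lam = np.random.beta(beta, beta)
        rand_index = torch.randperm(input.size(0), device=input.device)
        target_a, target_b = target, target[rand_index]
        bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
        # Paste a patch from the shuffled batch into the original batch.
        input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
        # Correct lambda to the exact pixel ratio of the pasted box.
        lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size(-1) * input.size(-2)))
        output = model(input)
        loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)
    else:
        output = model(input)
        loss = criterion(output, target)
    return loss
```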

JiyueWang commented 4 years ago

Thanks for the prompt reply.

JiyueWang commented 4 years ago

I notice that this hyper-parameter is also applicable to Mixup. As the paper 'WeMix: How to Better Utilize Data Augmentation' showed, mixing original data with augmented data can help training significantly. I think comparing your method, with this extra tuned parameter, against Mixup on CIFAR is not fair. What do you think?
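For concreteness, here is a minimal sketch of what the same kind of gate could look like for Mixup; `mixup_prob` is a hypothetical argument, not something provided by this repo.

```python
import numpy as np
import torch

def mixup_step(model, criterion, input, target, alpha=1.0, mixup_prob=0.5):
    # `mixup_prob` is a hypothetical gate, analogous to cutmix_prob:
    # with probability (1 - mixup_prob) the batch is left unaugmented.
    if alpha > 0 and np.random.rand(1) < mixup_prob:
        lam = np.random.beta(alpha, alpha)
        rand_index = torch.randperm(input.size(0), device=input.device)
        # Convex combination of the batch with a shuffled copy of itself.
        mixed = lam * input + (1 - lam) * input[rand_index]
        output = model(mixed)
        loss = lam * criterion(output, target) + (1 - lam) * criterion(output, target[rand_index])
    else:
        output = model(input)
        loss = criterion(output, target)
    return loss
```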

hellbell commented 4 years ago

@JiyueWang We only searched the hyper-parameter alpha for Mixup at the time of the ICCV submission, but tuning a mixup_prob might boost Mixup's performance on CIFAR as well. (In our defense, CutMix still shows better performance than Mixup even without searching over the probability.) We are preparing an extended journal version of CutMix, so we will rigorously search for the best alpha and prob for both Mixup and CutMix on CIFAR to make the comparison fairer. Thank you :)

JiyueWang commented 4 years ago

Have you ever tried combining Mixup and CutMix? It seems they are somewhat orthogonal.

hellbell commented 4 years ago

@JiyueWang In my experience, a combination of Mixup and CutMix was okay, but I cannot remember the numbers. That kind of combination is called CutMixup in another paper (https://arxiv.org/abs/2004.00448), and this repository does something similar (https://github.com/rwightman/pytorch-image-models/pull/218).
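For illustration, one common way to combine them is to decide per batch which augmentation to apply, similar in spirit to the timm implementation linked above. A rough sketch follows; all names and probabilities here are assumptions, not taken from this repo.

```python
import numpy as np
import torch

def rand_bbox(size, lam):
    # Same box sampler as in the CutMix sketch above.
    W, H = size[2], size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = np.random.randint(W), np.random.randint(H)
    return (np.clip(cx - cut_w // 2, 0, W), np.clip(cy - cut_h // 2, 0, H),
            np.clip(cx + cut_w // 2, 0, W), np.clip(cy + cut_h // 2, 0, H))

def cutmixup_step(model, criterion, input, target,
                  alpha=1.0, beta=1.0, mix_prob=1.0, switch_prob=0.5):
    # With probability `mix_prob` augment the batch, then flip a coin
    # (`switch_prob`) to choose between CutMix and Mixup for that batch.
    if np.random.rand(1) >= mix_prob:
        return criterion(model(input), target)

    rand_index = torch.randperm(input.size(0), device=input.device)
    target_a, target_b = target, target[rand_index]

    if np.random.rand(1) < switch_prob:
        # CutMix branch: paste a random patch from the shuffled batch.
        lam = np.random.beta(beta, beta)
        bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
        input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
        lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size(-1) * input.size(-2)))
    else:
        # Mixup branch: convex combination of whole images.
        lam = np.random.beta(alpha, alpha)
        input = lam * input + (1 - lam) * input[rand_index]

    output = model(input)
    return lam * criterion(output, target_a) + (1 - lam) * criterion(output, target_b)
```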