Some questions - Githubissues

Thanks for your interest in our work.

"Will it get better results if we train a model with our own dataset, and then use the pretrained model for CAM?"

Actually, we ran this experiment before, though not extensively. The results showed that pre-training a model to compute CAMs did not further improve performance. This is because when you fine-tune the network (starting from ImageNet pre-trained weights) on a fine-grained dataset, a model fine-tuned for only a few epochs already produces CAMs as good as those from a model that was trained additionally beforehand.

"beta is searched on only one dataset. Does the same conclusion hold on other datasets? (best beta is 5 and SnapMix is not very sensitive to beta)"

We only tried a few beta values on the CUB dataset and did not specifically tune beta for any other dataset. Since CUB is the most typical dataset for fine-grained classification, conclusions obtained on it are likely to carry over to other fine-grained datasets. However, this needs more experiments to be confirmed.

"Can SnapMix be used together with mixup, CutMix and so on? Or can we use SnapMix instead of mixup, or should we decide according to experiments on our own datasets?"

Of course, you can combine SnapMix with other methods. However, I would recommend combining it with a more dissimilar method (e.g., mixup), since SnapMix is essentially a better version of CutMix (they both use a cut-and-paste mixing strategy).

"Is there anything we need to be cautious about when using SnapMix?"

I think the first one is that when using other types of backbone networks, the way SPM is calculated needs to be adjusted to the network structure. The backbones we adopted in our experiments mainly use global average pooling, so computing SPM is relatively straightforward because we can directly use the CAM method. However, with other network structures such as VGG, we may need to consider other ways to compute SPM.

The second is that in our experiments, we found that if the backbone model is poor at handling label noise, for example when fine-tuning a shallow network (such as ResNet-18), SnapMix performs much better than mixup and CutMix. However, with a more powerful network, the advantage over mixup and CutMix is relatively small. In other words, SnapMix helps most when applied to a backbone that is more sensitive to label noise.
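To illustrate the first point, here is a minimal NumPy sketch of why a global-average-pooling backbone makes this easy: the CAM for a class is just the classifier weights applied to the last feature map at every spatial location. This is not the repository code. The function names, the random toy inputs, and the choice to shift the CAM to be non-negative and normalise it to sum to one are assumptions made for illustration.

```python
import numpy as np


def class_activation_map(features, fc_weight, label):
    """CAM for one image, assuming a GAP + linear classifier head.

    features:  (C, H, W) output of the last conv layer
    fc_weight: (num_classes, C) weights of the linear classifier
    label:     class index whose activation map is wanted
    """
    # Weighted sum over channels at every spatial location -> (H, W)
    return np.tensordot(fc_weight[label], features, axes=([0], [0]))


def semantic_percent_map(cam, eps=1e-8):
    """Turn a CAM into a map of non-negative values that sum to one.

    Assumption for this sketch: shift by the minimum, then divide by the sum.
    """
    cam = cam - cam.min()
    return cam / (cam.sum() + eps)


# Toy inputs (hypothetical sizes): 8 channels, 7x7 feature map, 5 classes.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 7, 7))
fc_weight = rng.standard_normal((5, 8))
label = 2

cam = class_activation_map(features, fc_weight, label)
spm = semantic_percent_map(cam)

# With GAP and no bias, the class logit equals the spatial mean of the CAM.
logit = fc_weight[label] @ features.mean(axis=(1, 2))
```

The last line shows the property that the CAM shortcut relies on. A backbone with extra fully connected layers after flattening (as in VGG) does not have it, which is why SPM would need to be computed differently there.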

Shaoli-Huang / SnapMix

Some questions #4