facebookresearch / suncet

Code to reproduce the results in the FAIR research papers "Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples" https://arxiv.org/abs/2104.13963 and "Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations" https://arxiv.org/abs/2006.10803
MIT License

Why are the labels all the same when training? #20

Closed · happyxuwork closed this issue 3 years ago

happyxuwork commented 3 years ago

The idea in your paper is amazing; great truths are simple. I have the following questions:

  1. Why are the labels the same in every iteration? (Support images are sampled with ClassStratifiedSampler, so does every sampling draw the same classes in the same class order?) https://github.com/facebookresearch/suncet/blob/731547d727b8c94d06c08a7848b4955de3a70cea/src/paws_train.py#L167

  2. The multi-crop images are 96 x 96, and the network has an FC layer. How can the loss on the multi-crop images be propagated backward?

  3. Have you considered using the labeled loss plus the unlabeled loss as the final loss during training? That way, fine-tuning would not be required.

MidoAssran commented 3 years ago

Hi @happyxuwork, thanks for your interest.

  1. The order of the labels can change, but the samples drawn by ClassStratifiedSampler for the support mini-batch always follow the pattern [[a,b,c,…], [a,b,c,…], …, [a,b,c,…]], where a,b,c are images from classes a,b,c respectively. While the classes represented by a,b,c change from one iteration to the next, for all intents and purposes, we can just fix the one-hot label matrix at the start, since it's only used to identify which samples are the same and which are different (see the first sketch after this list).

  2. The FC layer is just the prediction head (see here). Even though the small crops are 96x96, their representation size before the prediction head is the same as that of the large crops (e.g., 2048-dimensional for RN50), so you can still feed them into the projection head, no problem (see the second sketch after this list).

  3. I haven't tried this with the PAWS loss, but I think it sounds interesting! As one possible alternative to fine-tuning, we just looked at soft nearest-neighbours classification, but as you said, if you have a supervised loss (and a supervised head) during pre-training, then you can directly examine the prediction accuracy of the supervised head on the validation set. Though I suspect you may still get a performance boost by fine-tuning this supervised head.
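
To make point 1 concrete, here is a minimal sketch (not the repository's exact code; `classes_per_batch` and `views_per_class` are made-up values for illustration) of why the one-hot support-label matrix can be built once and reused: it only records which support samples share a class, not which ImageNet class that happens to be.

```python
import torch
import torch.nn.functional as F

# ClassStratifiedSampler yields support batches with a repeating class order,
# e.g. [[a, b, c], [a, b, c], ...], so the within-batch class index of every
# support sample is known in advance.
classes_per_batch = 3   # hypothetical value for illustration
views_per_class = 2     # how many times the [a, b, c] block repeats

class_ids = torch.arange(classes_per_batch).repeat(views_per_class)  # [0, 1, 2, 0, 1, 2]
labels = F.one_hot(class_ids, num_classes=classes_per_batch).float()
print(labels)
# Whether class 0 is "dog" this iteration and "cat" the next is irrelevant:
# the matrix only marks which support samples belong to the same class.
```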
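For point 2, here is a small sketch (an assumed setup using torchvision's standard ResNet-50, not the repository's code) showing that the backbone's global average pooling makes the feature dimension independent of crop resolution, so both 224x224 views and 96x96 multi-crop views produce 2048-d vectors that can pass through the same heads.

```python
import torch
import torchvision

backbone = torchvision.models.resnet50()
backbone.fc = torch.nn.Identity()  # drop the classifier; keep the pooled 2048-d features

large_crops = torch.randn(4, 3, 224, 224)
small_crops = torch.randn(4, 3, 96, 96)   # multi-crop views

print(backbone(large_crops).shape)  # torch.Size([4, 2048])
print(backbone(small_crops).shape)  # torch.Size([4, 2048])
```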

happyxuwork commented 3 years ago

@MidoAssran If more labels can be used on ImageNet, for example 20% labeled data, do you have any suggestions to improve performance? Or are there losses that could be added? From your point of view, what is the difficulty in reaching the level of full supervision with 20 percent of the data?

MidoAssran commented 3 years ago

@happyxuwork Hi sorry for the delay getting back to you! Was on vacation :)

Yes, I think using more labels in the support set, if available, will directly improve performance. See Fig. 7 in Appendix B of the paper.

I haven't tried using 20% of labels, but we see that by using wider ResNets (e.g., ResNet-50 4x), we can already match fully supervised performance (without extra tricks like AutoAugment, etc.) with only 10% of labels (see Fig. 6 in Appendix B). Off the top of my head, I'm not sure what the "main difficulty" is, but I think there is certainly room for improvement, since performance with 1% labels is still significantly lower than performance with 10% labels.