Closed by happyxuwork 3 years ago
Hi @happyxuwork, thanks for your interest.
The order of the labels can change, but the samples sampled with ClassStratifiedSampler
in the support mini-batch always follows this pattern
[[a,b,c,…], [a,b,c,…], …, [a,b,c,…]]
where a,b,c are images from classes a,b,c respectively. While the classes represented by a,b,c change from one iteration to the next, for all intents and purposes, we can just fix the one hot label matrix at the start, since it’s just used to identify which samples are the same, and which are different.
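A minimal sketch of why the one-hot matrix can be fixed up front (hypothetical shapes, numpy instead of the repo's PyTorch code):

```python
import numpy as np

# Hypothetical support mini-batch layout: s_blocks blocks, each containing
# one image from each of the n_classes sampled classes:
# [[a, b, c, ...], [a, b, c, ...], ...]
n_classes = 3   # classes sampled this iteration (a, b, c)
s_blocks = 4    # number of [a, b, c, ...] blocks in the support batch

# Fixed one-hot label matrix: row i is the one-hot of (i mod n_classes).
labels = np.tile(np.eye(n_classes), (s_blocks, 1))

# Two support samples share a pseudo-label iff they occupy the same position
# within their block, regardless of which real classes were drawn, so the
# same matrix identifies "same vs. different" at every iteration.
same = labels @ labels.T              # 1 where rows match, 0 otherwise
assert same[0, n_classes] == 1.0      # sample a of block 0 vs. block 1: same
assert same[0, 1] == 0.0              # a vs. b within a block: different
```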
The FC layer is just the prediction head (see here). Even though the small crops are 96x96, their representation size before the prediction head is the same as that of the large crops (e.g., 2048-dimensional for RN50), so you can still feed it into the projection head, no problem.
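The reason crop size doesn't matter is that the trunk ends in global average pooling over the spatial dimensions, so the representation size depends only on the channel count. A small numpy sketch (hypothetical spatial sizes; a real ResNet-50 produces the 2048-channel maps via convolutions):

```python
import numpy as np

def global_avg_pool(feature_map):
    """Average over the spatial dims (H, W) -> channel-sized vector."""
    return feature_map.mean(axis=(1, 2))

# Hypothetical final feature maps from a ResNet-50 trunk (2048 channels):
# a large 224x224 crop yields a 7x7 grid, a small 96x96 crop a 3x3 grid.
large_crop_features = np.random.randn(2048, 7, 7)
small_crop_features = np.random.randn(2048, 3, 3)

# After pooling, both representations are 2048-dimensional, so both can be
# fed through the same projection/prediction heads and backpropagated.
assert global_avg_pool(large_crop_features).shape == (2048,)
assert global_avg_pool(small_crop_features).shape == (2048,)
```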
I haven't tried this with the PAWS loss, but I think it sounds interesting! As one possible alternative to fine-tuning, we just looked at soft nearest-neighbours classification, but as you said, if you have a supervised loss (and a supervised head) during pre-training, then you can directly examine the prediction accuracy of the supervised head on the validation set. Though I suspect you may still get a performance boost by fine-tuning this supervised head.
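For concreteness, soft nearest-neighbours classification can be sketched as a similarity-weighted average of support labels (a toy numpy illustration, not the repo's implementation; the temperature value and cosine similarity are assumptions):

```python
import numpy as np

def snn_classify(queries, support, support_labels, tau=0.1):
    """Soft nearest-neighbour classification: class probabilities are a
    softmax-similarity-weighted average of the labeled support labels."""
    # Cosine similarity between each query and each support embedding.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    sims = (q @ s.T) / tau
    # Numerically stable softmax over the support dimension.
    weights = np.exp(sims - sims.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ support_labels          # (n_queries, n_classes)

# Toy example: 2 classes, one support embedding each, one query near class 0.
support = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.eye(2)
probs = snn_classify(np.array([[0.9, 0.1]]), support, labels)
assert probs.argmax() == 0
```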
@MidoAssran If more labeled data can be used on ImageNet, for example 20% of the labels, do you have any suggestions for improving performance, or additional losses that could be added? From your point of view, what is the main difficulty in reaching fully supervised performance with 20% of the labels?
@happyxuwork Hi, sorry for the delay getting back to you! I was on vacation :)
Yes, I think using more labels in the support set, if available, will directly improve performance. See Fig. 7 in Appendix B of the paper.
I haven't tried using 20% of the labels, but we see that by using wider ResNets (e.g., ResNet-50 4x), we can already match fully supervised performance (without extra tricks like AutoAugment, etc.) with only 10% of the labels (see Fig. 6 in Appendix B). Off the top of my head, I'm not sure what the "main difficulty" is, but I think there is certainly room for improvement, since performance with 1% of the labels is still significantly lower than performance with 10%.
The idea in your paper is amazing; great truths are all simple. I have the following questions:
1. Why are the labels of each iteration the same? (Support images are sampled with ClassStratifiedSampler, so does every sampling draw the same classes in the same class order?) https://github.com/facebookresearch/suncet/blob/731547d727b8c94d06c08a7848b4955de3a70cea/src/paws_train.py#L167
2. The multi-crop images are 96x96, and an FC layer exists in the network. Why can the loss on the multi-crop images be propagated backward?
3. Have you considered using the labeled loss plus the unlabeled loss as the final training loss? That way, fine-tuning would not be required.