Hi, although I have not experimented without a nonlinear projection head, I am fairly confident that the performance of MoCo/Exemplar would drop considerably without one, based on the ablations in the original SimCLR and MoCo papers (see the sketch after the reference below for what such a head looks like). The performance of pure MoCo/Exemplar on miniImageNet is given in the table in the README.md; please refer to it. Note that Exemplar is a weakly supervised method, and the performance gap is 7% for 1-shot and 4% for 5-shot. Recent experiments show that some non-contrastive unsupervised methods such as DINO [1] perform considerably better, especially for cross-domain few-shot learning from ImageNet to datasets like ISIC and QuickDraw.
[1] Emerging Properties in Self-Supervised Vision Transformers. ICCV 2021.
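In case it is useful for other readers, here is a minimal sketch of the kind of nonlinear (MLP) projection head being discussed, in the MoCo v2 / SimCLR style. The feature dimensions and module layout are placeholders chosen for illustration, not the exact configuration of this repo:

```python
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Nonlinear (MLP) projection head in the MoCo v2 / SimCLR style.

    It is only used during contrastive pre-training; for downstream few-shot
    evaluation the backbone features are usually taken directly.
    """
    def __init__(self, in_dim=640, hidden_dim=640, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        # L2-normalize so the contrastive loss operates on the unit hypersphere.
        return F.normalize(self.mlp(x), dim=-1)

# "Without a nonlinear projection head" corresponds to replacing the MLP with
# a single linear layer (as in MoCo v1) or removing the head entirely.
linear_head = nn.Linear(640, 128)
```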
Thanks for your quick reply!
I missed the MoCo results in the README.md. Thanks for pointing that out.
PS: why would you define Exemplar as a weakly supervised method? Is it because it does not explicitly predict the class label, but uses the label information to ensure that only true negatives are used for contrasting?
Yes, that's exactly what I mean in an informal way, different from the traditional meaning of "weakly supervised". :)
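To make this informal definition concrete, here is a rough sketch of how class labels can be used so that only true negatives end up in the contrastive denominator. The function name, shapes, and memory-bank setup are hypothetical and purely illustrative; this is not the actual loss implementation in this repo:

```python
import torch

def label_filtered_infonce(query, keys, q_labels, k_labels, tau=0.07):
    """InfoNCE-style loss in which class labels mask out same-class keys from
    the negative set, so each query is contrasted only against true negatives.

    query:    (B, D) L2-normalized query embeddings
    keys:     (K, D) L2-normalized key embeddings (e.g., a memory bank / queue)
    q_labels: (B,)   class labels of the queries
    k_labels: (K,)   class labels of the keys
    """
    logits = query @ keys.t() / tau                       # (B, K) similarities
    same_class = q_labels[:, None] == k_labels[None, :]   # (B, K) positive mask

    exp_logits = torch.exp(logits)
    neg_sum = (exp_logits * (~same_class)).sum(dim=1, keepdim=True)  # true negatives only
    pos_prob = exp_logits / (exp_logits + neg_sum)        # per-positive probability

    # Average the log-likelihood over each query's positives.
    n_pos = same_class.sum(dim=1).clamp(min=1)
    loss = -(torch.log(pos_prob) * same_class).sum(dim=1) / n_pos
    return loss.mean()
```

Building `same_class` from instance identities instead of class labels recovers the fully unsupervised MoCo-style loss, which is why using the labels only in this mask feels "weakly" rather than fully supervised.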
Thank you for your fast responses and this discussion! xD
Hi authors, thanks for your wonderful work!
I have some questions regarding the projection head in MoCo/Exemplar. Have you experimented with the pre-trained MoCo/Exemplar models with and without a nonlinear projection head, and how did they perform? Also, in the code you commented:
"Surprisingly, pure contrastive-pretrained model performs very well on Few-Shot Learning."
Do you by any chance remember the approximate performance gap, i.e., pure MoCo pre-training vs. Exemplar pre-training?
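For context, this is roughly how I would expect the with/without-head comparison to be run on the few-shot side: discard the pre-training projection head and evaluate the backbone features with a nearest-prototype classifier. The `backbone` callable and the episode tensors below are placeholders, not this repo's actual evaluation code; please correct me if your protocol differs:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prototype_accuracy(backbone, support_x, support_y, query_x, query_y, n_way):
    """Nearest-prototype evaluation on a single few-shot episode.

    The contrastive projection head is deliberately not applied here; features
    come straight from the backbone, which is the usual protocol when comparing
    pre-training with vs. without a nonlinear head.
    """
    z_support = F.normalize(backbone(support_x), dim=-1)    # (n_way * k_shot, D)
    z_query = F.normalize(backbone(query_x), dim=-1)        # (n_query, D)

    # Class prototypes: mean of the support embeddings of each class.
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                                        # (n_way, D)

    preds = (z_query @ prototypes.t()).argmax(dim=1)         # cosine-similarity match
    return (preds == query_y).float().mean().item()
```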