YyzHarry / imbalanced-semi-self

[NeurIPS 2020] Semi-Supervision (Unlabeled Data) & Self-Supervision Improve Class-Imbalanced / Long-Tailed Learning
https://arxiv.org/abs/2006.07529
MIT License

Question about the Self-supervised pre-trained models (MoCo) #2

Closed · seekingup closed this issue 4 years ago

seekingup commented 4 years ago

Thanks for your excellent code! I reproduced the results of CE (uniform) + SSP and cRT + SSP on ImageNet_LT based on the self-supervised pre-trained models (MoCo), and got the same results as reported in your paper. But I still have some questions about the MoCo SSP checkpoint. I directly evaluated the MoCo checkpoints + cRT (without CE-uniform supervised training), and the accuracy is 0.118, which is not good. According to the original MoCo paper, however, MoCo's accuracy on the full ImageNet should be 0.60+, which is not far from supervised learning. So is the 0.118 accuracy reasonable? It's much lower than the supervised accuracy on ImageNet_LT.
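For context, a minimal sketch of how a MoCo checkpoint is typically loaded into a ResNet backbone before any classifier training, following the key-renaming convention from MoCo's official `main_lincls.py` (the checkpoint filename here is hypothetical):

```python
import torch
import torchvision.models as models

model = models.resnet50(num_classes=1000)

# Hypothetical path to a MoCo pre-trained checkpoint.
checkpoint = torch.load('moco_ckpt.pth.tar', map_location='cpu')
state_dict = checkpoint['state_dict']
for k in list(state_dict.keys()):
    # Keep only the query-encoder weights and drop its projection head (fc).
    if k.startswith('module.encoder_q') and not k.startswith('module.encoder_q.fc'):
        state_dict[k[len('module.encoder_q.'):]] = state_dict[k]
    del state_dict[k]

msg = model.load_state_dict(state_dict, strict=False)
# Only the freshly initialized classifier should be reported as missing.
assert set(msg.missing_keys) == {'fc.weight', 'fc.bias'}
```

With `strict=False`, only the new `fc` layer should appear in `missing_keys`; any other missing or unexpected keys usually indicate a prefix mismatch when loading the checkpoint.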

YyzHarry commented 4 years ago

Good question. I think what you are actually testing here is how MoCo performs on an imbalanced dataset, and there could be many factors at play. One possible explanation is the hyper-parameters, for example the number of pre-training epochs: I only trained for 200 epochs, and longer training might bring better performance. Also, during the fine-tuning stage of the linear classifier, you might want to follow the hyper-parameters in MoCo's repo, i.e., the learning rate (they usually use a very large lr, e.g., 30, for the linear classifier).
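To illustrate that recipe, here is a minimal linear-evaluation sketch, assuming `model` is a ResNet with MoCo weights already loaded (as in the snippet above); the lr=30 SGD setting mirrors the defaults in MoCo's official linear-probing script:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Assumption: MoCo weights have already been loaded into this backbone
# as shown in the earlier snippet.
model = models.resnet50(num_classes=1000)

# Freeze everything except the final linear classifier.
for name, param in model.named_parameters():
    if name not in ('fc.weight', 'fc.bias'):
        param.requires_grad = False

# Re-initialize the linear classifier.
model.fc.weight.data.normal_(mean=0.0, std=0.01)
model.fc.bias.data.zero_()

# Train only the classifier, with the large learning rate (30) and
# zero weight decay used in MoCo's linear-evaluation recipe.
parameters = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(parameters, lr=30.0, momentum=0.9, weight_decay=0.0)
criterion = nn.CrossEntropyLoss()
```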

And indeed, the performance might actually be reasonable, indicating that current contrastive self-supervised learning might suffer a large performance drop when facing imbalanced data. This is an independent problem in its own right, and might be interesting to the self-supervised learning community.

seekingup commented 4 years ago

Thank you so much!

jiequancui commented 3 years ago

Hi, I'm very interested in this paper. Recently, I have been trying to reproduce the results on ImageNet-LT with this code. However, I found that the loss only decreases from 9.5 to 6.9 during MoCo pre-training. Is this expected? Could you post your training log for reference? Thank you very much.