Hello, @ppwwyyxx @KaimingHe
I used MoCo to train on the MNIST dataset as a simple example. The MNIST train.py is adapted from the PyTorch MNIST example.
With direct supervised training, it easily reaches 99% test accuracy.
However, when I first pretrain the model with the MoCo method and then finetune the pretrained weights (the conv weights are frozen; only the fc layer is trainable), the test accuracy only reaches 95% and does not improve further.
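For reference, the linear-evaluation setup described above (freeze the pretrained encoder, train only the fc head) can be sketched as follows. The `SmallConvNet` below is a hypothetical stand-in for the MNIST encoder, not the actual model used; substitute your MoCo-pretrained backbone and checkpoint.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Hypothetical MNIST encoder + linear head (assumption, for illustration)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.fc(h)

model = SmallConvNet()
# model.load_state_dict(moco_state_dict, strict=False)  # load pretrained weights here

# Freeze the convolutional encoder; only the fc head stays trainable.
for param in model.features.parameters():
    param.requires_grad = False

# Re-initialize the linear head before finetuning.
model.fc.weight.data.normal_(mean=0.0, std=0.01)
model.fc.bias.data.zero_()

# Optimize only the parameters that still require gradients.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.1, momentum=0.9,
)
```

One detail worth double-checking in a setup like this: if the frozen encoder was pretrained with a different input normalization or augmentation than the finetuning data loader uses, the linear probe accuracy will suffer.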
Concretely, when training on MNIST I set the queue length to 3840 instead of the default 65536, since MNIST is much smaller than ImageNet.
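For context, in the reference MoCo training script the queue length is controlled by the `--moco-k` flag, so the change above corresponds to something like the following (paths and other flags are placeholders, not my exact command):

```shell
# Reduce the MoCo queue from the default 65536 to 3840 for the smaller dataset.
python main_moco.py \
  --moco-k 3840 \
  [other-args] [data-dir]
```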
Does this mean the feature extraction network is not trained well? Can you give me some suggestions about this phenomenon?
Also, do you have any advice on training with a custom dataset? Which hyperparameters need to be changed?