Open zbw0329 opened 2 years ago
When I use linear evaluation to evaluate my model, should I train with main_lincls.py first?
I think you only need to change the dataloader and augmentation to CIFAR-10 and change num_classes to 10. And check that you can load the pretrained weights correctly.
What is the right order of training? Use main.py first to get a checkpoint, and then use main_lincls.py to evaluate that checkpoint? Or use the checkpoint as a resume point and continue training with main_lincls.py to get a linear checkpoint?
I get 37% Acc@1 on the CIFAR-10 dataset after 300 epochs, which is far from the result in your paper. I use a learning rate of 10; should I make it smaller?
I'm confused about the order in which main.py and main_lincls.py are used.
You should: 1. use main.py to get the checkpoint; 2. use main_lincls.py to load the checkpoint as pretrained weights (not resume training).
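Loading the checkpoint as initialization rather than resuming can be sketched as below. This is a hypothetical helper, not the repo's code: it assumes the checkpoint stores a `state_dict` and that the classifier head is named `fc`, as in MoCo-style scripts; optimizer state and epoch counters in the checkpoint are deliberately ignored.

```python
import torch

def load_pretrained(model, checkpoint_path):
    """Load pretrained weights as initialization (not resume):
    only model weights are restored, and the classifier head is
    left randomly initialized so main_lincls.py trains it fresh."""
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)
    # Drop the classifier head so it is re-initialized.
    state_dict = {k: v for k, v in state_dict.items()
                  if not k.startswith("fc.")}
    msg = model.load_state_dict(state_dict, strict=False)
    # Only the freshly initialized classifier should be missing.
    assert all(k.startswith("fc.") for k in msg.missing_keys)
    return model
```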
The results of CIFAR-10 in the paper are produced using ImageNet pretrained weights. I didn't try directly pretraining on CIFAR-10.
Actually, 37% accuracy under linear evaluation shows the model's weights are not random. It seems the model learns some features, but not very good ones. The reason may lie in inappropriate hyper-parameters. Or maybe the CIFAR-10 dataset is too easy to learn, which makes the model's outputs collapse to the same value during training.
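One quick way to check for this kind of collapse is to measure how varied the model's outputs are over a batch. This is a hypothetical diagnostic, not part of the repo:

```python
import torch

def prediction_diversity(logits):
    """Return (number of distinct argmax predictions, mean per-dimension
    std of the outputs). Both being near zero signals that the model
    maps every input to the same cluster."""
    preds = logits.argmax(dim=1)
    n_distinct = preds.unique().numel()
    feat_std = logits.std(dim=0).mean().item()
    return n_distinct, feat_std
```

Run it on a batch of logits from the trained model; a collapsed model will report 1 distinct prediction and a near-zero std.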
Oh, I see. I used main_lincls.py to get the checkpoint and then used it to evaluate. I will retry in the order you suggested, thanks for your help.
In Table 5 of your paper, what is the difference between 'fine-tune' and 'linear'? Is there any difference in their experimental procedure? Are their evaluation methods different?
In "linear", we load the pretrained weights and fix the backbone, then only train a classifier. In "fine-tune", we load the pretrained weights as initialization and train the whole model normally. They are two different ways to measure the quality of pretrained weights.
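The two protocols differ only in which parameters receive gradients. A minimal sketch, assuming the classifier head is named `fc` (the attribute name and learning rates here are illustrative, not the paper's exact settings):

```python
import torch

def setup_linear_eval(model, lr=30.0):
    """'linear' protocol: freeze everything except the classifier head,
    and optimize only the classifier's parameters."""
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("fc.")
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(params, lr=lr, momentum=0.9)

def setup_finetune(model, lr=0.01):
    """'fine-tune' protocol: every parameter is trainable."""
    for p in model.parameters():
        p.requires_grad = True
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
```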
Thanks for your excellent work! I changed the dataloader to use JigClu on CIFAR-10 and trained the model on it for 1000 epochs. But the predictions of my model are all the same; it seems the model always assigns everything to the same cluster.