The learning rate of linear classification

facebookresearch / swav

PyTorch implementation of SwAV https://arxiv.org/abs/2006.09882


The learning rate of linear classification #38

Closed dddzg closed 4 years ago

dddzg commented 4 years ago

Thanks for your awesome work. I wonder why the learning rate for linear classification is so small (0.3 in eval_linear.py). In MoCo's linear classification, the initial learning rate is 30 with a two-stage reduction, a 100x difference from this repo. Have you ever run eval_linear.py with MoCo v2 weights, or run SwAV weights with the code from MoCo? I wonder how much the lr affects performance.

mathildecaron31 commented 4 years ago

The different methods (MoCo, SwAV, etc.) result in networks whose feature distributions (e.g., magnitudes) can be very different. That is why we perform a grid search over learning rate and weight decay, and we find that for our network lr=0.3 gives the best performance.
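To make the point about feature magnitudes concrete, here is a minimal sketch in pure Python. It is not code from this repo: the data is synthetic, the model is a bias-free binary logistic regression trained with plain gradient descent and no weight decay, and the scale factor `c` is a hypothetical choice. Under those assumptions, multiplying the features by `c` and dividing the learning rate by `c**2` gives the same predictions, so a 10x difference in feature magnitude corresponds to a 100x difference in learning rate. It says nothing about the actual ratio between MoCo and SwAV features.

```python
import math
import random


def logit(w, x):
    """Dot product between the weight vector and one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(features, labels, lr, steps):
    """Full-batch gradient descent on a bias-free logistic regression."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-logit(w, x)))
            for j in range(dim):
                grad[j] += (p - y) * x[j] / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def mean_loss(w, features, labels):
    """Mean binary cross-entropy, written in a numerically stable form."""
    total = 0.0
    for x, y in zip(features, labels):
        z = logit(w, x)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(features)


# Synthetic stand-in for frozen backbone features (assumption, not real data).
random.seed(0)
feats = [[random.gauss(0, 1) for _ in range(3)] for _ in range(50)]
labels = [1.0 if x[0] + 0.5 * x[1] > 0 else 0.0 for x in feats]

# Hypothetical second backbone whose features are 10x smaller.
c = 0.1
scaled_feats = [[c * v for v in x] for x in feats]

# Reference run, then the scaled features with lr divided by c**2 (about 30).
w_ref = train_logistic(feats, labels, lr=0.3, steps=100)
w_matched = train_logistic(scaled_feats, labels, lr=0.3 / c**2, steps=100)

# Same scaled features, but reusing the small lr without rescaling.
w_small_lr = train_logistic(scaled_feats, labels, lr=0.3, steps=100)

max_logit_diff = max(
    abs(logit(w_ref, x) - logit(w_matched, xs))
    for x, xs in zip(feats, scaled_feats)
)
loss_matched = mean_loss(w_matched, scaled_feats, labels)
loss_small_lr = mean_loss(w_small_lr, scaled_feats, labels)

print("max logit difference (rescaled lr):", max_logit_diff)
print("loss with rescaled lr:", loss_matched)
print("loss with unchanged lr:", loss_small_lr)
```

With the rescaled learning rate the two runs agree up to floating-point error, while reusing lr=0.3 on the smaller features leaves the classifier undertrained after the same number of steps.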

dddzg commented 4 years ago

Wow, thanks for your response. I am still surprised, though, that there is a 100x learning rate gap between the linear classification experiments.

mathildecaron31 commented 4 years ago

This is not that surprising given that the two methods are trained with a different loss, different optimizer, different learning rate, different weight decay, etc. There is no reason that the subsequent weight distributions should match.

dddzg commented 4 years ago

Thanks again for your response. Does this indicate that we should be careful when comparing linear classification results across different pre-trained models? For example, in Table 6 of the SwAV paper, there are top-1 gaps of about 4% and 10% between MoCo v2 and SwAV on Places205 and iNat18, respectively. However, in our experiments, we find that the MoCo weights perform badly with low-lr linear classification on ImageNet.

Were the linear classification results in Table 6 obtained with the same lr for SwAV and MoCo?

mathildecaron31 commented 4 years ago

Each method performs its own learning rate grid search to find the best learning rate.
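A per-method grid search of this kind could be sketched roughly as follows. This is only an illustration, not the script used for the paper: `grid_search`, `make_toy_scorer`, the grid values, and the scoring function are all hypothetical, and the toy scorer returns a synthetic number in place of a real validation accuracy. In practice `train_and_eval` would train the linear classifier on frozen features and return top-1 accuracy.

```python
import math
from itertools import product


def grid_search(train_and_eval, learning_rates, weight_decays):
    """Evaluate every (lr, weight_decay) pair and return the best one."""
    results = {}
    for lr, wd in product(learning_rates, weight_decays):
        results[(lr, wd)] = train_and_eval(lr=lr, weight_decay=wd)
    best = max(results, key=results.get)
    return best, results


def make_toy_scorer(peak_lr):
    """Build a synthetic scorer that is highest at `peak_lr` and wd=0.

    This is a placeholder for a real training run; the numbers it returns
    are not measurements of any model.
    """
    def train_and_eval(lr, weight_decay):
        return -abs(math.log10(lr / peak_lr)) - weight_decay
    return train_and_eval


# Hypothetical grids, shared by both methods.
learning_rates = [0.03, 0.3, 3.0, 30.0]
weight_decays = [0.0, 1e-6, 1e-4]

# Two hypothetical pre-training methods whose optima differ by 100x.
best_a, _ = grid_search(make_toy_scorer(0.3), learning_rates, weight_decays)
best_b, _ = grid_search(make_toy_scorer(30.0), learning_rates, weight_decays)

print("method A best (lr, wd):", best_a)
print("method B best (lr, wd):", best_b)
```

The point of the sketch is that each method gets its own search over the same grid, so a comparison uses each method's best setting instead of one shared learning rate.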

dddzg commented 4 years ago

Thank you so much!