Alibaba-MIIL / ImageNet21K

Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS, 2021)
MIT License

The appropriate learning rate #5

Closed. Stephen-Hao closed this issue 3 years ago.

Stephen-Hao commented 3 years ago

@mrT23 Dear sir, when I use the pretrained ImageNet-21K model and set lr = 0.01 or 0.1 rather than the default value of 0.0003, the accuracy is very low (around 0.17%) even after training for dozens of epochs. If I use the SGD optimizer with lr = 0.01 or 0.1, the result on my own dataset is better than Adam with lr = 0.0003 by about 0.05%. How should I choose the best optimizer and the corresponding lr? Thank you very much!

mrT23 commented 3 years ago

@Stephen-Hao I assume you are talking about transfer learning.

For Adam/AdamW optimizers, the learning rate for transfer learning should usually be in the range 1e-4 to 5e-4. For the SGD optimizer it's harder to nail the "correct" learning rate; it usually depends on the dataset and even the batch size. A reasonable range for SGD is 5e-3 to 5e-2.

All the article's results for transfer learning were obtained with the Adam optimizer and a lr of 2e-4. I believe it should work well for you too. The TResNet-M model with ImageNet-21K pretraining usually converges very fast to high accuracy. Do sanity checks and make sure you don't have other bugs in your transfer-learning training.
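A minimal sketch of what these recommendations could look like in plain PyTorch (model, train_loader and num_epochs are placeholders; the repo's actual training scripts may set things up differently):

import torch
# Adam with lr=2e-4, the setting used for the article's transfer-learning results
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
# alternative: SGD typically needs a larger lr, roughly in the 5e-3 to 5e-2 range
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(num_epochs):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()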

Stephen-Hao commented 3 years ago

@mrT23 Thank you very much! I see!

Stephen-Hao commented 3 years ago

@mrT23 Dear Tal, I admire your TResNet work very much! I wonder if you could share the code for the Kaggle competitions "Plant Pathology 2020 - FGVC7" or "Herbarium 2020 - FGVC7"? I do not know how to train a Soft-Triplet loss for metric learning and a CrossEntropy loss for classification on the same backbone. Is this a multi-task learning setup? Do you add the two loss functions together and backward the combined loss, like this:

loss1 = soft_triplet
loss2 = cross_entropy
loss = loss1 + loss2
loss.backward()

I'd be deeply grateful if you could share the code! Best wishes!

mrT23 commented 3 years ago

@Stephen-Hao Just to be clear, this is unrelated to ImageNet-21K pretraining :-) Unfortunately I can't share explicit code due to commercial limitations.

Soft triplet is quite complicated; it also requires changing the sampler. In practice, for classification it's not a game-changer, and using better pretraining has more impact. EMA is also crucial for top results.
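To illustrate the EMA idea (a sketch only, not the exact implementation used here), keep an exponential moving average of the model weights and evaluate the averaged copy; decay=0.999 is an assumed value:

import copy
import torch
def update_ema(ema_model, model, decay=0.999):
    # blend each EMA parameter toward the current training weights
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
# ema_model = copy.deepcopy(model)  # created once before training starts
# call update_ema(ema_model, model) after each optimizer.step(),
# and evaluate/export ema_model rather than model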

Indeed, when we used triplet loss we did:

total_loss = cross_entropy_loss + triplet_loss
total_loss.backward()

An example of a triplet-loss and sampler implementation that you can try: https://github.com/Cysu/open-reid/blob/master/examples/triplet_loss.py and https://github.com/Cysu/open-reid/blob/3293ca79a07ebee7f995ce647aafa7df755207b8/reid/utils/data/sampler.py#L11 (though we used something different).
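A minimal sketch of combining the two losses on one backbone, assuming the backbone returns an embedding and a separate head produces classification logits. The names backbone, head, train_loader and optimizer are placeholders, torch.nn.TripletMarginLoss stands in for the soft-triplet loss, and make_triplets is a hypothetical helper that forms anchor/positive/negative tuples from a batch drawn with a suitable sampler (see the links above):

import torch
ce_loss_fn = torch.nn.CrossEntropyLoss()
triplet_loss_fn = torch.nn.TripletMarginLoss(margin=0.3)
for images, targets in train_loader:
    embeddings = backbone(images)   # metric-learning features
    logits = head(embeddings)       # classification logits
    # anchor/positive/negative come from a P-K style batch sampler (hypothetical helper)
    anchor, positive, negative = make_triplets(embeddings, targets)
    cross_entropy_loss = ce_loss_fn(logits, targets)
    triplet_loss = triplet_loss_fn(anchor, positive, negative)
    total_loss = cross_entropy_loss + triplet_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()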

Stephen-Hao commented 3 years ago

@mrT23 Thank you very much!