facebookresearch / moco-v3

PyTorch implementation of MoCo v3 https://arxiv.org/abs/2104.02057
Other
1.21k stars 161 forks

About the learning rate for resnet-50 #34

Open cswaynecool opened 2 years ago

cswaynecool commented 2 years ago

I ran into an issue training ResNet-50 with MoCo v3. With distributed training on 16 V100 GPUs (one GPU per process, batch size 4096), the training loss is about 27.2 at the 100th epoch. When I lower the learning rate to 1.5e-4 (the default is 0.6), the loss decreases more reasonably and reaches 27.0 at the 100th epoch. Could you please verify whether this is reasonable?

cswaynecool commented 2 years ago

It seems that training barely converges with the default learning rate.
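
For comparing the two learning rates mentioned above, here is a minimal sketch of the linear scaling rule (effective lr = base lr × batch size / 256). It assumes the training script scales the `--lr` value this way, which should be confirmed against the script actually being run. The helper name `scaled_lr` and the reference batch size of 256 are illustrative assumptions, not taken from this thread.

```python
def scaled_lr(base_lr: float, batch_size: int, reference_batch: int = 256) -> float:
    """Return the effective learning rate under the linear scaling rule.

    Assumption: the optimizer receives base_lr * batch_size / reference_batch,
    not the raw value passed on the command line.
    """
    return base_lr * batch_size / reference_batch


if __name__ == "__main__":
    batch_size = 4096  # batch size reported in the issue
    for base_lr in (0.6, 1.5e-4):  # default vs. lowered value from the issue
        effective = scaled_lr(base_lr, batch_size)
        print(f"base lr {base_lr:g} -> effective lr {effective:g}")
```

If that assumption holds, a base value of 0.6 corresponds to an effective rate of 9.6 at batch size 4096, while 1.5e-4 corresponds to 2.4e-3, so the two runs differ by a factor of 4000 in step size.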