I ran into an issue training ResNet-50 with MoCo v3. Under the distributed training setting with 16 V100 GPUs (one GPU per process, total batch size 4096), the training loss plateaus at about 27.2 by the 100th epoch. When I lower the learning rate to 1.5e-4 (the default is 0.6), the loss decreases more reasonably and reaches 27.0 by the 100th epoch. Could you please verify whether this is reasonable?
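For context on how the base learning rate relates to the actual optimizer step size at batch size 4096, here is a minimal sketch of the linear learning-rate scaling rule commonly used in large-batch training (effective lr = base lr × batch size / 256). Whether this exact rule applies in my setup is an assumption on my part; the numbers below are just the two base learning rates mentioned above.

```python
def effective_lr(base_lr: float, batch_size: int, ref_batch: int = 256) -> float:
    """Linear scaling rule: scale the base lr with the total batch size.

    This mirrors the common convention for large-batch training; it is an
    assumption here, not a confirmed detail of the moco-v3 codebase.
    """
    return base_lr * batch_size / ref_batch

# Base lrs from the question, at total batch size 4096:
print(effective_lr(0.6, 4096))     # default base lr
print(effective_lr(1.5e-4, 4096))  # lowered base lr
```

At batch size 4096 the scaling factor is 16, so the gap between the two effective learning rates stays four orders of magnitude, which may explain the very different loss curves.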