Open Amberrferr opened 2 years ago
The learning rate for SGD depends on the batch size. If your batch size is bs, then lr = 0.01 / 8 * bs.
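A minimal sketch of that rule, in case it helps. It assumes `bs` means the total batch size across all GPUs (number of GPUs times samples per GPU); the function name `scaled_lr` and its parameters are made up for illustration and are not part of the repository.

```python
import math


def scaled_lr(batch_size, base_lr=0.01, base_batch_size=8):
    """Linearly scale the SGD learning rate with the batch size.

    Assumption: batch_size is the total batch size over all GPUs,
    and base_lr=0.01 corresponds to a total batch size of 8.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return base_lr / base_batch_size * batch_size


# Hypothetical setups: (number of GPUs, samples per GPU)
for gpus, per_gpu in [(4, 2), (1, 2), (1, 1)]:
    total = gpus * per_gpu
    print(f"{gpus} GPU(s) x {per_gpu} per GPU -> bs={total}, lr={scaled_lr(total):.5f}")
```

Under this reading, a single GPU with 2 samples per GPU gives bs = 2 and an lr of about 0.0025, while bs = 1 gives about 0.00125.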
Okay, thanks for your reply!
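I have a question. According to get start.txt, the learning rate in the config's optimizer is set for 8 GPUs, i.e. with a batch size of 2, the lr for 8 GPUs is 0.01. Why does get start.txt say that 0.01 corresponds to 4 GPUs? If I am using a single GPU, should I set my learning rate to 0.00125 or 0.0025?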