he-y opened this issue 4 years ago
Recently I have been using distributed training more often. You need to make sure a single GPU has the same batch size as mine; you should get the same result, but it may take more time if you have fewer GPUs.
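To make the time difference concrete, here is a rough sketch (the dataset size and GPU counts below are example numbers I picked, not values from this repo): with the per-GPU batch size fixed, fewer GPUs only means more iterations per epoch.

```python
# Rough sketch: how the GPU count affects iterations per epoch when the
# per-GPU batch size is fixed. The dataset size is an example value.
dataset_size = 1_281_167       # e.g. ImageNet-1k training set
per_gpu_batch = 128            # keep this identical to the reference setup

for num_gpus in (8, 4, 1):
    global_batch = per_gpu_batch * num_gpus
    iters_per_epoch = dataset_size // global_batch
    print(f"{num_gpus} GPU(s): global batch {global_batch}, "
          f"{iters_per_epoch} iterations per epoch")
```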
Thanks for your reply. I understand that a single GPU should have the same batch size (128) as yours. I have a question about the learning rate: does it need to be changed?
Based on the Linear Scaling Rule in the paper (Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour), the learning rate should be changed according to the batch size.
Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning rate by k.
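If that rule were applied here, it would look something like the sketch below; `base_lr` and `base_batch` are placeholder values I assumed, not taken from this repo's config.

```python
# Sketch of the Linear Scaling Rule (assumed reference values, not this repo's).
base_lr = 0.1        # learning rate tuned for the reference global batch size
base_batch = 1024    # reference global batch size

per_gpu_batch = 128
num_gpus = 4
global_batch = per_gpu_batch * num_gpus          # k = global_batch / base_batch

scaled_lr = base_lr * global_batch / base_batch  # multiply the lr by k
print(f"global batch {global_batch} -> learning rate {scaled_lr:.4f}")
```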
Thank you very much!
No need to change it, I think. The paper probably means the batch size on one device; normally the batch size reported in a paper is the per-device batch size. Take care of the difference between DistributedDataParallel and DataParallel.
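Not this repo's code, but a minimal sketch of that difference: with DataParallel the DataLoader's `batch_size` is the global batch that gets split across GPUs, while with DistributedDataParallel each process builds its own loader, so `batch_size` is per GPU.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset, just for illustration.
dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
num_gpus = 4

# DataParallel: a single process; batch_size is the GLOBAL batch,
# split internally across GPUs (128 samples end up on each GPU).
dp_loader = DataLoader(dataset, batch_size=128 * num_gpus)

# DistributedDataParallel: one process per GPU; batch_size is PER GPU.
# DistributedSampler gives each process its own shard of the data
# (rank=0 shown here; in practice each process passes its own rank).
sampler = DistributedSampler(dataset, num_replicas=num_gpus, rank=0)
ddp_loader = DataLoader(dataset, batch_size=128, sampler=sampler)
```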
Thanks for your great work! Could you please share how the batch size and the number of GPUs influence the results? Also, how should one choose a suitable learning rate and batch size when not enough GPUs are available? Thank you!