lukeyeager opened this issue 9 years ago (status: Open)
There is also discussion about this in https://github.com/BVLC/caffe/issues/430.
In theory, when you change the batch_size by a factor of X you should scale the base_lr by a factor of sqrt(X), but Alex used a linear factor of X in practice (see http://arxiv.org/abs/1404.5997).
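For concreteness, here is a minimal sketch of the two rescaling rules being compared (the helper name and example values are illustrative, not part of DIGITS or Caffe):

```python
def rescale_lr(base_lr, old_batch_size, new_batch_size, rule="linear"):
    """Rescale a learning rate when the batch size changes.

    rule="sqrt"   -> multiply base_lr by sqrt(new/old), the "theory" rule
                     meant to keep gradient variance roughly constant
    rule="linear" -> multiply base_lr by new/old, the factor Krizhevsky
                     used in practice (arXiv:1404.5997)
    """
    factor = new_batch_size / old_batch_size
    if rule == "sqrt":
        return base_lr * factor ** 0.5
    return base_lr * factor

# Halving the batch size (X = 2):
print(rescale_lr(0.01, 256, 128, rule="sqrt"))    # ~0.00707
print(rescale_lr(0.01, 256, 128, rule="linear"))  # 0.005
```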
@lukeyeager @mrgloom Is this still relevant given the more recent paper https://arxiv.org/abs/1706.02677, which says we should use linear scaling, i.e. multiply base_lr by X when batch_size changes by a factor of X?
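Goyal et al. (arXiv:1706.02677) pair the linear scaling rule with a gradual warmup of the learning rate over the first few epochs. A hedged sketch of that schedule follows; the function name and the reference batch size of 256 are assumptions for illustration, though the five-epoch warmup matches the paper:

```python
def lr_at_epoch(epoch, base_lr, batch_size, ref_batch_size=256, warmup_epochs=5):
    """Linear scaling rule with gradual warmup (after Goyal et al. 2017).

    The target learning rate is base_lr * batch_size / ref_batch_size;
    during the first warmup_epochs it ramps linearly from base_lr up to
    that target, then stays there (before any later decay schedule).
    """
    target_lr = base_lr * batch_size / ref_batch_size
    if epoch < warmup_epochs:
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    return target_lr

# 8x larger batch -> 8x larger target lr, reached after 5 epochs of warmup.
for e in range(7):
    print(e, round(lr_at_epoch(e, 0.1, 2048), 3))
```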
See discussion in #44.
As Alex Krizhevsky explains in his paper One weird trick for parallelizing convolutional neural networks, the learning rate, momentum and weight decay all depend on the batch size (see section 5, page 5). It would be nice if DIGITS handled these calculations for you automatically so that you don't have to worry about them.
The issue is that different networks have different default learning rates and batch sizes. Is there a standard equation that fits all networks?
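One possible answer is that the equation only needs to be relative: treat each network's shipped (base_lr, batch_size) pair as the reference point and apply the chosen scaling rule to the user's batch size. The sketch below is hypothetical (the network names, default values, and dictionary are illustrative; real defaults would come from each network's solver definition), not an existing DIGITS feature:

```python
# Hypothetical per-network defaults; real values would be read from the
# solver/prototxt files that ship with each standard network.
NETWORK_DEFAULTS = {
    "alexnet":   {"base_lr": 0.01,  "batch_size": 128},
    "googlenet": {"base_lr": 0.005, "batch_size": 32},
}

def suggested_base_lr(network, user_batch_size, rule="linear"):
    """Suggest a base_lr for a user-chosen batch size by scaling the
    network's default (base_lr, batch_size) pair with the chosen rule."""
    defaults = NETWORK_DEFAULTS[network]
    factor = user_batch_size / defaults["batch_size"]
    if rule == "sqrt":
        factor = factor ** 0.5
    return defaults["base_lr"] * factor

print(suggested_base_lr("alexnet", 256))          # 0.02  (linear rule)
print(suggested_base_lr("alexnet", 256, "sqrt"))  # ~0.0141
```

This sidesteps the "different defaults" problem because the rule never assumes a universal starting learning rate, only a per-network one; how momentum and weight decay should be adjusted alongside it is not covered here.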