google-research / tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

All benefits of using a larger batch size assume the training throughput increases? #32

Closed SimLif closed 1 year ago

SimLif commented 1 year ago
  • All benefits of using a larger batch size assume the training throughput increases. If it doesn't, fix the bottleneck or use the smaller batch size.
  • Gradient accumulation simulates a larger batch size than the hardware can support and therefore does not provide any throughput benefits. It should generally be avoided in applied work.
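To illustrate the gradient-accumulation point quoted above, here is a minimal sketch (assuming PyTorch; the toy model, data, and `accumulation_steps` value are purely illustrative and not from the playbook) of how accumulating gradients over several micro-batches simulates a larger effective batch without increasing per-step memory, and also without improving throughput:

```python
# Minimal gradient-accumulation sketch: simulates an effective batch size of
# micro_batch * accumulation_steps, but still runs accumulation_steps forward/
# backward passes per update, so there is no throughput benefit.
import torch
from torch import nn

model = nn.Linear(10, 1)                      # toy model, purely illustrative
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

micro_batch, accumulation_steps = 8, 4        # effective batch size = 32
data = [(torch.randn(micro_batch, 10), torch.randn(micro_batch, 1))
        for _ in range(accumulation_steps * 3)]

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(data):
    # Scale the loss so the summed gradients equal the mean over the full
    # effective batch rather than accumulation_steps times that mean.
    loss = criterion(model(inputs), targets) / accumulation_steps
    loss.backward()                           # gradients accumulate in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                      # one update per effective batch
        optimizer.zero_grad()
```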

Does increasing the batch size guarantee more stable gradient descent?
In which scenarios should gradient accumulation be used?
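
For context on the first question: the standard argument (not a quote from the playbook) is that the mini-batch gradient is an average of per-example gradients, so for i.i.d. samples its variance shrinks with batch size $B$:

$$\operatorname{Var}\big[\hat{g}_B\big] = \operatorname{Var}\Big[\tfrac{1}{B}\sum_{i=1}^{B}\nabla_\theta \ell(x_i;\theta)\Big] = \tfrac{1}{B}\operatorname{Var}\big[\nabla_\theta \ell(x_i;\theta)\big]$$

So the gradient estimate is lower variance, but that by itself does not guarantee better final performance or faster wall-clock training.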