google-research / tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

Gradient checkpointing #45

Closed: lakshya-4gp closed this issue 5 months ago

lakshya-4gp commented 1 year ago

I believe gradient checkpointing can be very useful when you have to maintain some minimum batch size and your hardware can't fit it otherwise. I was training a NeRF model, and I had to use it to keep at least some minimum number of samples in a batch. Any thoughts?
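For context, gradient checkpointing trades compute for memory: intermediate activations are discarded during the forward pass and recomputed during backprop, so a larger batch fits in the same device memory. A minimal sketch in JAX (the framework choice and the `mlp_block` function are illustrative assumptions, not from the thread):

```python
import jax
import jax.numpy as jnp

def mlp_block(params, x):
    # Hypothetical two-layer block; params is a (W1, W2) tuple of weight matrices.
    W1, W2 = params
    return jnp.tanh(x @ W1) @ W2

# jax.checkpoint (a.k.a. jax.remat) drops this block's intermediate
# activations after the forward pass and recomputes them during backprop,
# trading extra compute for lower peak memory.
checkpointed_block = jax.checkpoint(mlp_block)

def loss_fn(params, x, y):
    preds = checkpointed_block(params, x)
    return jnp.mean((preds - y) ** 2)

# Gradients are computed as usual; only peak activation memory changes.
grad_fn = jax.jit(jax.grad(loss_fn))
```

The recomputation roughly costs one extra forward pass per checkpointed block, which is the price paid for the memory savings.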

znado commented 5 months ago

Do you mean gradient accumulation?

I agree that if your batch size is absolutely minuscule (<4?), then maybe that could improve performance, although I would want to see some tuned side-by-side studies showing that is the case.
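Gradient accumulation, by contrast, does not recompute anything; it splits one logical batch into micro-batches, averages their gradients, and applies a single optimizer step, simulating a batch size the hardware cannot hold at once. A minimal sketch (assuming optax and reusing `grad_fn` from the sketch above; `accumulated_step` is a made-up helper name):

```python
import jax
import optax

optimizer = optax.sgd(learning_rate=1e-2)

def accumulated_step(params, opt_state, micro_batches):
    # micro_batches: a list of (x, y) pairs that together form one
    # "logical" batch too large to fit in device memory at once.
    grads = [grad_fn(params, x, y) for x, y in micro_batches]
    # Average the per-micro-batch gradients leaf by leaf.
    avg_grads = jax.tree_util.tree_map(lambda *g: sum(g) / len(g), *grads)
    # Take one optimizer step for the whole logical batch.
    updates, opt_state = optimizer.update(avg_grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state
```

For what it's worth, optax also ships a `MultiSteps` wrapper that performs this accumulation automatically across optimizer calls.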