Summary:
We were unnecessarily syncing gradients on gradient accumulation steps where no optimizer step was performed. With this change, gradient synchronization happens only at the end of each optimizer_period instead of on every step, removing the per-step communication bottleneck.

This should yield a significant speedup, especially when optimizer_period and the number of nodes are large.
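For illustration, a minimal sketch of the pattern this change implements, using PyTorch DDP's no_sync() context manager. The names train_epoch and optimizer_period, and the loop structure, are assumptions for the example, not the actual code in this diff:

```
import contextlib

import torch
from torch.nn.parallel import DistributedDataParallel as DDP


def train_epoch(model: DDP, optimizer, loader, optimizer_period: int):
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        is_update_step = (step + 1) % optimizer_period == 0
        # Suppress the gradient all-reduce on accumulation steps; the
        # sync only happens on the backward pass of the update step.
        sync_ctx = contextlib.nullcontext() if is_update_step else model.no_sync()
        with sync_ctx:
            loss = torch.nn.functional.cross_entropy(model(inputs), targets)
            (loss / optimizer_period).backward()
        if is_update_step:
            optimizer.step()
            optimizer.zero_grad()
```

Because no_sync() defers communication, the locally accumulated gradients are all-reduced once per optimizer_period rather than once per backward pass.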
Differential Revision: D24969761