I want to disable all-reduce during gradient accumulation. If my gradient accumulation factor is 2, I want to run the all-reduce only every other step. This should speed up my training.
Using this technique with Apex results in out-of-sync master gradients, and the model does not converge well.
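For context, here is a minimal sketch of the pattern I am describing. It assumes the model is wrapped in native `torch.nn.parallel.DistributedDataParallel` (which provides `no_sync()`) and initialized with Apex amp; Apex's own `DistributedDataParallel` does not expose `no_sync()`, so this only illustrates the idea of skipping the all-reduce on accumulation steps, not a drop-in fix for the master-gradient issue above. The function and variable names (`train`, `accumulation_steps`) are illustrative.

```python
import torch
from apex import amp

accumulation_steps = 2  # all-reduce only every other step

def train(model, optimizer, data_loader):
    # Assumes `model` was passed through amp.initialize(...) and then
    # wrapped in torch.nn.parallel.DistributedDataParallel.
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(data_loader):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss = loss / accumulation_steps

        is_update_step = (step + 1) % accumulation_steps == 0

        if not is_update_step:
            # Accumulation step: no_sync() skips the gradient all-reduce,
            # and delay_unscale=True tells amp not to unscale / overflow-check
            # the accumulating FP16 grads on this backward pass.
            with model.no_sync():
                with amp.scale_loss(loss, optimizer, delay_unscale=True) as scaled_loss:
                    scaled_loss.backward()
        else:
            # Update step: the all-reduce fires during backward, amp unscales
            # the gradients, and the optimizer applies the update.
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```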
Detailed blog: https://krishansubudhi.github.io/deeplearning/2020/02/06/apex-gradient-accumulation.html