Closed jambo6 closed 1 week ago
When I set `--overlap-grad-allreduce`, my run fails because gradients are None inside the hook. It then fails on this code:

```python
if self.ddp_config.overlap_grad_reduce:
    assert (
        param.grad is not None
    ), 'param.grad being None is not safe when overlap_grad_reduce is True'
```
Gradients are available in the optimizer step, so it's not that I'm simply not computing gradients.
Even when I disable overlap, I find that every gradient is None inside the backward hook.
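For context, `param.grad` being None inside a backward hook is expected behavior in plain PyTorch, independent of Megatron-LM: hooks registered with `Tensor.register_hook` receive the incoming gradient as an argument and fire *before* autograd accumulates that gradient into `.grad`. A minimal sketch (plain PyTorch, not Megatron-LM's actual hook machinery) demonstrating this:

```python
import torch

# Leaf parameter whose hook we will inspect.
param = torch.ones(3, requires_grad=True)
seen = {}

def hook(grad):
    # The gradient is delivered as the hook argument, but it has not yet
    # been written into param.grad at the time the hook runs.
    seen["grad_arg_is_none"] = grad is None
    seen["param_grad_is_none"] = param.grad is None
    return grad

param.register_hook(hook)
(param * 2).sum().backward()

print(seen["grad_arg_is_none"])    # False: the hook receives the gradient
print(seen["param_grad_is_none"])  # True: .grad not yet populated in the hook
```

So a hook that needs the gradient should use its argument rather than reading `param.grad` (on recent PyTorch, `Tensor.register_post_accumulate_grad_hook` runs after accumulation instead).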
Issue on my end
What was the issue? Might be useful for other users.