Open Sakura-gh opened 2 months ago
https://github.com/NVIDIA/TransformerEngine/blob/e3bb24e5a347c58353e62307bc84cca856f9e9be/transformer_engine/pytorch/module/linear.py#L405-L407
If weight.requires_grad is set to False, when are the weight gradients calculated and accumulated?
This is a bug; see #1026 for further details.
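
For context on the question, here is a minimal sketch of how the backward pass of a linear layer typically gates the weight gradient on whether the weight is trainable. It is plain Python over nested lists, not TransformerEngine code: `linear_backward` and its `weight_requires_grad` flag are hypothetical names chosen for illustration, and the sketch makes no claim about what the linked lines actually do.

```python
def linear_backward(x, weight, grad_out, weight_requires_grad):
    """Backward of y = x @ weight.T using plain nested lists.

    x: [batch][in], weight: [out][in], grad_out: [batch][out].
    Returns (dgrad, wgrad); wgrad is None when the weight is frozen.
    """
    batch, n_in, n_out = len(x), len(weight[0]), len(weight)

    # Input gradient: dgrad = grad_out @ weight.
    # Computed unconditionally here to keep the sketch short.
    dgrad = [
        [sum(grad_out[b][o] * weight[o][i] for o in range(n_out))
         for i in range(n_in)]
        for b in range(batch)
    ]

    # Weight gradient: wgrad = grad_out.T @ x.
    # Skipped entirely when the weight does not require a gradient,
    # so there is nothing to compute or accumulate for it.
    wgrad = None
    if weight_requires_grad:
        wgrad = [
            [sum(grad_out[b][o] * x[b][i] for b in range(batch))
             for i in range(n_in)]
            for o in range(n_out)
        ]
    return dgrad, wgrad


x = [[1.0, 2.0]]
weight = [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
grad_out = [[1.0, 0.0, 1.0]]

dgrad, wgrad = linear_backward(x, weight, grad_out, weight_requires_grad=True)
_, frozen_wgrad = linear_backward(x, weight, grad_out, weight_requires_grad=False)
```

Under this convention a frozen weight never receives a gradient at all, so any code path that computes or accumulates a weight gradient would be expected to check the same flag first.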