NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Why is the requires_grad attribute of the weight set to False by offloading? #996

Open Sakura-gh opened 2 months ago

Sakura-gh commented 2 months ago

https://github.com/NVIDIA/TransformerEngine/blob/e3bb24e5a347c58353e62307bc84cca856f9e9be/transformer_engine/pytorch/module/linear.py#L405-L407

If weight.requires_grad is set to False, when are the weight gradients calculated and accumulated?
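A minimal standalone PyTorch sketch (not TransformerEngine code) illustrating the concern: once a leaf tensor's requires_grad flag is flipped to False, autograd excludes it from the backward graph, so its .grad is never populated and no accumulation can happen for that weight.

```python
import torch

# Hypothetical weight, initially trainable.
w = torch.randn(4, 4, requires_grad=True)
x = torch.randn(2, 4)

# Simulate what the questioner observes in the offloading path:
# the flag is turned off before the forward pass.
w.requires_grad_(False)

y = (x @ w.t()).sum()

# Neither input requires grad, so the output carries no grad_fn,
# and w.grad stays None -- no gradient is ever accumulated.
print(y.requires_grad)  # False
print(w.grad)           # None
```

This is exactly why the flag being False at that point in linear.py looks suspicious for training.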

ksivaman commented 2 months ago

This is a bug, see #1026 for further details!