thomassajot opened 4 months ago
Reading further into the PyTorch docs:

> autocast should wrap only the forward pass(es) of your network, including the loss computation(s). Backward passes under autocast are not recommended. Backward ops run in the same type that autocast used for corresponding forward ops.
The current implementation in pytorch-lightning is therefore correct.
However, it would be great to find a way to disable autocast during the backward pass, rather than re-entering autocast on every forward.
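For reference, here is a minimal sketch of the pattern the PyTorch docs describe: autocast wraps only the forward pass and the loss computation, and backward runs outside the region. Re-entering autocast on every iteration is exactly what drops the weight cast cache, since `torch.clear_autocast_cache()` is called when the outermost autocast region exits. Model shape and hyperparameters below are placeholders.

```python
import torch

model = torch.nn.Linear(32, 32).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    x = torch.randn(8, 32, device="cuda")
    opt.zero_grad()
    # autocast wraps only the forward pass and the loss computation
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).sum()
    # backward runs outside autocast; its ops still execute in the dtypes
    # that autocast chose for the corresponding forward ops
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```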
Bug description
The autocast argument `cache_enabled=True` does not actually cache the layer weights when using a Trainer.

What version are you seeing the problem on?
v2.2
How to reproduce the bug
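The original training script is not preserved here; the following is a minimal sketch of the same kind of setup, assuming a two-layer model trained with `precision="16-mixed"` and profiled with `torch.profiler` (model sizes, batch sizes, and Trainer flags are placeholders).

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity
from torch.utils.data import DataLoader
import lightning.pytorch as pl


class TwoLinear(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(32, 32)
        self.l2 = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        return self.l2(self.l1(batch)).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    model = TwoLinear()
    data = DataLoader([torch.randn(32) for _ in range(64)], batch_size=8)
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=1,
        precision="16-mixed",
        max_epochs=1,
        limit_train_batches=4,
        logger=False,
        enable_checkpointing=False,
    )
    # Profile the dtype casts issued during training; with a working weight
    # cache there should not be one aten::to per weight on every forward.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        trainer.fit(model, data)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
```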
Error messages and logs
The above training script produces the following trace, where there are 3 calls to `aten::to` before the first linear layer (one each for the input, the weight, and the bias). The second linear layer has only 2 calls to `aten::to`, as its input is already in the right dtype.

What should be expected is one (or zero) call to `aten::to`, as the weights should be cached in the right dtype. For example:
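The expected trace itself is not reproduced here, but the caching behaviour can be shown with plain `torch.autocast`: within a single autocast region the cast of a weight (a leaf tensor with `requires_grad=True`) is cached and reused, whereas opening a fresh region per forward clears the cache on every exit. A standalone sketch:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(32, 32).cuda()
x = torch.randn(8, 32, device="cuda")


def count_to_calls(fn):
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        fn()
    return sum(e.count for e in prof.key_averages() if e.key == "aten::to")


def single_region():
    # one autocast region: weight and bias are cast once, then served
    # from the cache; only the (non-cached) input is cast on each call
    with torch.autocast("cuda", torch.float16, cache_enabled=True):
        for _ in range(4):
            model(x)


def region_per_forward():
    # a fresh region per forward: the cache is cleared on every exit,
    # so weight, bias, and input are all re-cast on each call
    for _ in range(4):
        with torch.autocast("cuda", torch.float16, cache_enabled=True):
            model(x)


print("single region:", count_to_calls(single_region))
print("region per forward:", count_to_calls(region_per_forward))
```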
Environment

Current environment

```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```

More info
Looking at the code base, `autocast` is used with its default value `cache_enabled=True`. It is not clear why the cache would not be used.
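One plausible explanation, consistent with the comment above: the Trainer enters a fresh autocast context around each individual forward call, so the weight cache is discarded on every exit. A simplified, assumed model of that pattern (not Lightning's actual code; `forward_context` and `training_step` are illustrative names):

```python
import torch
from contextlib import contextmanager


@contextmanager
def forward_context():
    # one autocast region per forward call, as a precision plugin might do
    with torch.autocast(device_type="cuda", dtype=torch.float16, cache_enabled=True):
        yield


def training_step(model, batch):
    with forward_context():  # the weight cast cache is created here...
        loss = model(batch).sum()
    # ...and cleared when the region exits, so the next step re-casts
    # every weight, which shows up as the extra aten::to calls
    loss.backward()
    return loss
```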