Closed shenoynikhil closed 1 year ago
Until they publish a fix, you could use an earlier PyTorch version that does not have this issue.
Sounds good. Feel free to close the issue.
I also had to downgrade pl to `pytorch-lightning==1.7.0` for it to work. Was there a change in how the CUDA backend gets checked in versions >1.7.0?
Yes. The issue you referenced on the PyTorch side concerns a piece of parsing logic that was introduced in torch >= 1.13. We then ported this code into Lightning so that Lightning users with torch < 1.13 also get the new way of parsing. This is likely why the issue went away when you downgraded Lightning.
We need to take the fix here and apply it to our code as well in https://github.com/Lightning-AI/lightning/blob/ccd2a481d0fdcf757124e43e58cf0bffc8d68594/src/lightning/fabric/accelerators/cuda.py#L229
For existing pytorch-lightning and PyTorch versions (where the change in https://github.com/pytorch/pytorch/issues/90543 has not landed), is there a way to still use GPUs? I can use the GPU by calling `.to(torch.device('cuda'))`, but I want to be able to use pytorch_lightning.
@shenoynikhil #16795 should fix this issue. It will be included in 1.9.4 in 1-2 days. Thanks for the patience!
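To check whether an installed release is at or past the version named above, a minimal sketch could look like the following. It only restates the comment's claim that the fix ships in 1.9.4. The helper names `parse_release`, `includes_fix`, and `installed_lightning_version` are hypothetical, and the comparison ignores pre-release suffixes.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Release named in the comment above as the first one carrying the fix.
FIXED_IN = (1, 9, 4)


def parse_release(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from the leading numeric part of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def includes_fix(version: str) -> bool:
    """True if the release is 1.9.4 or later (suffixes such as 'rc1' are ignored)."""
    return parse_release(version) >= FIXED_IN


def installed_lightning_version() -> Optional[str]:
    """Version of the installed pytorch-lightning distribution, or None if absent."""
    try:
        return metadata.version("pytorch-lightning")
    except metadata.PackageNotFoundError:
        return None
```

Calling `includes_fix(installed_lightning_version())` when the package is present gives a quick yes/no before retrying `pl.Trainer(accelerator='cuda', devices=1)`.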
Bug description
In my environment, `torch.cuda.is_available()` is `True` but `torch.cuda.device_count()` is `0`. This is probably linked to a PyTorch issue. Since I was planning on using Lightning for a new project, I am unable to use the GPU with `pl.Trainer(accelerator='cuda', devices=1)`. I'm not sure if this is a bug on your end. Any suggestions on how to work around this would be great.
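The mismatch described above can be checked with a short diagnostic. This is a sketch, not part of Lightning: the helper names `describe_cuda_state` and `report_from_torch` are hypothetical, and reporting `CUDA_VISIBLE_DEVICES` is an assumption about what is useful to include in a report, not something the thread identifies as the cause.

```python
import os
from typing import Optional


def describe_cuda_state(is_available: bool, device_count: int,
                        visible_devices: Optional[str]) -> str:
    """Summarise the two values compared in the bug description."""
    env = f"CUDA_VISIBLE_DEVICES={visible_devices!r}"
    if is_available and device_count == 0:
        # The state reported in this issue: CUDA is usable, but no devices are counted.
        return f"mismatch: is_available() is True but device_count() is 0 ({env})"
    if not is_available:
        return f"no CUDA: is_available() is False ({env})"
    return f"ok: {device_count} device(s) counted ({env})"


def report_from_torch() -> str:
    """Read the live values from the installed torch and describe them."""
    import torch  # imported lazily so the helper above works without torch

    return describe_cuda_state(
        torch.cuda.is_available(),
        torch.cuda.device_count(),
        os.environ.get("CUDA_VISIBLE_DEVICES"),
    )
```

Running `print(report_from_torch())` in the affected environment gives a one-line summary that can be pasted into the report.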
How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
More info
No response
cc @tchaton