Open mbelouso opened 1 year ago
Hi,
That's interesting. What error message were you getting? Normally CUDA major releases should be interoperable.
Best, Kiarash.
The error I got was the following:
NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at Start Locally | PyTorch
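The message boils down to a membership test: the GPU's architecture (sm_86) is not in the list of architectures the installed PyTorch build was compiled for. A minimal sketch of that comparison follows. The helper name `arch_supported` is hypothetical and not part of PyTorch or model-angelo, and the values are copied from the error text rather than queried from a live system.

```shell
# arch_supported ARCH "LIST": succeed if ARCH (e.g. sm_86) appears as a whole
# word in the space-separated LIST of architectures a PyTorch build supports.
# Hypothetical helper, for illustration only.
arch_supported() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Values copied from the error message above.
gpu_arch="sm_86"
build_archs="sm_37 sm_50 sm_60 sm_70"

# On a machine with PyTorch installed, the list could instead be printed with:
#   python -c "import torch; print(' '.join(torch.cuda.get_arch_list()))"

if arch_supported "$gpu_arch" "$build_archs"; then
    echo "$gpu_arch: supported by this PyTorch build"
else
    echo "$gpu_arch: NOT supported by this PyTorch build"
fi
```

With the values from the error message, the check fails, which matches what PyTorch reported.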
I had exactly the same issues with my A100 cards. The above command fixed it. Thanks @mbelouso
Also, model-angelo initially installed without any errors, but when I ran the job it was really slow. It turned out that the CPU version of PyTorch had been installed because the environment variables for CUDA were missing (CUDA_HOME, CUDA_LIB and PATH to bin ...). To check whether you have the CUDA build or the much slower CPU build of PyTorch, run this:
conda list | grep "^pytorch " | grep -E 'cuda|cpu'
Best, Jesper
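The grep above can also be wrapped in a small function that gives an explicit answer, which may be handy inside a script. This is only a sketch: `classify_pytorch_build` is a hypothetical name, it assumes the usual `conda list` column order (name, version, build, channel), and the sample lines are illustrative build strings rather than output captured from this thread.

```shell
# classify_pytorch_build: read `conda list` output on stdin and print
# "cuda", "cpu", or "unknown" depending on the build string of the
# pytorch package (third column). Hypothetical helper, for illustration.
classify_pytorch_build() {
    awk '$1 == "pytorch" {
             if ($3 ~ /cuda/)     { print "cuda"; found = 1 }
             else if ($3 ~ /cpu/) { print "cpu";  found = 1 }
             exit
         }
         END { if (!found) print "unknown" }'
}

# Illustrative input lines (assumed format, not taken from a real install).
printf 'pytorch 2.0.1 py3.9_cuda11.7_cudnn8.5.0_0 pytorch\n' | classify_pytorch_build
printf 'pytorch 2.0.1 py3.9_cpu_0 pytorch\n' | classify_pytorch_build
```

In a real environment the call would be `conda list | classify_pytorch_build`.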
Dear @mbelouso and @jelka71
I would like to thank @mbelouso for first noticing the issue and offering a fix. I have now pinned this issue. I am thinking about the best way to automate handling of this CUDA mismatch in the installation script. If either of you has ideas, please let me know.
Best, Kiarash.
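One possible direction for automating this, sketched under stated assumptions: read the GPU's compute capability and pick a CUDA 11 build whenever it is 8.0 or higher (A100 is 8.0, RTX 3080 Ti is 8.6). The function name `needs_cuda11` is hypothetical, this is not how install.sh currently works, and the threshold only covers the cards mentioned in this thread.

```shell
# needs_cuda11 CC: succeed if compute capability CC (e.g. "8.6") has a major
# version of 8 or higher, i.e. an Ampere-or-newer GPU that needs a CUDA 11
# build of PyTorch. Hypothetical helper, for illustration only.
needs_cuda11() {
    _cc_major="${1%%.*}"          # strip everything from the first dot onward
    [ "$_cc_major" -ge 8 ] 2>/dev/null
}

# On a real machine the capability could come from the driver, for example:
#   cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
# (assumption: the compute_cap query field needs a reasonably recent driver).
cc="8.6"   # RTX 3080 Ti, matching sm_86 in the error message

if needs_cuda11 "$cc"; then
    echo "compute capability $cc: install a CUDA 11 build, e.g. pytorch-cuda=11.7"
else
    echo "compute capability $cc: the default build may be sufficient"
fi
```

The install script could branch on this result to choose between its default conda command and one that pins pytorch-cuda.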
The default PyTorch that install.sh installs isn't compatible with an RTX 3080 GPU, so I edited the install.sh script.
Line 45 for my system should have read:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cudatoolkit=11.7 -c pytorch-nightly -c nvidia
After this change it works fine. Before it, the installed version of PyTorch wasn't compatible with the GPU.
Just a heads-up for anyone else with a 3000-series card.