Closed: ryan-budde closed this issue 4 months ago
I have been using pytorch version 2.2.1. I know I've used at least a few other versions >2.0 over the course of development but don't remember the exact numbers.
I see a number of people with similar pytorch issues related to CUDA 12.4 and 12.5 from a quick Google search. It looks like some of them were able to fix it by uninstalling their existing CUDA and pytorch installation (or starting a fresh environment), then installing the 12.1 toolkit instead, i.e. following the instructions in our README but changing pytorch-cuda=11.8 to pytorch-cuda=12.1.
Thanks! Question then - are there important specific reasons that you specify cuda 11.8 and python 3.9? I can see the numpy<2.0 was due to a known bug. Are there known bugs on CUDA>11.8 and python >3.9? Or are these simply the ones you used in development, and are known to work?
Those are just the versions used in development. As noted in the README, Python 3.10 should work as well (and we include that version in our testing). Versions outside of 3.9 and 3.10 might work, but we don't specifically test them right now, so new or deprecated functions could cause errors. The README also has a note about determining the correct versions for pytorch and CUDA, with a link that might be helpful:
If pytorch installation still fails, follow the instructions here to determine what version of pytorch to install. The Anaconda install is strongly recommended on Windows, and then choose the CUDA version that is supported by your GPU (newer GPUs may need newer CUDA versions > 10.2)
Update: the terminal is recognizing everything, and this is the fault of VS Code / Jupyter, not a KS issue. I'm currently working on running the KS example in a simple .py script.
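For anyone hitting the same terminal-versus-notebook mismatch, one quick check is to print which interpreter is actually running in each place and compare the results. This is only a minimal sketch: describe_interpreter is a made-up helper name, and CONDA_DEFAULT_ENV is only set when a conda environment has been activated, so it may come back as None.

```python
import os
import sys


def describe_interpreter():
    """Return facts that identify which Python environment is running this code.

    Hypothetical helper for illustration; it is not part of KS or pytorch.
    """
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Set by `conda activate`; None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this once from the terminal and once from a notebook cell (calling describe_interpreter() directly) should show whether the notebook kernel points at a different environment than the one where pytorch was installed.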
Describe the issue:
What version of torch/pytorch do y'all use?
I am working on getting KS4 set up on a Slurm cluster with an A100 that has drivers for CUDA 12.4/12.5 (I cannot easily change these, and they should be backward compatible with 11.8). I'm using the 11.8 toolkit and I've followed the dev KS4 install instructions (python 3.9, pytorch-cuda=11.8, etc.). Everything looks right, but torch.cuda.is_available() always fails. torch.version.cuda shows 11.8 as expected. nvidia-smi shows my A100, and nvcc --version shows 11.8. I'm working out whether it's my fault, the cluster's fault, or pytorch's fault, and I want to check a known-working version of pytorch (I ask because it is not specified in the install instructions).
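The checks described above could be gathered into one small script, so the same report can be produced from the terminal, a notebook, or a Slurm job and compared. This is a rough sketch rather than anything from the KS codebase: run_tool and collect_cuda_report are made-up names, and the script assumes nvidia-smi and nvcc are on PATH (it records None for any tool it cannot find).

```python
import importlib.util
import shutil
import subprocess
import sys


def run_tool(cmd):
    """Run a command-line tool and return its output, or None if unavailable."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.SubprocessError):
        return None
    return (result.stdout or result.stderr).strip()


def collect_cuda_report():
    """Gather interpreter, driver, toolkit, and pytorch facts in one dictionary.

    Hypothetical helper for illustration only.
    """
    report = {
        "python_executable": sys.executable,
        "nvidia_smi": run_tool(["nvidia-smi"]),
        "nvcc_version": run_tool(["nvcc", "--version"]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        # CUDA version pytorch was built against; None for CPU-only builds.
        report["torch_built_for_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in collect_cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available differs between two places that report the same python_executable, the cause is more likely the environment (for example, how the job or kernel was launched) than the pytorch build itself.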
Reproduce the bug:
Error message:
Version information:
My torch says 2.3.1+cu118
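As a side note, the suffix in a version string like this one names the CUDA build of the pytorch wheel, so it can be compared directly against the toolkit version. Below is a small sketch of that parsing; both function names are made up for illustration, and it assumes the tag follows the "cu" plus digits pattern with a single-digit minor version (as in cu118 or cu121).

```python
def split_torch_version(version):
    """Split a version string such as '2.3.1+cu118' into (release, build_tag)."""
    release, _, build_tag = version.partition("+")
    return release, build_tag or None


def cuda_from_build_tag(build_tag):
    """Turn a tag like 'cu118' into '11.8'; return None for other tags.

    Assumes the last digit is the minor version, which is an assumption
    about the tag format rather than a documented guarantee.
    """
    if not build_tag or not build_tag.startswith("cu"):
        return None
    digits = build_tag[2:]
    if not digits.isdigit() or len(digits) < 2:
        return None
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    release, tag = split_torch_version("2.3.1+cu118")
    print(release, tag, cuda_from_build_tag(tag))
```

For the version reported here, this gives release 2.3.1 and CUDA 11.8, which matches what torch.version.cuda showed.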