Closed KarinHellevik closed 1 month ago
Please provide more detail. Did you install cuda and cudnn? How? What kind of GPU are you using?
from pip list
jaxlib 0.3.22+cuda11.cudnn82
the GPUs from our HPC documentation are "Tesla V100 GPUs with the NVLINK interconnect"
So it sounds like you didn't install CUDA or cuDNN. You either have to install those globally or use the conda install method in the keypoint moseq docs.
I was not able to use conda to install keypoint moseq. I get this error ` LibMambaUnsatisfiableError: Encountered problems while solving:
Could not solve for environment specs The following package could not be installed └─ jaxlib 0.3.22 cuda is not installable because it requires └─ __cuda, which is missing on the system.`
And these are the CUDA toolkit and cuDNN modules available on the HPC system
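For context on the `__cuda` error above: conda only provides the `__cuda` virtual package when it can detect an NVIDIA driver on the machine where the solve runs, so the install can fail on a node without a visible driver even if the compute nodes have GPUs. Below is a minimal sketch for checking from Python whether the driver library is visible. It assumes a Linux system where the driver is exposed as `libcuda.so.1`, and the helper name `driver_cuda_version` is made up for illustration.

```python
import ctypes


def driver_cuda_version(lib_name="libcuda.so.1"):
    """Return (major, minor) of the CUDA version supported by the NVIDIA
    driver, or None if the driver library cannot be loaded or queried."""
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None  # no driver library visible on this node

    version = ctypes.c_int(0)
    try:
        libcuda.cuInit(0)  # result ignored: it can fail when no GPU is attached
        status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    except AttributeError:
        return None  # library loaded but lacks the expected symbols
    if status != 0:
        return None

    # The driver API encodes the version as 1000 * major + 10 * minor
    return version.value // 1000, (version.value % 1000) // 10


if __name__ == "__main__":
    print(driver_cuda_version())
```

If this prints `None` on the node where `conda` is run, that would be consistent with the missing `__cuda` package. Running the install from a GPU node, or setting conda's `CONDA_OVERRIDE_CUDA` environment variable as discussed in the Stack Overflow post linked in this thread, are the usual things to try.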
Hmm are you using conda or mamba to install for the conda route?
And great, so in principle the pip install should work. Among the CUDA modules that are available, which did you actually load?
using conda
I'm loading cuDNN 8.1 and the CUDA 11.2 toolkit
Hmm, it's possible you need cuDNN 8.2 or higher. Also, when do you get the above error? In general, please provide as much detail as possible when posting and commenting on issues. I'd also recommend that you google the issue and search for related issue posts on this repo.
This post could be helpful for the conda install https://stackoverflow.com/questions/74836151/nothing-provides-cuda-needed-by-tensorflow-2-10-0-cuda112py310he87a039-0
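The cuDNN version question above can be checked against the `jaxlib 0.3.22+cuda11.cudnn82` entry from `pip list`. The sketch below reads the local version tag and compares it with the loaded module version. It assumes the tag has the two-digit form `cudnnXY`, meaning cuDNN X.Y, and that any equal or newer cuDNN satisfies the requirement. Both function names are hypothetical helpers, not part of jaxlib.

```python
import re


def parse_jaxlib_cuda_tag(version):
    """Extract (cuda_major, (cudnn_major, cudnn_minor)) from a jaxlib version
    string such as '0.3.22+cuda11.cudnn82'. Returns None if the string has no
    tag of that exact form (for example a CPU-only build)."""
    match = re.search(r"\+cuda(\d+)\.cudnn(\d)(\d)$", version)
    if match is None:
        return None
    cuda_major = int(match.group(1))
    cudnn = (int(match.group(2)), int(match.group(3)))
    return cuda_major, cudnn


def cudnn_is_sufficient(loaded, required):
    """True if the loaded cuDNN (major, minor) is at least the required one."""
    return tuple(loaded) >= tuple(required)


cuda_major, cudnn_required = parse_jaxlib_cuda_tag("0.3.22+cuda11.cudnn82")
print(cuda_major, cudnn_required)                    # 11 (8, 2)
print(cudnn_is_sufficient((8, 1), cudnn_required))   # False
```

Under these assumptions, the cuDNN 8.1 module mentioned above would be older than what this jaxlib build expects, while the CUDA 11.2 toolkit matches the `cuda11` part of the tag.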
I installed Keypoint-Moseq using `pip` into my folder on the HPC (could not get the `conda` method of installation to work, was unable to find `_cuda`) and I have this error:
2024-07-16 13:54:47.379415: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/local/apps/python37/lib
2024-07-16 13:54:47.418905: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/local/apps/python37/lib
2024-07-16 13:54:47.421869: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/local/apps/python37/lib
2024-07-16 13:54:51.957920: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2024-07-16 13:54:51.958518: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
/cm/local/apps/uge/var/spool.p6444/bamgpu03/job_scripts/7980240: line 10: 2253365 Aborted (core dumped) python kpmsmodelfitandreindex.py
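The log above reports `LD_LIBRARY_PATH: /cm/local/apps/python37/lib` and a failure to launch `ptxas`. That suggests the CUDA toolkit's library and binary directories were not on the search paths inside the job, for example because the modules were not loaded in the job script. A minimal sketch for confirming this from within the job, where `find_library_dirs` is a hypothetical helper written for this illustration:

```python
import os
import shutil


def find_library_dirs(lib_name, search_path):
    """Return the directories in a path-separated search path that
    contain a file named lib_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH:", ld_path or "(empty)")
    print("libcudart.so.11.0 found in:",
          find_library_dirs("libcudart.so.11.0", ld_path))
    # ptxas ships with the CUDA toolkit and must be on PATH for XLA to call it
    print("ptxas on PATH:", shutil.which("ptxas"))
```

Running this at the top of the job script, before importing `jax`, should show whether the loaded modules actually reach the job's environment. An empty list and `None` would match the errors in the log.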