Closed EricLBuehler closed 3 months ago
Yeah candle has encountered issues with this as well (linking to #219 and https://github.com/huggingface/candle/issues/2175).
I will update the error message tomorrow
Hello @coreylowman!
I was running mistral.rs on a WSL machine when I ran into this error:
thread 'main' panicked at /home/mbuehler/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.5/src/driver/sys/mod.rs:61:9: Unable to find cuda lib under the names ["cuda", "nvcuda"]. Please open GitHub issue.
I have CUDA 12.2 installed, but my LD_LIBRARY_PATH does not point to a path where libcuda.so can be found. When I manually set it to point there, everything works. This problem has not been reported, so I am wondering if it is only with my machine. Regardless, it would be helpful to have an error message indicating to check LD_LIBRARY_PATH.
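For anyone who wants to check this on their own machine, here is a minimal sketch that lists every libcuda.so* file in the directories named by LD_LIBRARY_PATH. It uses only the Rust standard library and is not part of cudarc; the function name find_libcuda is hypothetical, and it deliberately ignores the other places the dynamic loader searches (such as the ldconfig cache).

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

/// Return every file in the directories of a search path (colon-separated on
/// Linux) whose name starts with `libcuda.so`, e.g. `libcuda.so.1`.
/// Hypothetical helper for illustration; not a cudarc API.
fn find_libcuda(search_path: &str) -> Vec<PathBuf> {
    let mut hits = Vec::new();
    for dir in env::split_paths(search_path) {
        // Missing or unreadable directories are skipped silently.
        let Ok(entries) = fs::read_dir(&dir) else { continue };
        for entry in entries.flatten() {
            if entry.file_name().to_string_lossy().starts_with("libcuda.so") {
                hits.push(entry.path());
            }
        }
    }
    hits.sort();
    hits
}

fn main() {
    let search_path = env::var("LD_LIBRARY_PATH").unwrap_or_default();
    let hits = find_libcuda(&search_path);
    if hits.is_empty() {
        println!("no libcuda.so* found in LD_LIBRARY_PATH={search_path:?}");
    } else {
        for hit in hits {
            println!("found {}", hit.display());
        }
    }
}
```

If this prints nothing for the shell you launch the program from, the loader is unlikely to find the library through LD_LIBRARY_PATH either.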
Hello @EricLBuehler, I have experienced the exact same problem. Can you show me how to make it work?
Hi @coreylowman, the issue still persists in 0.12.1. Oddly, it always does dynamic CUDA linking, even with default-features=false and no "dynamic-linking" in features; the error is always Unable to dynamically load the "cuda" shared library - searched for library names: ["cuda", "nvcuda"]...
So dynamic linking can't be disabled at all, and I can only hope to make it work with runtime env vars.
Lastly, to make the forced dynamic linking work, maybe we should include other root strings: on my WSL install CUDA is at /usr/local/cuda-12.6/, which is not among the paths listed here.
Symbolic links didn't work either, so I don't know what else to try in WSL.
Is there a way around this issue? My libcuda.so is located at /usr/local/cuda-12.4/lib64/stubs/libcuda.so, and there is a symlink at /usr/local/cuda/lib64/stubs/libcuda.so. I've tried all sorts of variations on the LD_LIBRARY_PATH env and none seem to work. I currently have the env variables set like this:
CUDA_HOME="/usr/local/cuda-12.4"
CUDA_MAJOR_VERSION="12"
CUDA_MINOR_VERSION="4"
CUDA_PATH=$CUDA_HOME
CUDA_ROOT=$CUDA_HOME
CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
LD_LIBRARY_PATH="$CUDA_HOME/lib64:$CUDA_HOME/lib64/stubs"
PATH="$PATH:$CUDA_HOME/bin"
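One thing worth checking with a setup like this: as far as I know, the libcuda.so under the toolkit's lib64/stubs directory is a link-time stub rather than the real driver library, and under WSL2 the driver's libcuda.so is usually exposed at /usr/lib/wsl/lib. That is my understanding of NVIDIA's layout, not something confirmed in this thread. The sketch below only flags LD_LIBRARY_PATH entries that contain a "stubs" path component; the helper names is_stub_dir and stub_dirs are hypothetical and it uses nothing beyond the Rust standard library.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// True if any component of `dir` is exactly "stubs".
/// Hypothetical helper for illustration; not a cudarc API.
fn is_stub_dir(dir: &Path) -> bool {
    dir.components().any(|c| c.as_os_str() == OsStr::new("stubs"))
}

/// Return the entries of a search path (colon-separated on Linux)
/// that look like CUDA stubs directories.
fn stub_dirs(search_path: &str) -> Vec<PathBuf> {
    env::split_paths(search_path)
        .filter(|dir| is_stub_dir(dir))
        .collect()
}

fn main() {
    let search_path = env::var("LD_LIBRARY_PATH").unwrap_or_default();
    for dir in stub_dirs(&search_path) {
        // Assumption: a stubs directory on the runtime path may shadow the real driver.
        println!("warning: {} looks like a stubs directory", dir.display());
    }
}
```

With the variables above, this would flag $CUDA_HOME/lib64/stubs, so trying a run without that entry (and with the driver directory on the path instead) seems like a cheap experiment.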
I see two possible solutions, one less diligent than the other: 1) Make a pull request to the team, explaining and fixing the issue (TBH I should have done this already, but haven't...); 2) Clone the repo yourself, edit source code and use that instead.
@ChaseLewis I went ahead and tried the 2nd option, still no luck. I found out that it should work, since the CUDA toolkit installation symlinks /usr/local/cuda/ to whichever CUDA version you installed last. I did a fresh cleanup and a ground-up new installation, and now I get Error: DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected"), which is clearly mistaken. With the same env vars, my simple (separate) CUDA code works well, meaning the installation and hardware are good, but my simple cudarc code does not (I am trying to isolate and troubleshoot cudarc in WSL2, apart from candle).
At least the Unable to dynamically load (...) error stopped. However, considering the new error, I don't know if I'm better or worse off...
I hope to continue troubleshooting cudarc in WSL2.