simonbyrne opened this issue 3 years ago
libcudadevrt always resides in that directory, but there should be a lib64 -> targets/x86_64-linux/lib/ link. Together with the libcuda.so issue, I have a feeling your CUDA distribution is a little messed up.
That said, I'm not really opposed to adding some additional code here: https://github.com/JuliaGPU/CUDA.jl/blob/631e278b56a6355492b4722382c1bec1b323e8af/deps/discovery.jl#L544-L547 (maybe add a comment about the missing link though).
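A minimal sketch of the kind of fallback being discussed, assuming the toolkit root directory is already known. The function name `find_static_lib` and its candidate list are hypothetical illustrations, not CUDA.jl's actual `discovery.jl` code. The third candidate covers installations where the `lib64 -> targets/x86_64-linux/lib/` link is missing.

```julia
# Hypothetical helper, not part of CUDA.jl: look for a static library such as
# libcudadevrt.a under a toolkit root, falling back to the targets/ directory
# for installations that lack the lib64 -> targets/x86_64-linux/lib/ link.
function find_static_lib(toolkit_root::AbstractString, name::AbstractString)
    candidates = [
        joinpath(toolkit_root, "lib64"),
        joinpath(toolkit_root, "lib"),
        # Actual location of the libraries; lib64 is normally a symlink to it.
        joinpath(toolkit_root, "targets", "x86_64-linux", "lib"),
    ]
    for dir in candidates
        path = joinpath(dir, name)
        isfile(path) && return path  # isfile follows symlinks
    end
    return nothing  # not found in any candidate directory
end
```

For example, `find_static_lib("/central/software/CUDA/11.2", "libcudadevrt.a")` would check `lib64` and `lib` first and only then the `targets/x86_64-linux/lib` directory, so well-formed installations behave as before.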
I do think the issue is with how this cluster's particular CUDA installation is set up. CUDA_HOME in the module environment points to the globally installed version of the CUDA assets, but on the GPU nodes the shared and static libraries are installed under /usr/lib64 (for the versioned .so files) and /usr/local/cuda-11.2/ (for the static library and other supporting libraries).
julia> print(ENV["CUDA_HOME"])
/central/software/CUDA/11.2
shell> /usr/lib64/libcuda
libcuda.so.1 libcuda_wrapper.so libcuda.so.460.32.03 libcuda.so
libcuda_wrapper.la libcuda_wrapper.so.0 libcuda_wrapper.so.0.0.0
shell> /usr/local/cuda-11.2/lib64/libcuda
libcudart_static.a libcudart.so libcudart.so.11.2.72 libcudart.so.11.0 libcudadevrt.a
I'm not sure of the best way to resolve this particular setup (maybe just selectively redirect CUDA_HOME on GPU nodes?), or whether we could add another JULIA_CUDA_ env variable to inject additional search paths into toolkit_dirs for unusual cluster setups with login node / CPU node / GPU node differences.
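One way the environment-variable idea could look, as a sketch only: the variable name is passed in as a parameter because the actual name was left open above, and `EXTRA_CUDA_DIRS` below is purely hypothetical. Directories listed in the variable (colon-separated) are searched before the defaults, so a GPU node could point at `/usr/local/cuda-11.2/lib64` without changing `CUDA_HOME`.

```julia
# Hypothetical sketch, not CUDA.jl's API: prepend directories taken from a
# colon-separated environment variable to a default list of search directories.
function search_dirs(default_dirs::Vector{String}, envvar::AbstractString)
    # Unset variable -> "" -> no extra entries; empty segments are skipped.
    extra = split(get(ENV, envvar, ""), ':'; keepempty=false)
    # Extra entries come first so they take precedence; duplicates are dropped.
    return unique(vcat(String.(extra), default_dirs))
end
```

For example, with `EXTRA_CUDA_DIRS=/usr/local/cuda-11.2/lib64` exported on a GPU node, `search_dirs([joinpath(ENV["CUDA_HOME"], "lib64")], "EXTRA_CUDA_DIRS")` would return `["/usr/local/cuda-11.2/lib64", "/central/software/CUDA/11.2/lib64"]`, while login and CPU nodes without the variable keep the default list.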
An alternative is to remove the local CUDA detection altogether, fully bet on artifacts, and have cluster users provide an Overrides.toml, which should give you the necessary flexibility (although at a usability cost). But that requires some additional work on the artifact side (probably including the CUDA version in the triple), so a temporary hack with env vars is OK for now.
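For reference, an `Overrides.toml` is read from the `artifacts` directory of a Julia depot (e.g. `~/.julia/artifacts/Overrides.toml`) and can map a package's bound artifact names to local paths, under a table keyed by the package UUID. The sketch below only builds and prints such a file with the `TOML` standard library (Julia 1.6+); the UUID, artifact name, and path are placeholders, not CUDA.jl's real values.

```julia
using TOML

# Placeholders: substitute the UUID of the package that binds the artifact,
# the artifact name from its Artifacts.toml, and the local install path.
pkg_uuid      = "00000000-0000-0000-0000-000000000000"
artifact_name = "SomeCUDAArtifact"
local_path    = "/usr/local/cuda-11.2"

# One table per package UUID, with artifact_name = "path" entries inside it.
overrides = Dict(pkg_uuid => Dict(artifact_name => local_path))

# Write the TOML text to stdout; redirect it into Overrides.toml as needed.
TOML.print(stdout, overrides)
```

Keeping the override in a per-depot file means the package itself needs no cluster-specific logic, which is the flexibility-versus-usability trade-off mentioned above.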
Describe the bug
On our cluster, the CUDA installation (both 10.2 and 11.2) places libcudadevrt.a under targets/x86_64-linux/lib/libcudadevrt.a, but find_libcudadevrt doesn't search this directory. Manually editing deps/discovery.jl makes this work, and all other libraries are found correctly. Can we either add that directory to the search path, or add an environment variable that lets us specify the path manually?
cc: @jakebolewski