Closed LumenYoung closed 8 months ago
Hi,
Did you follow the installation instructions for installing L4CasADi with CUDA as described in [1]?
Crucially, the command CUDACXX=<PATH_TO_NVCC> pip install l4casadi --no-build-isolation.
Best Tim
[1] https://github.com/Tim-Salzmann/l4casadi?tab=readme-ov-file#gpu-cuda
Hi, thanks for the prompt reply.
Yes, I tried this out (CUDACXX=/home/yang/micromamba/envs/jepa/bin/nvcc pip3 install . --no-build-isolation), as well as several other variables like:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yang/micromamba/envs/jepa/lib
export PATH=$PATH:/home/yang/micromamba/envs/jepa/bin
export CUDA_HOME=$CUDA_HOME:/home/yang/micromamba/envs/jepa/lib
export CUDNN_INCLUDE_DIR=/home/yang/micromamba/envs/jepa/include
export CUDNN_LIB_DIR=/home/yang/micromamba/envs/jepa/lib
export CUDNN_PATH=/home/yang/micromamba/envs/jepa/bin
export CUDNN_LIBRARY=/home/yang/micromamba/envs/jepa/lib
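As a side note for anyone reading along: a common pitfall with these variables is pointing CUDA_HOME at a lib directory or appending to it like PATH; it conventionally points at the toolkit root. A hedged sketch using the env path from this thread (adjust ENV_ROOT to your own environment):

```shell
# Sketch: conventional layout for these variables when the CUDA toolkit
# lives inside a conda/micromamba env. ENV_ROOT is assumed from this thread.
# Note: CUDA_HOME points at the toolkit *root*, not a lib directory.
ENV_ROOT=/home/yang/micromamba/envs/jepa
export CUDA_HOME="$ENV_ROOT"
export PATH="$ENV_ROOT/bin:$PATH"
export LD_LIBRARY_PATH="$ENV_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "CUDA_HOME=$CUDA_HOME"
```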
None of them helped. I also tried modifying the CMakeLists.txt, but I don't think I did it correctly. Could you give me a hint on which direction to try?
Best, Jiaye.
Neither installing from source with
CUDACXX=/home/yang/micromamba/envs/jepa/bin/nvcc pip3 install . --no-build-isolation
nor
CUDACXX=/home/yang/micromamba/envs/jepa/bin/nvcc pip3 install l4casadi --no-build-isolation
works in my setup, even though nvcc is confirmed to be present and functioning (via nvcc -V) and matches the driver version. I've been fighting this dependency issue the whole afternoon and have no clue what to try next.
Hi,
the path you are referring to, /home/yang/micromamba/envs/jepa/bin, does not look like a normal install path for CUDA (rather like a Python env). Normally CUDA lives somewhere like /usr/local/cuda-12.1/.... Can you please post the output of:
ls /home/yang/micromamba/envs/jepa/bin
Edit:
Crucially, /home/yang/micromamba/envs/jepa/bin/nvcc -V should work.
It is a large output, so I pasted it here: https://bin.lumeny.io/p/bat-otter-jaguar
If you want to check whether nvcc is present:
❯ ls /home/yang/micromamba/envs/jepa/bin | grep nvcc
nvcc
__nvcc_device_query
nvcc.profile
Yes, this is not a normal CUDA install, since I don't have admin privileges on the server. And nvcc is actually working, per nvcc -V:
❯ which nvcc
/home/yang/micromamba/envs/jepa/bin/nvcc
❯ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
The version was selected to match the CUDA version installed on the system, as shown below:
❯ which nvidia-smi
/usr/bin/nvidia-smi
❯ nvidia-smi
Mon Mar 11 18:08:27 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
Can you elaborate on how you installed nvcc and the cuda-toolkit there? Further, are you sure that the respective CUDA libraries and headers (which would normally be under /usr/local/cuda-12.1/include and /usr/local/cuda-12.1/lib64) are installed "correctly"?
Edit:
Another thought: given that CUDA is already installed on your machine, nvcc and the related CUDA toolkit libraries and headers might (not sure here) already be installed in some root directory too. locate nvcc or similar could find them.
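Tim's suggestion can be scripted roughly like this (a sketch; the candidate paths are just the usual system-wide install locations and may not exist on a given machine):

```shell
# Sketch: probe the usual system-wide locations for an nvcc binary.
# On machines without a root CUDA install this simply finds nothing.
STATUS=""
for candidate in /usr/local/cuda*/bin/nvcc /opt/cuda*/bin/nvcc; do
  if [ -x "$candidate" ]; then
    echo "found nvcc: $candidate"
    STATUS="found"
  fi
done
: "${STATUS:=none found}"
echo "search result: $STATUS"
```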
Yes, I installed it with micromamba (which you can think of as just conda): micromamba install -c "nvidia/label/cuda-12.2" cuda-toolkit="12.2". I struggle to find corresponding files under /usr/local/cuda-12.1/lib64, but many of the CUDA .so files do exist in /home/yang/micromamba/envs/jepa/lib. I'm not sure it is a 100% replica; the full file list can be found here, and briefly it contains the following CUDA-related libs:
❯ ls | grep cuda
libcudadevrt.a
libcudart.so
libcudart.so.12
libcudart.so.12.2.140
libcudart_static.a
libicudata.so
libicudata.so.73
libicudata.so.73.2
As for /usr/local/cuda-12.1/include, I can only find something similar at /home/yang/micromamba/envs/jepa/include/cuda; the files inside can be found here.
I will try to find time tomorrow to replicate your setup. In the meantime, if GPU support is not essential for you to get started, you could just install the PyTorch CPU version in your local env and install L4CasADi without CUDA support, too.
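The CPU-only fallback Tim mentions would look roughly like this (a sketch; the index URL is PyTorch's standard CPU wheel index, and the commands are printed rather than executed here so they can be reviewed first):

```shell
# Sketch: CPU-only fallback. Printed instead of executed; run the printed
# commands inside your target env once you have reviewed them.
TORCH_CPU_INDEX=https://download.pytorch.org/whl/cpu
echo "pip install torch --index-url $TORCH_CPU_INDEX"
echo "pip install l4casadi --no-build-isolation"
```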
Hi,
I was able to reproduce your setup and error. I made some small changes to the build process. Please clone the latest code version into a fresh folder (or make sure you remove the temporary _skbuild folder). You should then be able to install with CUDA support simply via pip install . --no-build-isolation.
Best Tim
Hi Tim, really nice of you to look into custom CUDA installs.
Meanwhile, I got my supervisor to install a system-wide CUDA. Even that did not give me a successful build until I inserted set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc") into the CMakeLists.txt in the libcasadi folder. I'm reporting this here in case someone else wants to try my way.
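For readers who would rather not patch the repo: CMake also reads the CUDACXX environment variable when it detects the CUDA compiler, so exporting it before configuring is the standard, non-invasive counterpart of hard-coding set(CMAKE_CUDA_COMPILER ...). A sketch (the equivalence is standard CMake behavior; whether it suffices for this particular build is exactly what this thread was debugging):

```shell
# Sketch: CUDACXX is CMake's standard env var for the CUDA compiler; exporting
# it mirrors the set(CMAKE_CUDA_COMPILER ...) edit without touching the repo.
NVCC=/usr/local/cuda/bin/nvcc
export CUDACXX="$NVCC"
echo "equivalent edit: set(CMAKE_CUDA_COMPILER \"$NVCC\")"
```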
Thanks a lot for the help!
Hi, Tim,
Thanks for this great work. I was about to try it out; I managed to install most of the dependencies but failed when installing the library itself with
pip install . --no-build-isolation
The error message is as follows. It is also quite confusing: it suggests no CUDA was found, yet it also reports the cuda-toolkit version (12.1). I checked the cuda-toolkit, which was installed via micromamba (a conda substitute), and I tried all sorts of ways to set a custom CUDA location, but none of them got it compiled.
Therefore, I would like to ask: how should I build the repo with a custom CUDA installation location? Which specific files from the CUDA libs does the compilation actually need? Thanks!