getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

KeOps error: ld cannot find -lnvrtc #318

Open kayween opened 1 year ago

kayween commented 1 year ago

Hi,

I am trying to install the latest keops but got an error. Specifically, pykeops cannot find nvrtc during compilation. Any advice on resolving this issue?

Ubuntu 20.04.5 LTS, CUDA 11.6, PyKeOps 2.1

>>> import pykeops
[KeOps] Compiling cuda jit compiler engine ... /usr/bin/ld: cannot find -lnvrtc
collect2: error: ld returned 1 exit status

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kaiwen/anaconda3/envs/altproj/lib/python3.9/site-packages/pykeops/__init__.py", line 3, in <module>
    import keopscore
  File "/home/kaiwen/anaconda3/envs/altproj/lib/python3.9/site-packages/keopscore/__init__.py", line 34, in <module>
    Gpu_link_compile.compile_jit_compile_dll()
  File "/home/kaiwen/anaconda3/envs/altproj/lib/python3.9/site-packages/keopscore/binders/nvrtc/Gpu_link_compile.py", line 103, in compile_jit_compile_dll
    KeOps_OS_Run(
  File "/home/kaiwen/anaconda3/envs/altproj/lib/python3.9/site-packages/keopscore/utils/misc_utils.py", line 41, in KeOps_OS_Run
    KeOps_Error("Error compiling formula.")
  File "/home/kaiwen/anaconda3/envs/altproj/lib/python3.9/site-packages/keopscore/utils/misc_utils.py", line 28, in KeOps_Error
    raise ValueError(message)
ValueError: [KeOps] Error : Error compiling formula. (error at line 41 in file /home/kaiwen/anaconda3/envs/altproj/lib/python3.9/site-packages/keopscore/utils/misc_utils.py)
jeanfeydy commented 1 year ago

Hi @kayween ,

Thanks for your interest in this library! As detailed in this error message, it is likely that setting the environment variable CUDA_PATH to the location of the folder that contains your CUDA installation will solve your issue. Indeed, KeOps will look for the CUDA header files at locations $CUDA_PATH/include/cuda.h and $CUDA_PATH/include/nvrtc.h.
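
As a quick sanity check of the header lookup described above, here is a minimal sketch that tests whether the two header files exist under $CUDA_PATH/include. The helper name missing_cuda_headers is hypothetical and not part of KeOps; it only mirrors the two paths mentioned in this comment.

```python
import os
from pathlib import Path


def missing_cuda_headers(cuda_path, headers=("cuda.h", "nvrtc.h")):
    """Return the headers that are absent from <cuda_path>/include.

    Hypothetical helper for illustration; KeOps does not ship this function.
    """
    include_dir = Path(cuda_path) / "include"
    return [h for h in headers if not (include_dir / h).is_file()]


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path is None:
        print("CUDA_PATH is not set")
    else:
        missing = missing_cuda_headers(cuda_path)
        print("missing headers:", missing or "none")
```

An empty list only means the headers are in place. It says nothing about whether the linker can find the shared libraries, which is a separate question.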

For reference, our main Docker image is based on Ubuntu and documents how to install CUDA from the official nvidia channel, etc. You may find it helpful.

What do you think?

P.S.: I do not know why the full error description appears on some configurations and not on others (such as yours). This is certainly something that we should fix.

kayween commented 1 year ago

Hi @jeanfeydy ,

My cuda is installed in /usr/local/cuda :

~$ ls /usr/local/cuda/include/cuda.h 
/usr/local/cuda/include/cuda.h
~$ ls /usr/local/cuda/include/nvrtc.h 
/usr/local/cuda/include/nvrtc.h

The cuda path is configured correctly as '/usr/local/cuda', but keops still has trouble finding NVRTC.

Python 3.9.17 (main, Jul  5 2023, 20:41:20) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['CUDA_PATH']
'/usr/local/cuda'
>>> import pykeops
[KeOps] Compiling cuda jit compiler engine ... /usr/bin/ld: cannot find -lnvrtc
collect2: error: ld returned 1 exit status
jeanfeydy commented 1 year ago

Hi @kayween ,

I see. Since the linker (ld) seems to be the issue, could you try to also add your CUDA folder to the LD_LIBRARY_PATH environment variable? Assuming that

ls /usr/local/cuda/lib | grep nvrtc

returns a non-empty output that contains something like libnvrtc.so, the following command should work:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib
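
The same check can be scripted. The sketch below lists the libnvrtc files found in the usual library sub-folders of a CUDA installation. The function name find_nvrtc_libs is hypothetical, and looking in both lib and lib64 is an assumption: on 64-bit Linux the libraries are commonly placed in lib64 rather than lib.

```python
from pathlib import Path


def find_nvrtc_libs(cuda_root, subdirs=("lib", "lib64")):
    """Map each existing library sub-folder to the libnvrtc files it contains.

    Hypothetical helper; the sub-folder names are an assumption about the layout.
    """
    found = {}
    for sub in subdirs:
        folder = Path(cuda_root) / sub
        if folder.is_dir():
            found[str(folder)] = sorted(p.name for p in folder.glob("libnvrtc*"))
    return found


if __name__ == "__main__":
    for folder, names in find_nvrtc_libs("/usr/local/cuda").items():
        print(folder, names)
```

A folder that appears with an empty list exists but holds no libnvrtc file, so adding it to a search path would not help.
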
kayween commented 1 year ago

Hi @jeanfeydy ,

Yeah I have tried that.

The libnvrtc.so file is in the folder /usr/local/cuda/lib64

$ ls /usr/local/cuda/lib64/ | grep nvrtc
libnvrtc-builtins.so
libnvrtc-builtins.so.11.6
libnvrtc-builtins.so.11.6.112
libnvrtc-builtins_static.a
libnvrtc.so
libnvrtc.so.11.2
libnvrtc.so.11.6.112
libnvrtc_static.a

The path has already been added

$ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda:

But the linker still cannot find the .so file.

The weird thing is that I can compile and run the example from the NVRTC documentation. So I guess the CUDA path has been configured correctly, but somehow keops cannot find the correct CUDA path.
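
One point that may be relevant here: LD_LIBRARY_PATH is mainly read by the runtime loader, whereas a gcc-driven link step consults LIBRARY_PATH when resolving -l options. Nobody in this thread reports trying that variable, so the following is only a sketch under that assumption. The helper with_library_path is hypothetical and not part of KeOps.

```python
import os


def with_library_path(lib_dir, env=None):
    """Return a copy of env with lib_dir prepended to LIBRARY_PATH.

    Hypothetical helper; whether this resolves the KeOps error is untested.
    """
    env = dict(os.environ if env is None else env)
    current = env.get("LIBRARY_PATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    env["LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # Assumption: the libraries live in /usr/local/cuda/lib64, as listed above.
    print(with_library_path("/usr/local/cuda/lib64")["LIBRARY_PATH"])
```

To try it, one could assign the resulting value to os.environ["LIBRARY_PATH"] before running import pykeops, since child processes inherit the environment of the Python process.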

kayween commented 1 year ago

Hi @jeanfeydy ,

The issue seems to be the python version.

I realized that the dockerfile uses python 3.8, which seems to be crucial.

I have been using python 3.9. Downgrading to python 3.8 resolves the issue.

abhinavgupta0110 commented 1 year ago

Hi @kayween

In a conda env, if you only install cudatoolkit, it does not provide cuda.h, nvrtc.h, etc. So one option is to additionally install cudatoolkit-dev from the conda-forge channel, and then run export CUDA_PATH=/home/kaiwen/anaconda3/envs/altproj in your conda env.

After installing cudatoolkit-dev, you will notice that the required .h files are present under /home/kaiwen/anaconda3/envs/altproj/include

Hope this helps!

ZacharyVarley commented 1 year ago

To add for anyone still having this issue: I installed cuda-toolkit from https://anaconda.org/nvidia/cuda-toolkit, so I knew that "cuda.h" and "nvrtc.h" were present in my mamba environment's include folder at "/home/zach/mambaforge/envs/myproject/include", but using

mamba env config vars set CUDA_PATH="/home/zach/mambaforge/envs/myproject/"

did not set the path correctly (for an unknown reason). I had to invoke it using conda:

conda env config vars set CUDA_PATH="/home/zach/mambaforge/envs/myproject/"

I hope this helps anyone that runs into this minor issue. Make sure you reactivate the env. As a side note, the reason for calling conda / mamba for the environment variable is to keep all of my changes (even setting environment variables) isolated from the OS.

muly20 commented 3 months ago

I had the same issue (“cannot find -lcuda, -lnvrtc”). Adding path variables didn’t help, because it turns out I have two cuda installation directories: “cuda” and “cuda-12.1”. The shared object files were called “libcuda.so.1” (under /lib/x86_64-linux-gnu) and “libnvrtc.so.12” (under python-venv/…/nvidia/cuda_nvrtc/lib) instead of having just the “.so” suffix.

I added symbolic links with the plain “.so” suffix in the corresponding directories and it worked well.

I found the actual paths by tracing the build command generated by the pykeops import.
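
The symlink step described above can be sketched as follows. The function link_unversioned is hypothetical, and it defaults to a dry run so that nothing is written unless requested. When several versioned files exist it simply picks the lexicographically last name, which is an arbitrary choice.

```python
import os
import re
from pathlib import Path


def link_unversioned(lib_dir, stem="libnvrtc", dry_run=True):
    """Create <stem>.so -> <stem>.so.N in lib_dir when only versioned files exist.

    Hypothetical helper. Returns (link_name, target_name), or None if there is
    nothing to do.
    """
    lib_dir = Path(lib_dir)
    link = lib_dir / f"{stem}.so"
    if link.exists() or link.is_symlink():
        return None  # an unversioned name is already present
    pattern = re.compile(rf"^{re.escape(stem)}\.so\.[\d.]+$")
    versioned = sorted(p.name for p in lib_dir.iterdir() if pattern.match(p.name))
    if not versioned:
        return None  # no versioned library to point at
    chosen = versioned[-1]  # lexicographically last name
    if not dry_run:
        os.symlink(chosen, link)  # relative link inside lib_dir
    return (link.name, chosen)
```

Calling it with dry_run=True first shows which link would be created. System directories such as /lib/x86_64-linux-gnu would require root permissions for the actual write.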

nataxcan commented 2 weeks ago

I went down a terrible rabbit hole because of this, but I think I found a generally-applicable solution, thanks in part to this loosely-related stackoverflow post

First you have to find out what happens when ld tries to use -lnvrtc, so you run ld -lnvrtc --verbose, which gave me:

attempt to open /usr/local/lib/x86_64-linux-gnu/libnvrtc.so failed
attempt to open /usr/local/lib/x86_64-linux-gnu/libnvrtc.a failed
...
attempt to open /usr/lib/libnvrtc.so failed
attempt to open /usr/lib/libnvrtc.a failed

This is the list of places where ld looked for libnvrtc. Now you can use find, locate, or whatever command you prefer to find where libnvrtc.so is on your computer, and then create a symbolic link from the location where you found the library to one of the locations where ld is looking for it:

sudo ln -s /path/to/your/libnvrtc.so /usr/lib/libnvrtc.so

After that, ld -lnvrtc --verbose gives the same list of places it couldn't find libnvrtc but then in the middle I find two lines that say:

found libc.so.6 at /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 needed by /lib/libnvrtc.so
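
To make the verbose output easier to read, the directories ld tried can be extracted with a few lines of Python. This is only a sketch: the function searched_dirs is hypothetical, and the regular expression assumes the "attempt to open ... failed" line format shown in the output above.

```python
import re
from pathlib import PurePosixPath

# Assumed line format, copied from the ld output quoted in this comment.
ATTEMPT = re.compile(r"^attempt to open (\S+) failed$")


def searched_dirs(ld_output):
    """Return the directories ld tried, in order, without duplicates."""
    dirs = []
    for line in ld_output.splitlines():
        match = ATTEMPT.match(line.strip())
        if match:
            parent = str(PurePosixPath(match.group(1)).parent)
            if parent not in dirs:
                dirs.append(parent)
    return dirs


sample = """\
attempt to open /usr/local/lib/x86_64-linux-gnu/libnvrtc.so failed
attempt to open /usr/local/lib/x86_64-linux-gnu/libnvrtc.a failed
attempt to open /usr/lib/libnvrtc.so failed
"""
print(searched_dirs(sample))
# prints ['/usr/local/lib/x86_64-linux-gnu', '/usr/lib']
```

Any directory in that list is a candidate destination for the symbolic link.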

I just wonder if there's any way to make ld also look in the folder I needed when it resolves -lnvrtc, as perhaps that's a broader issue for people using wsl2.