getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Installation failure #257

Closed parthe closed 2 years ago

parthe commented 2 years ago

This is the error message I get when I run import pykeops in IPython:

<stdin>:1:10: fatal error: cuda.h: No such file or directory
compilation terminated.

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 import pykeops

File CONDA_ENV/lib/python3.8/site-packages/pykeops/__init__.py:3, in <module>
      1 import os
----> 3 import keopscore
      4 import keopscore.config
      5 import keopscore.config.config

File CONDA_ENV/lib/python3.8/site-packages/keopscore/__init__.py:14, in <module>
     11 with open(os.path.join(here, "keops_version"), encoding="utf-8") as v:
     12     __version__ = v.read().rstrip()
---> 14 from .config.config import set_build_folder, get_build_folder
     15 from .utils.code_gen_utils import clean_keops
     17 # flags for debugging :
     18 # prints information about atomic operations during code building

File CONDA_ENV/lib/python3.8/site-packages/keopscore/config/config.py:207, in <module>
    202 nvrtc_flags = (
    203     compile_options
    204     + f" -fpermissive -L{libcuda_folder} -L{libnvrtc_folder} -lcuda -lnvrtc"
    205 )
    206 nvrtc_include = " -I" + bindings_source_dir
--> 207 cuda_include_path = get_cuda_include_path()
    208 if cuda_include_path:
    209     nvrtc_include += " -I" + cuda_include_path

File CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/gpu_utils.py:71, in get_cuda_include_path()
     67                 return includepath
     69 # last try, testing if by any chance the header is already in the default
     70 # include path of gcc
---> 71 path_cudah = get_include_file_abspath("cuda.h")
     72 if path_cudah:
     73     path = os.path.dirname(path_cudah)

File CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/gpu_utils.py:92, in get_include_file_abspath(filename)
     90 def get_include_file_abspath(filename):
     91     tmp_file = join(get_build_folder(), "tmp.txt")
---> 92     KeOps_OS_Run(
     93         f'echo "#include <{filename}>" | {cxx_compiler} -M -E -x c++ - | head -n 2 > {tmp_file}'
     94     )
     95     strings = open(tmp_file).read().split()
     96     abspath = None

File CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/misc_utils.py:41, in KeOps_OS_Run(command)
     39     if out.stderr != b"":
     40         print(out.stderr.decode("utf-8"))
---> 41         KeOps_Error("Error compiling formula.")
     42 elif python_version >= (3, 5):
     43     import subprocess

File CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/misc_utils.py:28, in KeOps_Error(message, show_line_number)
     26     frameinfo = getframeinfo(currentframe().f_back)
     27     message += f" (error at line {frameinfo.lineno} in file {frameinfo.filename})"
---> 28 raise ValueError(message)

ValueError: [KeOps] Error : Error compiling formula. (error at line 41 in file CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/misc_utils.py)

g++=8.5 Python=3.8 torch=1.11.1 Cuda=11.6

parthe commented 2 years ago

Update: This error does not occur when running on a machine without a GPU.

mf-ananas commented 2 years ago

Hello, I am not sure whether the solution that worked for me will also work in your case.

What worked for me with the error message "fatal error: cuda.h: No such file or directory" was to first check where the "cuda.h" file is (with the "locate" command, as explained here: https://github.com/getkeops/keops/issues/3#issuecomment-493751828), then export that path as an environment variable, as explained here: https://github.com/getkeops/keops/issues/213#issuecomment-1048066974. I then ran the import again to test whether the error was fixed, like this: export CUDA_PATH= python -c 'import pykeops; pykeops.test_torch_bindings()'.

If this fixes your problem, you can let conda run the export whenever the environment is activated by following these instructions: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux.

Btw, the output of locating cuda.h looked something like this:

...
/home/mari/anaconda3/include/cuda.h
...

jeanfeydy commented 2 years ago

Hi @parthe ,

Thanks again for your interest in this library!

@mferrandon has summarized the issue well: it seems that your CUDA files are not located in a standard folder (which is very common on shared institutional clusters), or that your installation of CUDA does not contain the required "headers" (= development files). Note that #3 is an issue that dates back to KeOps 1.x, so the workaround should be much simpler now (KeOps does not require CMake anymore...).

Once you have installed a more complete version of CUDA and/or found where the "cuda.h" and "nvrtc.h" files are on your system, you may explicitly give this information to KeOps via the CUDA_PATH environment variable.
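As a rough sketch of that check, the Python snippet below looks for a folder that contains both headers. The function name find_cuda_root and the candidate folders are hypothetical, and the include/ sub-folder layout is an assumption based on a typical toolkit install, not something KeOps guarantees:

```python
import os


def find_cuda_root(candidates, headers=("cuda.h", "nvrtc.h")):
    """Return the first candidate folder whose include/ sub-folder holds
    every requested header, or None if no candidate qualifies."""
    for root in candidates:
        include_dir = os.path.join(root, "include")
        if all(os.path.isfile(os.path.join(include_dir, h)) for h in headers):
            return root
    return None


if __name__ == "__main__":
    # Hypothetical candidate folders: adapt them to your own machine.
    candidates = [os.environ.get("CONDA_PREFIX", ""), "/usr/local/cuda", "/opt/conda"]
    root = find_cuda_root([c for c in candidates if c])
    if root is None:
        print("No folder with both cuda.h and nvrtc.h was found.")
    else:
        # Print the line to run in the shell before importing pykeops.
        print(f"export CUDA_PATH={root}")
```

If a folder is found, the printed export line can be run in the shell before calling import pykeops.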

Alternatively, you may prefer to work in our official Docker/Singularity container as explained in the documentation. You can have a look at our Dockerfile to see how we generally set up everything that is required by KeOps:

/opt/conda/bin/conda install -y -c nvidia/label/cuda-11.3.1 cuda

allows us to download a complete CUDA environment,

export CUDA_PATH=/opt/conda/

allows us to make sure that KeOps detects CUDA which is installed in /opt/conda, etc.

What do you think? Best regards, Jean

parthe commented 2 years ago

Hi @mferrandon and @jeanfeydy, thanks for your prompt responses. Indeed, I am on an institutional machine. I believe I was able to load CUDA correctly; it needed a cluster-specific command. However, I now face a new error message when I run the following in a terminal:

python -c "import pykeops; pykeops.test_numpy_bindings()"

[KeOps] Compiling cuda jit compiler engine ... In file included from /home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:21:
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp: In function ‘int Compile(const char*, const char*, int, int, const char*)’:
<command-line>: error: ‘nvrtcGetCUBINSize’ was not declared in this scope
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/include/utils_pe.h:6:26: note: in definition of macro ‘NVRTC_SAFE_CALL’
     nvrtcResult result = x;                                       \
                          ^
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:90:21: note: in expansion of macro ‘nvrtcGetTARGETSize’
     NVRTC_SAFE_CALL(nvrtcGetTARGETSize(prog, &targetSize));
                     ^~~~~~~~~~~~~~~~~~
<command-line>: note: suggested alternative: ‘nvrtcGetPTXSize’
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/include/utils_pe.h:6:26: note: in definition of macro ‘NVRTC_SAFE_CALL’
     nvrtcResult result = x;                                       \
                          ^
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:90:21: note: in expansion of macro ‘nvrtcGetTARGETSize’
     NVRTC_SAFE_CALL(nvrtcGetTARGETSize(prog, &targetSize));
                     ^~~~~~~~~~~~~~~~~~
<command-line>: error: ‘nvrtcGetCUBIN’ was not declared in this scope
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/include/utils_pe.h:6:26: note: in definition of macro ‘NVRTC_SAFE_CALL’
     nvrtcResult result = x;                                       \
                          ^
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:93:21: note: in expansion of macro ‘nvrtcGetTARGET’
     NVRTC_SAFE_CALL(nvrtcGetTARGET(prog, target));
                     ^~~~~~~~~~~~~~
<command-line>: note: suggested alternative: ‘nvrtcGetPTX’
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/include/utils_pe.h:6:26: note: in definition of macro ‘NVRTC_SAFE_CALL’
     nvrtcResult result = x;                                       \
                          ^
/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:93:21: note: in expansion of macro ‘nvrtcGetTARGET’
     NVRTC_SAFE_CALL(nvrtcGetTARGET(prog, target));
                     ^~~~~~~~~~~~~~

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/pykeops/__init__.py", line 3, in <module>
    import keopscore
  File "/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/__init__.py", line 34, in <module>
    Gpu_link_compile.compile_jit_compile_dll()
  File "/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/binders/nvrtc/Gpu_link_compile.py", line 103, in compile_jit_compile_dll
    KeOps_OS_Run(
  File "/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/misc_utils.py", line 41, in KeOps_OS_Run
    KeOps_Error("Error compiling formula.")
  File "/home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/misc_utils.py", line 28, in KeOps_Error
    raise ValueError(message)
ValueError: [KeOps] Error : Error compiling formula. (error at line 41 in file /home/USER/.conda/envs/CONDA_ENV/lib/python3.8/site-packages/keopscore/utils/misc_utils.py)

jeanfeydy commented 2 years ago

Hi @parthe,

I see: it is very likely a mismatch between different versions of CUDA, as in #236. Once again, this is fairly common on shared clusters, as modules tend to be loaded silently and we end up with two concurrent versions of the CUDA development files in the PATH.

Could you please tell us the version of your CUDA installation (and if you have several installations on the same machine), e.g. by running nvidia-smi and/or nvcc --version?
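To spot several installations quickly, a minimal sketch along these lines may help. The helper name executables_on_path is hypothetical, and the snippet only lists nvcc executables on the search path; it does not inspect the header folders that the compiler actually uses:

```python
import os


def executables_on_path(name, path=None):
    """List every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for folder in path.split(os.pathsep):
        if not folder:
            continue  # skip empty PATH entries
        candidate = os.path.join(folder, name)
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    # More than one line of output suggests several CUDA toolkits are visible.
    for exe in executables_on_path("nvcc"):
        print(exe)
```

The first entry printed is the one that a plain nvcc call resolves to.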

Best regards, Jean

parthe commented 2 years ago

Here are my outputs

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+

jeanfeydy commented 2 years ago

I see: it seems that your cluster-specific command loaded the CUDA 11.0 development stack instead of CUDA 11.6 (which is recognized by your driver and nvidia-smi). On the shared machines that I use, the admins provide access to several versions via scripts or modules such as "cuda-11-0", "cuda-11-6", etc. Do you think that you could load CUDA 11.6 (or at least e.g. CUDA 11.3) in your environment?
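For reference, here is a small sketch of how the two outputs above can be compared automatically. The function names are hypothetical and the regular expressions simply assume the output formats shown in this thread. Keep in mind that nvidia-smi reports the highest CUDA version supported by the driver, so a difference is only a hint that the wrong toolkit may be loaded:

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_cuda_version(smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of `nvidia-smi`."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_matches_driver(nvcc_output, smi_output):
    """True only if both versions could be parsed and are identical."""
    toolkit = nvcc_release(nvcc_output)
    driver = driver_cuda_version(smi_output)
    return toolkit is not None and toolkit == driver
```

The two strings can be obtained with subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout and the equivalent call for nvidia-smi. With the outputs posted above, the toolkit is parsed as (11, 0) and the driver as (11, 6), so the check returns False.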

Best regards, Jean

parthe commented 2 years ago

Turns out this was the issue. Thanks a lot @jeanfeydy

I installed cuda in my conda environment using conda install -c nvidia/label/cuda-11.6.0 cuda-toolkit

Instructions by @mferrandon fixed it. Thanks a lot!