getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Issues when importing pykeops nvrtcGetPTXSize #236

Open Xparx opened 2 years ago

Xparx commented 2 years ago

I am trying to install pykeops in a conda environment and ran into the following error after updating the C++ library. The problem is that I don't know which package to go after here. It looks to me like an NVIDIA/CUDA library issue, but it might be connected to some build package I'm missing. Any pointers on which package I'm missing would be great.

import pykeops

[KeOps] Compiling cuda jit compiler engine ... In file included from /home/user/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:20:
/home/user/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp: In function ‘int Compile(const char*, const char*, int, int, const char*)’:
<command-line>: error: ‘nvrtcGetCUBINSize’ was not declared in this scope; did you mean ‘nvrtcGetPTXSize’?
/home/user/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/include/utils_pe.h:6:26: note: in definition of macro ‘NVRTC_SAFE_CALL’
    6 |     nvrtcResult result = x;                                       \
      |                          ^
/home/user/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:89:21: note: in expansion of macro ‘nvrtcGetTARGETSize’
   89 |     NVRTC_SAFE_CALL(nvrtcGetTARGETSize(prog, &targetSize));
      |                     ^~~~~~~~~~~~~~~~~~
<command-line>: error: ‘nvrtcGetCUBIN’ was not declared in this scope; did you mean ‘nvrtcGetPTX’?
/home/user/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/include/utils_pe.h:6:26: note: in definition of macro ‘NVRTC_SAFE_CALL’
    6 |     nvrtcResult result = x;                                       \
      |                          ^
/home/user/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/binders/nvrtc/nvrtc_jit.cpp:92:21: note: in expansion of macro ‘nvrtcGetTARGET’
   92 |     NVRTC_SAFE_CALL(nvrtcGetTARGET(prog, target));
      |                     ^~~~~~~~~~~~~~

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-68b45488030e> in <module>
----> 1 import pykeops

~/anaconda3/envs/try_keops/lib/python3.9/site-packages/pykeops/__init__.py in <module>
      1 import os
      2 
----> 3 import keopscore
      4 import keopscore.config
      5 import keopscore.config.config

~/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/__init__.py in <module>
     32 
     33     if not os.path.exists(jit_compile_dll()):
---> 34         Gpu_link_compile.compile_jit_compile_dll()

~/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/binders/nvrtc/Gpu_link_compile.py in compile_jit_compile_dll()
    101     def compile_jit_compile_dll():
    102         KeOps_Message("Compiling cuda jit compiler engine ... ", flush=True, end="")
--> 103         KeOps_OS_Run(
    104             Gpu_link_compile.get_compile_command(
    105                 sourcename=jit_compile_src,

~/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/utils/misc_utils.py in KeOps_OS_Run(command)
     39         if out.stderr != b"":
     40             print(out.stderr.decode("utf-8"))
---> 41             KeOps_Error("Error compiling formula.")
     42     elif python_version >= (3, 5):
     43         import subprocess

~/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/utils/misc_utils.py in KeOps_Error(message, show_line_number)
     26         frameinfo = getframeinfo(currentframe().f_back)
     27         message += f" (error at line {frameinfo.lineno} in file {frameinfo.filename})"
---> 28     raise ValueError(message)
     29 
     30 

ValueError: [KeOps] Error : Error compiling formula. (error at line 41 in file /home/user/anaconda3/envs/try_keops/lib/python3.9/site-packages/keopscore/utils/misc_utils.py)

jeanfeydy commented 2 years ago

Hi @Xparx,

I see! My first guess is that this comes from a mismatch between CUDA versions - nvrtcGetCUBINSize was only introduced recently. It should not be required to run KeOps, but it appears that your system used it when defining NVRTC_SAFE_CALL and then didn't have access to it anymore.

Could you please tell us the version of your CUDA installation (and if you have several installations on the same machine), e.g. by running nvidia-smi and/or nvcc --version?

Best regards, Jean

joanglaunes commented 2 years ago

Hello @Xparx and @jeanfeydy,

To give some more details: KeOps checks the CUDA version by loading the CUDA runtime DLL (cudart). If the version is at least 11.1, it enables the nvrtcGetCUBIN method instead of nvrtcGetPTX, because we understood nvrtcGetCUBIN to have been introduced in CUDA 11.1. This means that the compiled code is stored as a cubin file instead of a ptx file for later use, which is a bit faster.

So in your case the error might indicate, as Jean was saying, that there is a mismatch between different CUDA versions installed on your system. Having searched the internet just now, it is also possible that we made a mistake and that the "cubin" method was introduced in CUDA 11.3 and not 11.1... In any case, it will help if you tell us which CUDA version you use and whether several versions might be installed side by side.
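
The version check described above can be reproduced by hand. The following is a minimal sketch, not the actual KeOps code: the helper names (`decode_cuda_version`, `pick_nvrtc_target`, `runtime_version`) are hypothetical, and the only CUDA call used is `cudaRuntimeGetVersion`, which reports the version as the integer `1000 * major + 10 * minor`. The threshold is a parameter because the comment leaves open whether it should be 11.1 or 11.3.

```python
import ctypes
import ctypes.util


def decode_cuda_version(raw):
    """Split CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def pick_nvrtc_target(raw, cubin_min=(11, 1)):
    """Return 'cubin' if the runtime is at least cubin_min, else 'ptx'.

    cubin_min is an assumption taken from the thread (11.1, possibly 11.3).
    """
    return "cubin" if decode_cuda_version(raw) >= cubin_min else "ptx"


def runtime_version():
    """Ask libcudart for its version; return None if it cannot be loaded."""
    name = ctypes.util.find_library("cudart")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        raw = ctypes.c_int()
        # cudaRuntimeGetVersion(int*) returns 0 (cudaSuccess) on success.
        if lib.cudaRuntimeGetVersion(ctypes.byref(raw)) != 0:
            return None
    except (OSError, AttributeError):
        return None
    return raw.value


if __name__ == "__main__":
    raw = runtime_version()
    if raw is None:
        print("libcudart not found on the default library search path")
    else:
        print(decode_cuda_version(raw), "->", pick_nvrtc_target(raw))
```

If the runtime library found this way reports 11.1 or later while the NVRTC headers used for compilation come from an older toolkit, the "not declared in this scope" error shown in the first post is the expected outcome.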

Xparx commented 2 years ago

A version issue might well be the cause. Sorry for the late reply.

Running the CUDA tools, I get:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

and

$ nvidia-smi 
Mon Apr  4 11:02:44 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+

I have not yet figured out why the CUDA version always seems to differ between the tools, but there might be a mismatch here that shouldn't exist.
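
The two tools report different things, which is why the numbers can disagree: `nvcc --version` prints the release of the CUDA toolkit found on the PATH, while the "CUDA Version" field of `nvidia-smi` is the highest CUDA version the installed driver supports. Here the toolkit is 10.1 and the driver supports up to 11.6. A minimal sketch for collecting both values in one place is given below; the helper names are hypothetical and the regular expressions assume the output formats shown in this thread.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the toolkit release (e.g. '10.1') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_smi_cuda(text):
    """Extract the driver-supported CUDA version from `nvidia-smi` output."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(cmd):
    """Return the command's stdout, or an empty string if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    print("toolkit (nvcc):      ", parse_nvcc_release(run(["nvcc", "--version"])))
    print("driver (nvidia-smi): ", parse_smi_cuda(run(["nvidia-smi"])))
```

A driver version higher than the toolkit version is normal. What matters for the error above is which toolkit headers and which runtime library the build actually picks up.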

sbelharbi commented 1 year ago

Hi, I am having a similar issue, possibly caused by the difference between the installed (loaded) CUDA version and the CUDA version shipped with PyTorch.

$ nvidia-smi
NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

PyTorch was installed via:

pip install torch==1.11.0 -f https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp37-cp37m-linux_x86_64.whl

to work with version 1.11.0, which requires CUDA 11.3, and that version is not installed locally. CUDA_PATH points to CUDA 11.1. Several other versions are also installed: cuda-10.0 and cuda-9.2. I'll face the same issue on other servers for the same reason: PyTorch requires a CUDA version that is not installed. Is there a workaround for this? As a last resort, I can ask for the required CUDA version to be installed.

I just checked:

python -c "import torch; print(torch.version.cuda)"
>>> 10.2

I am not really sure why it is 10.2. I expected 11.3, i.e., the version it is shipped with... -> I uninstalled PyTorch and reinstalled it with the same command as above. Now torch.version.cuda points to the right version:

python -c "import torch; print(torch.version.cuda)"
>>> 11.3

I reinstalled keops and I am still getting an error, but I am no longer sure it is the same issue as in this post. Here is the full log, which mentions a missing pybind11.h. I am looking into why it is missing...

import pykeops
[pyKeOps] Compiling nvrtc binder for python ... /usr/bin/python3: No module named pybind11
pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/pykeops/common/keops_io/pykeops_nvrtc.cpp:5:10: fatal error: pybind11/pybind11.h: No such file or directory
 #include <pybind11/pybind11.h>
          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Traceback (most recent call last):
  File "pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-68b45488030e>", line 1, in <module>
    import pykeops
  File "pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/pykeops/__init__.py", line 43, in <module>
    compile_jit_binary()
  File "pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 82, in compile_jit_binary
    KeOps_OS_Run(compile_command)
  File "pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/keopscore/utils/misc_utils.py", line 41, in KeOps_OS_Run
    KeOps_Error("Error compiling formula.")
  File "pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/keopscore/utils/misc_utils.py", line 28, in KeOps_Error
    raise ValueError(message)
ValueError: [KeOps] Error : Error compiling formula. (error at line 41 in file pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/keopscore/utils/misc_utils.py)

The pybind11 package seems to be installed already.

$ pip install pybind11
Requirement already satisfied: pybind11 in pathx/anaconda3/envs/yenv/lib/python3.7/site-packages (2.10.0)

Why is it looking for pybind11 outside the virtual env (/usr/bin/python3: No module named pybind11)? It is already installed in the virtual env, but not in the root env.

It seems that pip install pybind11 does not install the headers, or it installed them but pybind11 failed to find them. Looking for a solution.

The header is already installed locally in the virtual env:

$ ls  pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/pybind11/include/pybind11/
attr.h
buffer_info.h
cast.h
chrono.h
common.h
complex.h
detail
eigen.h
embed.h
eval.h
functional.h
gil.h
iostream.h
numpy.h
operators.h
options.h
pybind11.h
pytypes.h
stl
stl_bind.h
stl.h

pybind11 was installed using pip install pybind11. The header pybind11.h does not exist in pathx/anaconda3/envs/yenv/include/, but

import pybind11
pybind11.get_include()

returns the right path for the headers: pathx/anaconda3/envs/yenv/lib/python3.7/site-packages/pybind11/include.

Can you tell what this is doing with a Python outside the virtual env? I suspect this is what is causing the issue, because pybind11 and its headers are installed in the virtual env but not in root, which may explain the failure:

[pyKeOps] Compiling nvrtc binder for python ... /usr/bin/python3: No module named pybind11

Within the virtual env, python points to the right (local) interpreter, not the system one:

$ which python
pathx/anaconda3/envs/yenv/bin/python

thanks
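
The message `/usr/bin/python3: No module named pybind11` has the form Python prints when an interpreter is run with `-m pybind11` and cannot find the module. One plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that the build step shells out to whatever `python3` is first on the PATH, and that this is not the environment's interpreter even though `python` is. A minimal sketch to check this, using hypothetical helper names:

```python
import shutil
import subprocess
import sys


def can_import(interpreter, module):
    """True if `interpreter -c "import module"` exits with status 0."""
    result = subprocess.run(
        [interpreter, "-c", f"import {module}"],
        capture_output=True,
    )
    return result.returncode == 0


def interpreters():
    """Map a label to the interpreter running this script and to `python3` on PATH."""
    return {
        "current": sys.executable,
        "python3 on PATH": shutil.which("python3"),
    }


if __name__ == "__main__":
    for label, path in interpreters().items():
        if path is None:
            print(f"{label}: not found")
        else:
            ok = can_import(path, "pybind11")
            print(f"{label}: {path} (pybind11 importable: {ok})")
```

If the two paths differ and only the current interpreter can import pybind11, checking `which python3` inside the activated environment (rather than `which python`) would confirm the mismatch.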

sbelharbi commented 1 year ago

@jeanfeydy @joanglaunes Can you please check why keops, which is installed in a virtual env, is looking into the system's Python:

[pyKeOps] Compiling nvrtc binder for python ... /usr/bin/python3: No module named pybind11

This may be the cause of the above issue.

thanks

sbelharbi commented 1 year ago

In [2]:



thanks

kheyer commented 1 year ago

Adding to this in case it's helpful to someone: I ran into this when installing keops in a new environment. I got this error with version 2.1. I reinstalled version 1.5 (which I was using in another environment) and that version worked fine.
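
When comparing a working and a failing environment as described above, it helps to record exactly which versions each one has. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is hypothetical, and the distribution names listed are the ones mentioned in this thread:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distributions mentioned in this thread; extend the list as needed.
    for name in ("pykeops", "keopscore", "pybind11", "torch"):
        print(f"{name}: {installed_version(name)}")
```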