Cuda error: cuModuleLoadDataEx(&module, target, 0, NULL, NULL) failed with error CUDA_ERROR_INVALID_IMAGE

ZkzMMDC commented 7 months ago

Python 3.8, CUDA 11.2, GPU: RTX 4090

When I run the test pykeops.test_torch_bindings() to make sure KeOps works, I get:

[KeOps] Generating code for Sum_Reduction reduction (with parameters 1) of formula Sum((a-b)**2) with a=Var(0,3,0), b=Var(1,3,1) ... OK

[KeOps] error: cuModuleLoadDataEx(&module, target, 0, NULL, NULL) failed with error CUDA_ERROR_INVALID_IMAGE

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/pykeops/torch/test_install.py", line 21, in test_torch_bindings
    my_conv(x, y).view(-1), torch.tensor(expected_res).type(torch.float32)
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 687, in __call__
    out = GenredAutograd_fun(params, *args)
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 271, in forward
    outputs = GenredAutograd_base._forward(*inputs)
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 91, in _forward
    myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 91, in __call__
    obj = self.cls(*args)
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 16, in __init__
    super().__init__(*args, fast_init=fast_init)
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 31, in __init__
    self.init_phase2()
  File "/home/zkz/anaconda3/envs/robot/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 29, in init_phase2
    self.launch_keops = pykeops_nvrtc.KeOps_module_float(
RuntimeError: [KeOps] Cuda error.

wang-jh18-SVM commented 6 months ago

I'm encountering the same CUDA_ERROR_INVALID_IMAGE error when running KeOps with the PyTorch bindings. Below are the steps to reproduce the error, the full error message, and my system information.

Steps to Reproduce:

import pykeops
pykeops.clean_pykeops()
pykeops.test_torch_bindings()

Error Message:

[KeOps] /root/.cache/keops2.2.2/Linux_autodl-container-758f438c9a-33381152_5.15.0-91-generic_p3.8.18 has been cleaned.
[KeOps] Compiling cuda jit compiler engine ... OK
[pyKeOps] Compiling nvrtc binder for python ... OK
[KeOps] Generating code for Sum_Reduction reduction (with parameters 1) of formula Sum((a-b)**2) with a=Var(0,3,0), b=Var(1,3,1) ... OK

[KeOps] error: cuModuleLoadDataEx(&module, target, 0, NULL, NULL) failed with error CUDA_ERROR_INVALID_IMAGE

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/pykeops/torch/test_install.py", line 21, in test_torch_bindings
    my_conv(x, y).view(-1), torch.tensor(expected_res).type(torch.float32)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 687, in __call__
    out = GenredAutograd_fun(params, *args)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 271, in forward
    outputs = GenredAutograd_base._forward(*inputs)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 91, in _forward
    myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 91, in __call__
    obj = self.cls(*args)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 16, in __init__
    super().__init__(*args, fast_init=fast_init)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 31, in __init__
    self.init_phase2()
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 29, in init_phase2
    self.launch_keops = pykeops_nvrtc.KeOps_module_float(
RuntimeError: [KeOps] Cuda error.

The error code indicates that the CUDA driver rejected the compiled kernel image. I cleaned the KeOps cache before testing the PyTorch bindings, so stale cached builds should not be the cause.

System Information:

GPU: RTX 4090
CUDA Version: 11.8 (from nvcc -V)
Relevant Installed Packages:
- cudatoolkit: 11.6.2
- pykeops: 2.2.2 (from PyPI)
- pytorch: 1.12.0 with CUDA 11.6 support
- python: 3.8.18
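The system information above was gathered by hand. A small script can collect the same fields in one go, which makes it easier to compare environments across reports. This is a minimal sketch, not part of KeOps: the function name collect_environment is hypothetical, and it assumes PyTorch is optionally importable and that nvcc, if installed, is on PATH.

```python
import platform
import shutil
import subprocess


def collect_environment():
    """Gather the version details listed by hand in this thread."""
    info = {"python": platform.python_version()}
    try:
        import torch  # optional: PyTorch fields are skipped if it is not installed
    except ImportError:
        torch = None
    if torch is not None:
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
            info["compute_capability"] = torch.cuda.get_device_capability(0)
    nvcc = shutil.which("nvcc")  # first nvcc on PATH, the one `nvcc -V` would run
    info["nvcc_path"] = nvcc
    if nvcc is not None:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        # keep the "Cuda compilation tools, release X.Y, ..." line
        info["nvcc_release"] = next(
            (line for line in out.splitlines() if "release" in line), None
        )
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```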

Full Conda List:

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
blas                      1.0                         mkl    conda-forge
brotli-python             1.1.0            py38h17151c0_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
cudatoolkit               11.6.2              hfc3e2af_13    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
freetype                  2.10.4               h0708190_1    conda-forge
gmp                       6.3.0                h59595ed_1    conda-forge
gnutls                    3.6.13               h85f3911_1    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
intel-openmp              2023.1.0         hdb19cb5_46306    defaults
jbig                      2.1               h7f98852_2003    conda-forge
jpeg                      9e                   h0b41bf4_3    conda-forge
keopscore                 2.2.2                    pypi_0    pypi
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.38                 h1181459_1    defaults
lerc                      2.2.1                h9c3ff4c_0    conda-forge
libblas                   3.9.0           1_h86c2bf4_netlib    conda-forge
libcblas                  3.9.0           5_h92ddd45_netlib    conda-forge
libdeflate                1.7                  h7f98852_5    conda-forge
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 13.2.0               h807b86a_5    conda-forge
libgfortran-ng            13.2.0               h69a702a_5    conda-forge
libgfortran5              13.2.0               ha4646dd_5    conda-forge
libgomp                   13.2.0               h807b86a_5    conda-forge
libhwloc                  2.9.1                hd6dc26d_0    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
liblapack                 3.9.0           5_h92ddd45_netlib    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_5    conda-forge
libtiff                   4.3.0                hf544144_1    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxml2                   2.10.4               hf1b16e4_1    defaults
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
mkl                       2023.1.0         h213fc3f_46344    defaults
ncurses                   6.4                  h6a678d5_0    defaults
nettle                    3.6                  he412f7d_0    conda-forge
numpy                     1.24.4           py38h59b608b_0    conda-forge
olefile                   0.47               pyhd8ed1ab_0    conda-forge
openh264                  2.1.1                h780b84a_0    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   3.2.1                hd590300_0    conda-forge
pillow                    8.2.0            py38ha0e1e83_1    conda-forge
pip                       23.3.1           py38h06a4308_0    defaults
pybind11                  2.11.1                   pypi_0    pypi
pykeops                   2.2.2                    pypi_0    pypi
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.8.18               h955ad1f_0    defaults
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.12.0          py3.8_cuda11.6_cudnn8.3.2_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
readline                  8.2                  h5eee18b_0    defaults
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
setuptools                68.2.2           py38h06a4308_0    defaults
sqlite                    3.41.2               h5eee18b_0    defaults
tbb                       2021.9.0             hf52228f_0    conda-forge
tk                        8.6.12               h1ccaba5_0    defaults
torchvision               0.13.0               py38_cu116    pytorch
typing_extensions         4.10.0             pyha770c72_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
wheel                     0.41.2           py38h06a4308_0    defaults
xz                        5.4.6                h5eee18b_0    defaults
zlib                      1.2.13               h5eee18b_0    defaults
zstd                      1.5.0                ha95c52a_0    conda-forge

Has anyone else experienced a similar issue or can provide insight into what might be causing the CUDA_ERROR_INVALID_IMAGE error with KeOps on an RTX 4090 GPU?

jeanfeydy commented 6 months ago

Hi @ZkzMMDC , @wang-jh18-SVM ,

Thanks for your interest in our library, and for the detailed reports. I don't have an RTX 4090 at hand to try this myself, but I would be very surprised if this turned out to be a hardware issue. KeOps has run fine on all the Nvidia GPUs that we've had access to since 2017; it does not rely on niche instruction sets.

As far as I can tell, the most likely issue here is that in @wang-jh18-SVM 's report, cudatoolkit==11.6.2 while nvcc==11.8: the vast majority of KeOps installation issues happen on systems where several versions of CUDA are available and we somehow mix up the paths.
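The mismatch described here (conda cudatoolkit 11.6.2 and PyTorch built for CUDA 11.6, but nvcc 11.8 on PATH) can be flagged mechanically. The sketch below uses hypothetical helpers, not a KeOps API; it assumes that comparing major.minor versions is enough to spot this kind of mixing, and it parses the "release X.Y" line printed by nvcc --version.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_version(version):
    """Turn a string such as "11.6" or "11.6.2" into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def find_cuda_mismatches(nvcc_output, **other_versions):
    """Return one message per reported CUDA version that differs from nvcc's."""
    nvcc = nvcc_release(nvcc_output)
    if nvcc is None:
        return ["could not find a 'release X.Y' line in the nvcc output"]
    problems = []
    for name, version in other_versions.items():
        if version is not None and parse_version(version) != nvcc:
            problems.append(f"{name}={version} but nvcc reports {nvcc[0]}.{nvcc[1]}")
    return problems
```

For example, pass the stdout of subprocess.run(["nvcc", "--version"], capture_output=True, text=True) together with torch_cuda=torch.version.cuda and cudatoolkit="11.6.2"; an empty list means the reported versions agree at the major.minor level.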

Could you maybe:

- Install nvcc == 11.6.2 in your conda environment, e.g. via conda install -y -c nvidia/label/cuda-11.6.2 cuda? Our Dockerfile provides a good reference for a fully functional setup.
- Try our Docker image (docker pull getkeops/keops-full:latest), just to make sure that this is not a hardware problem?

Best regards, Jean
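Before following the first suggestion, it can help to check which nvcc is actually picked up, since a system-wide toolkit earlier on PATH can shadow the one installed in the conda environment. This is a minimal sketch, assuming Linux and an environment activated with conda activate (which sets CONDA_PREFIX); the helper names are hypothetical, and it only inspects PATH, not any other paths KeOps may resolve on its own.

```python
import os
import shutil


def nvcc_inside_prefix(nvcc_path, prefix):
    """True if nvcc_path is located under the environment directory prefix."""
    if nvcc_path is None or prefix is None:
        return False
    nvcc = os.path.abspath(nvcc_path)
    root = os.path.abspath(prefix)
    # commonpath compares whole path components, so ".../envs/test2" is not
    # mistaken for a child of ".../envs/test".
    return os.path.commonpath([nvcc, root]) == root


def check_active_nvcc():
    """Report which nvcc comes first on PATH and whether it belongs to the active conda env."""
    nvcc = shutil.which("nvcc")
    prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    return {
        "nvcc": nvcc,
        "conda_prefix": prefix,
        "nvcc_from_conda_env": nvcc_inside_prefix(nvcc, prefix),
    }


if __name__ == "__main__":
    for key, value in check_active_nvcc().items():
        print(f"{key}: {value}")
```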

bcharlier commented 6 months ago

I would recommend using CUDA >= 12 with recent hardware (though I am not sure it fixes this particular issue). The CUDA version used by KeOps can differ from the one used by torch. You may install CUDA locally and force KeOps to use it by setting the environment variable CUDA_PATH.
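Following this suggestion from Python could look like the sketch below. It relies on the statement above that KeOps honours the CUDA_PATH environment variable, and additionally assumes the variable must be set before pykeops is first imported in the process. The toolkit location /usr/local/cuda-12.1 and both helper names are hypothetical.

```python
import os


def use_cuda_toolkit(cuda_path, environ=os.environ):
    """Point CUDA_PATH at a specific toolkit (e.g. a hypothetical /usr/local/cuda-12.1)."""
    nvcc = os.path.join(cuda_path, "bin", "nvcc")
    # refuse to set a path that does not look like a CUDA toolkit install
    if not os.path.isfile(nvcc):
        raise FileNotFoundError(f"no nvcc found at {nvcc}")
    environ["CUDA_PATH"] = cuda_path
    return environ


def rerun_keops_test():
    """Clear cached KeOps builds and rerun the PyTorch binding test."""
    import pykeops  # imported only after CUDA_PATH has been set

    pykeops.clean_pykeops()
    pykeops.test_torch_bindings()
```

For example, call use_cuda_toolkit("/usr/local/cuda-12.1") at the top of a fresh Python session, then rerun_keops_test(). Cleaning the cache first mirrors the reproduction steps earlier in this thread, so builds made against the previous toolkit are not reused.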

Yangr116 commented 5 months ago

Same issue here; I am using the pykeops Docker image.

soulofxin commented 5 months ago

Quoting @ZkzMMDC: "When I run the test pykeops.test_torch_bindings() to make sure KeOps works:"

Have you solved that problem?

wang-jh18-SVM commented 5 months ago

Hi, I still don't know why, but when I install PyTorch with pip rather than conda, it works.
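To confirm which of the two setups an environment uses before trying this workaround, one option is to look at conda's package records: conda writes a conda-meta/<name>-<version>-<build>.json file for each package it installs, while pip-installed packages (shown with the pypi channel in the conda list above) have no such record. The helper below is hypothetical and sketched under that assumption; it only looks for the conda package named pytorch and deliberately ignores pytorch-mutex.

```python
import glob
import os


def torch_from_conda(prefix):
    """Return True if conda has a record of installing the `pytorch` package in prefix."""
    # conda-meta holds one JSON record per conda-installed package, named
    # <name>-<version>-<build>.json; requiring a digit after "pytorch-" skips
    # "pytorch-mutex-...". Packages installed with pip leave no record here.
    pattern = os.path.join(glob.escape(prefix), "conda-meta", "pytorch-[0-9]*.json")
    return bool(glob.glob(pattern))


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("No active conda environment (CONDA_PREFIX is unset).")
    else:
        print("pytorch installed by conda:", torch_from_conda(prefix))
```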

getkeops / keops

Cuda error: cuModuleLoadDataEx(&module, target, 0, NULL, NULL) failed with error CUDA_ERROR_INVALID_IMAGE #361