getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

[KeOps] Error when compiling formula (error in nvrtcCompileProgram) #288

Open cristianacarpinteiro opened 1 year ago

cristianacarpinteiro commented 1 year ago

I'm trying to install pykeops on a system with the following requirements:

I've tried to install it in several different ways (with pip, directly from source, and using the Dockerfile), but I always get errors when running the pykeops.test_numpy_bindings() tests. These errors cause the code to crash with a message like "[KeOps] Error when compiling formula (error in nvrtcCompileProgram) Aborted (Core Dumped)". I think that this may be related to my CUDA drivers. Any idea what the solution may be? Thanks in advance!!

jeanfeydy commented 1 year ago

Hi @cristianacarpinteiro ,

Thanks for your interest in our library!

To identify your issue (that was raised by this line), I would appreciate some more information on your error stack. Could you maybe let us know about the output of nvidia-smi and the error raised by:

import pykeops
pykeops.test_numpy_bindings()

Please note that if you are running these commands from a (Jupyter) notebook, the error messages from the compiler may be redirected to the output of the program that ran jupyter notebook instead of the output of your notebook cell. In this context, it would be best if you could run these commands directly in a "raw" Python shell.
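As a minimal sketch of one way to capture the full error stack, the test can be launched in a fresh child interpreter so that everything written to stdout and stderr (including messages printed by the compiler) is collected in one place. The helper name `run_in_fresh_interpreter` is hypothetical and not part of pykeops; only the standard library is used.

```python
import subprocess
import sys


def run_in_fresh_interpreter(code):
    """Run `code` with `python -c` in a child process and capture its output.

    Hypothetical helper for illustration; it is not part of pykeops.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect both stdout and stderr of the child
        text=True,            # decode the captured bytes to str
    )


if __name__ == "__main__":
    # The two commands requested above, run outside of any notebook.
    result = run_in_fresh_interpreter(
        "import pykeops; pykeops.test_numpy_bindings()"
    )
    print("return code:", result.returncode)
    print("--- stdout ---")
    print(result.stdout)
    print("--- stderr ---")
    print(result.stderr)
```

On POSIX systems, a negative return code means that the child process was killed by a signal, which is what an "Aborted (core dumped)" crash would look like from the parent process.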

Best regards, Jean

cristianacarpinteiro commented 1 year ago

Hello @jeanfeydy, I am using the shell. The output to nvidia-smi:

[image: screenshot of the nvidia-smi output]

The output of the error:

import pykeops
[KeOps] Compiling cuda jit compiler engine ... OK
[pyKeOps] Compiling nvrtc binder for python ... OK
pykeops.test_numpy_bindings()
[KeOps] Generating code for formula Sum_Reduction((Var(0,3,0)-Var(1,3,1))|(Var(0,3,0)-Var(1,3,1)),1) ... terminate called after throwing an instance of 'std::runtime_error'
  what():  [KeOps] Error when compiling formula (error in nvrtcCompileProgram).
Aborted (core dumped)

jeanfeydy commented 1 year ago

Hi @cristianacarpinteiro ,

Thanks for your quick update. This is the first time that I have seen this error (I was expecting your test to fail at the [KeOps] Compiling cuda jit compiler engine step), and also the first time that I have seen KeOps used with CUDA 12.0. I mostly work with containers and, since CUDA 12.0 was released just over a month ago, I hadn't even noticed it yet.

I will try to create a "bleeding edge Docker file" with CUDA 12.0, PyTorch 2.0 and KeOps to see if we need to update something. I remember that CUDA 11.0 also broke a few things (that were fixed with later 11.x iterations), so that wouldn't be a big surprise. In the meantime, you may be interested in our reference Docker/Singularity image to start playing around with the library.

Best regards, Jean

cristianacarpinteiro commented 1 year ago

Hi, I'm already using the Docker image provided! This happens inside the container. I also tested it with a different version of CUDA, and it broke as well. Thank you, Cristiana

SimonCoste commented 1 year ago

Hi, for the record, I am encountering the same problem. I am on CUDA 10.1 (see the screenshot). [image: cuda_version]

I am not using Docker.

The output of pykeops.test_numpy_bindings() is shown here: [image: keops]

zhangweibin970807 commented 8 months ago

same error

joanglaunes commented 8 months ago

Hello @zhangweibin970807, can you tell us if the error is still present when you use the latest version of KeOps on the main git branch instead of the release version? We have been fixing some things recently, so it is possible that this has solved the problem. If not, could you tell us what your configuration is (Python version, CUDA version, OS, GPU card)?

soulofxin commented 2 months ago

pykeops.test_numpy_bindings()

Have you solved that problem?