getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

KeOps import broken on CPU config? #305

Closed: jeanfeydy closed this issue 1 year ago

jeanfeydy commented 1 year ago

Hi @joanglaunes, @bcharlier,

I hope that you are doing well! Suddenly, my KeOps install and Docker container have stopped working on machines that do not have a GPU.

For instance, on Google Colab, if you make sure to use an instance without GPU acceleration, running:

!pip install pykeops > install.log
import pykeops

fails with:

---------------------------------------------------------------------------

OSError                                   Traceback (most recent call last)

<ipython-input-3-68b45488030e> in <cell line: 1>()
----> 1 import pykeops

/usr/local/lib/python3.9/dist-packages/pykeops/__init__.py in <module>
      1 import os
      2 
----> 3 import keopscore
      4 import keopscore.config
      5 import keopscore.config.config

/usr/local/lib/python3.9/dist-packages/keopscore/__init__.py in <module>
     12     __version__ = v.read().rstrip()
     13 
---> 14 from .config.config import set_build_folder, get_build_folder
     15 from .utils.code_gen_utils import clean_keops
     16 

/usr/local/lib/python3.9/dist-packages/keopscore/config/config.py in <module>
    172 
    173 
--> 174 from keopscore.utils.gpu_utils import get_gpu_props
    175 
    176 cuda_dependencies = ["cuda", "nvrtc"]

/usr/local/lib/python3.9/dist-packages/keopscore/utils/gpu_utils.py in <module>
     18 CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK = 8
     19 
---> 20 libcuda_folder = os.path.dirname(find_library_abspath("cuda"))
     21 libnvrtc_folder = os.path.dirname(find_library_abspath("nvrtc"))
     22 

/usr/local/lib/python3.9/dist-packages/keopscore/utils/misc_utils.py in find_library_abspath(lib)
     74         return ""
     75 
---> 76     lib = CDLL(res)
     77     libdl = CDLL(find_library("dl"))
     78 

/usr/lib/python3.9/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    372 
    373         if handle is None:
--> 374             self._handle = _dlopen(self._name, mode)
    375         else:
    376             self._handle = handle

OSError: libcuda.so.1: cannot open shared object file: No such file or directory

I assume that fixing the issue shouldn't be too difficult: wrapping the library loading in gpu_utils.py in a try/except block should do.
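A minimal sketch of that try/except idea, assuming a hypothetical helper `try_load_library` (not part of keopscore) that returns None instead of raising when `CDLL` fails. The list `cuda_dependencies` mirrors the one visible in keopscore/config/config.py in the traceback above. This sketch only checks whether the libraries can be loaded; it does not reproduce what the real `find_library_abspath` does.

```python
from ctypes import CDLL
from ctypes.util import find_library

# Mirrors the list shown in keopscore/config/config.py.
cuda_dependencies = ["cuda", "nvrtc"]


def try_load_library(lib):
    """Return a CDLL handle for `lib`, or None if it cannot be loaded.

    Hypothetical helper for illustration, not part of KeOps.
    """
    name = find_library(lib)  # e.g. "libcuda.so.1", or None
    if name is None:
        # Guard: CDLL(None) would return a handle to the running process itself.
        return None
    try:
        return CDLL(name)
    except OSError:
        # Found by name, but dlopen fails: the situation reported in this issue.
        return None


# Fall back to CPU-only mode unless every CUDA dependency actually loads.
use_cuda = all(try_load_library(lib) is not None for lib in cuda_dependencies)
print("CUDA available:", use_cuda)
```

Computing a flag like `use_cuda` once at import time would let the rest of the package take the CPU code path instead of crashing on `import pykeops`.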

However, since this part of the code is more than 12 months old, I don't understand why this error is only showing up today. I'm very confused: do you have any insight?

Best regards, Jean

bcharlier commented 1 year ago

Hi @jeanfeydy,

My guess is: your configuration has a proper CUDA installation... but no GPU. I'm not sure we have tested this case.

jeanfeydy commented 1 year ago

Hi @bcharlier,

Thanks for your fast answer! This is indeed what is happening, both on my machine and on Colab. Basically,

from ctypes.util import find_library
find_library("cuda")

works fine and returns "libcuda.so.1", because the CUDA files are present.

But:

from ctypes import CDLL
CDLL(find_library("cuda"))

fails with:

OSError                                   Traceback (most recent call last)

<ipython-input-2-2af85f5067e4> in <cell line: 2>()
      1 from ctypes import CDLL
----> 2 CDLL(find_library("cuda"))

/usr/lib/python3.9/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    372 
    373         if handle is None:
--> 374             self._handle = _dlopen(self._name, mode)
    375         else:
    376             self._handle = handle

OSError: libcuda.so.1: cannot open shared object file: No such file or directory
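To tell the two steps apart on a given machine, a small probe can run them separately: `find_library` only looks the name up (on Linux, e.g. via the ldconfig cache), while `CDLL` actually calls dlopen. The function `probe_library` below is a hypothetical diagnostic, not part of KeOps. On a configuration like the one above, one would expect a name such as "libcuda.so.1" together with `loadable` set to False and the OSError message.

```python
from ctypes import CDLL
from ctypes.util import find_library


def probe_library(lib):
    """Report separately whether `lib` is found by name and whether it loads.

    Hypothetical diagnostic helper, not part of KeOps.
    """
    name = find_library(lib)  # name lookup only, no loading
    if name is None:
        return {"name": None, "loadable": False, "error": "not found"}
    try:
        CDLL(name)  # actual dlopen; raises OSError if the file can't be opened
    except OSError as exc:
        return {"name": name, "loadable": False, "error": str(exc)}
    return {"name": name, "loadable": True, "error": None}


for lib in ("cuda", "nvrtc"):
    print(lib, probe_library(lib))
```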

I'm very surprised that I have not encountered the problem before... Probably this is because no one ever uses the KeOps Docker image (which includes a full CUDA environment) on a GPU-less machine.

There are several ways to fix this in the imports, but I don't know which one you prefer. Do you want to fix it yourself, or should I push something?

See you soon, Jean

joanglaunes commented 1 year ago

Hello @jeanfeydy, @bcharlier! It should be OK now: I have made the correction and merged it into main. At least on Colab it works. Could you check on your other system? It seems that the CUDA libraries could be detected via the function find_library from ctypes, but then could not be loaded because they were not on the system path. I think I have checked CPU versions of pykeops on Colab for several releases, but maybe not in the past few months... Maybe there was a change in the way Google Colab sets up the paths for the different hardware configurations.

jeanfeydy commented 1 year ago

Thanks a lot @joanglaunes, this works great! See you soon, Jean