getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Running pykeops on docker container without access to cuda folder #291

Open HuuDatDo opened 1 year ago

HuuDatDo commented 1 year ago

I'm trying to run pykeops in a Docker container on a GPU server that hides the CUDA folder. I tried the conda environment from #85 as well as the .conda folder, but neither worked. Is there another way to run pykeops on this kind of system?

<stdin>:1:10: fatal error: cuda.h: No such file or directory
compilation terminated.
[KeOps] Warning : 
    The location of Cuda header files cuda.h and nvrtc.h could not be detected on your system.
    You must determine their location and then define the environment variable CUDA_PATH,
    either before launching Python or using os.environ before importing keops. For example
    if these files are in /vol/cuda/10.2.89-cudnn7.6.4.38/include you can do :
      import os
      os.environ['CUDA_PATH'] = '/vol/cuda/10.2.89-cudnn7.6.4.38'
      import pykeops

[KeOps] Compiling main dll ... /home/ubuntu/miniconda3/envs/pykeops_env/lib/python3.10/site-packages/keops/binders/nvrtc/keops_nvrtc.cpp:5:10: fatal error: nvrtc.h: No such file or directory
    5 | #include <nvrtc.h>
      |          ^~~~~~~~~
compilation terminated.
OK
Traceback (most recent call last):
  File "/home/ubuntu/22dat.dh/NODE/STSum/test.py", line 5, in <module>
    import pykeops
  File "/home/ubuntu/miniconda3/envs/pykeops_env/lib/python3.10/site-packages/pykeops/__init__.py", line 35, in <module>
    cppyy.include(os.path.join(cuda_include_path, "nvrtc.h"))
  File "/home/ubuntu/miniconda3/envs/pykeops_env/lib/python3.10/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
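
The warning in the log above asks for CUDA_PATH to be set before pykeops is imported. Below is a minimal sketch of how one might look for the headers first, assuming the toolkit is installed somewhere the container can see. `find_cuda_root` and the candidate list are hypothetical helpers for illustration, not part of pykeops.

```python
import os
from pathlib import Path


def find_cuda_root(candidates):
    """Return the first directory that contains both include/cuda.h and
    include/nvrtc.h, or None if no candidate qualifies."""
    for root in candidates:
        include = Path(root) / "include"
        if (include / "cuda.h").is_file() and (include / "nvrtc.h").is_file():
            return str(root)
    return None


# Candidate locations are assumptions; adjust them to your system.
candidates = [
    os.environ.get("CUDA_PATH", ""),
    os.environ.get("CUDA_HOME", ""),
    os.environ.get("CONDA_PREFIX", ""),
    "/usr/local/cuda",
    "/usr",
]
cuda_root = find_cuda_root([c for c in candidates if c])

if cuda_root is not None:
    # As the KeOps warning says, this must happen before `import pykeops`.
    os.environ["CUDA_PATH"] = cuda_root
print("CUDA root:", cuda_root)
```

If this prints `None`, the headers are probably not present in the container at all, and setting CUDA_PATH alone will not help.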

jeanfeydy commented 1 year ago

Hi @HuuDatDo,

Thanks for your interest in our library!

Could you tell us more about how CUDA is "hidden" from Docker on your setup? We maintain a reference Docker image, https://hub.docker.com/r/getkeops/keops-full, which is documented here: https://github.com/getkeops/keops/blob/main/Dockerfile

(I will update it soon to catch up with the latest version of PyTorch.)

Install instructions and a typical use case (= rendering the www.kernel-operations.io website) are described here: http://kernel-operations.io/keops/python/installation.html#using-docker-or-singularity.

Best regards, Jean

HuuDatDo commented 1 year ago

Hi @jeanfeydy

Thank you so much for your reply!

According to my lab manager, I'm not allowed to run anything related to CUDA because it would affect other users. The CUDA folder is not visible to any Docker container, so there is no cuda folder under '/usr/local/'. I can still run nvidia-smi, so CUDA should be installed somewhere. I will ask again whether I can create a new Docker image based on the documentation.

Best, Huu Dat
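
A note on the nvidia-smi observation: nvidia-smi ships with the NVIDIA driver, so a working nvidia-smi shows that the driver is exposed to the container, not that the CUDA toolkit (which provides cuda.h and nvrtc.h) is installed in it. A rough sketch of how one might tell the two apart from Python follows. `cuda_visibility_report` is a hypothetical helper, and the lookups rely only on the standard library, so a `False` entry means "not found in the default search locations" rather than "definitely absent".

```python
import ctypes.util
import shutil


def cuda_visibility_report():
    """Report which driver-side and toolkit-side pieces are discoverable."""
    return {
        # Driver side: usually injected into the container by the GPU runtime.
        "nvidia-smi on PATH (driver utility)": shutil.which("nvidia-smi") is not None,
        "libcuda found (driver library)": ctypes.util.find_library("cuda") is not None,
        # Toolkit side: has to be installed inside the image itself.
        "nvcc on PATH (toolkit compiler)": shutil.which("nvcc") is not None,
        "libnvrtc found (toolkit library)": ctypes.util.find_library("nvrtc") is not None,
    }


for item, found in cuda_visibility_report().items():
    print(f"{item}: {found}")
```

If only the driver-side entries are found, an image that bundles the toolkit, such as the reference image mentioned above, is one option to try.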

HuuDatDo commented 1 year ago

Hi @jeanfeydy

Thanks to your instructions, I was able to build a new container and define the CUDA path. However, when I run pykeops.test_numpy_bindings(), I get this error:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    pykeops.test_numpy_bindings()
  File "/opt/conda/envs/stsum/lib/python3.8/site-packages/pykeops/numpy/test_install.py", line 20, in test_numpy_bindings
    if np.allclose(my_conv(x, y).flatten(), expected_res):
  File "/opt/conda/envs/stsum/lib/python3.8/site-packages/pykeops/numpy/generic/generic_red.py", line 303, in __call__
    self.myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
  File "/opt/conda/envs/stsum/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 68, in __call__
    obj = self.cls(*args)
  File "/opt/conda/envs/stsum/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 15, in __init__
    super().__init__(*args, fast_init=fast_init)
  File "/opt/conda/envs/stsum/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 31, in __init__
    self.init_phase2()
  File "/opt/conda/envs/stsum/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 20, in init_phase2
    pykeops_nvrtc = importlib.import_module("pykeops_nvrtc")
  File "/opt/conda/envs/stsum/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /root/.cache/keops2.1/build/pykeops_nvrtc.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr10_M_releaseEv

Do you know of a way to fix this? Some similar earlier issues were fixed in pykeops version 1.4.2.

Best, Huu Dat
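
The missing symbol in the traceback above demangles to `std::__exception_ptr::exception_ptr::_M_release()`, a member of the C++ standard library's exception_ptr, so it would be expected to come from libstdc++. One possible first check, offered as a sketch and not a confirmed diagnosis, is whether the libstdc++ found by the default library search exports it. `libstdcxx_exports` is a hypothetical helper name, and the library it inspects may differ from the one a conda environment actually loads at runtime.

```python
import ctypes
import ctypes.util

MISSING = "_ZNSt15__exception_ptr13exception_ptr10_M_releaseEv"


def libstdcxx_exports(symbol):
    """Return True/False if the default libstdc++ does/does not export
    `symbol`, or None if libstdc++ cannot be located or loaded."""
    name = ctypes.util.find_library("stdc++")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # Attribute access on a CDLL performs a symbol lookup and raises
    # AttributeError when the symbol is absent, so hasattr() is enough.
    return hasattr(lib, symbol)


print("libstdc++ exports the symbol:", libstdcxx_exports(MISSING))
```

A `False` result would suggest that the compiled module in the KeOps cache was built against newer C++ headers than the runtime library provides. A `True` result would point instead at how the module was linked or at which libstdc++ the Python process picks up.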