getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

No GPU detected with the Windows Subsystem for Linux (WSL2): No target to make 'KeOps_formula' #177

Open Jacob-Francis opened 3 years ago

Jacob-Francis commented 3 years ago

Hi,

I'm trying to use geomloss. Without specifying a cost and with the auto backend, the library works. However, if I try using my own cost written with KeOps formulae, it doesn't compile. The error I get is: No rule to make target 'KeOps_formula'. Stop.

This is the same error I get if I try the installation test pykeops.test_numpy_bindings(); however, I'm not sure what's not working.

For reference, I'm using WSL2 (Ubuntu) on a Windows device, which does have a CUDA-enabled GPU. I've attached the CMakeError log. CMakeError.log

Any help is appreciated, Best,

Jacob

jeanfeydy commented 3 years ago

Hi @Jacob-Francis,

Thanks for your interest in both libraries! Many things can be missing here, depending on your exact configuration: as far as I can tell from your CMakeError, CMake is getting confused by an unknown missing dependency and is throwing a “dummy” error related to pthread (this is a known bug in the CMake error handling system).

To let us diagnose your problem efficiently, could you please:

  1. Check that you have installed all the “system” dependencies that cannot be handled by pip, including a recent version of CMake and of the nvcc compiler. The full list is available here.
  2. Run the tests and show us the output. Normally, these functions should return a “verbose” error message that will let us pinpoint what package/compiler is missing on your machine.
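As a rough illustration of the first check, the sketch below reports whether cmake and nvcc are visible on the PATH and prints the first line of their version output. The helper name `tool_version` is hypothetical and not part of pykeops, and no minimum version is enforced here because the required versions are listed in the documentation rather than in this thread.

```python
import shutil
import subprocess


def tool_version(name, flag="--version"):
    """Return the first line printed by `<name> <flag>`, or None if the
    tool is not on the PATH (hypothetical helper, not part of pykeops)."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, flag], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    for tool in ("cmake", "nvcc"):
        version = tool_version(tool)
        print(f"{tool}: {'not found on PATH' if version is None else version}")
```

A tool reported as "not found on PATH" may still be installed somewhere that the current shell does not search.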

What do you think? Best regards, Jean

Jacob-Francis commented 3 years ago

Hi Jean,

When I checked my CMake version, I found that I didn't have the correct one, which is a very annoying mistake. Thank you very much for your help. pykeops.test_numpy_bindings() appears to work now, returning:

[pyKeOps]: Warning, cuda was detected, but driver API could not be initialized. Switching to cpu only.
Cleaning /home/jacob/.cache/pykeops-1.5-cpython-38/...
    - /home/jacob/.cache/pykeops-1.5-cpython-38/libKeOpstorcha70a2fa704 has been removed.
    - /home/jacob/.cache/pykeops-1.5-cpython-38/keops_hash.log has been removed.
    - /home/jacob/.cache/pykeops-1.5-cpython-38/libKeOpstorch87b1f89b39 has been removed.
    - /home/jacob/.cache/pykeops-1.5-cpython-38/build-2e35d322fd has been removed.
    - /home/jacob/.cache/pykeops-1.5-cpython-38/build-pybind11_template-libKeOps_template_a680976379 has been removed.
[pyKeOps] Initializing build folder for dtype=float64 and lang=numpy in /home/jacob/.cache/pykeops-1.5-cpython-38 ... done.
[pyKeOps] Compiling libKeOpsnumpy703b5db891 in /home/jacob/.cache/pykeops-1.5-cpython-38:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... 
[pyKeOps] Compiling pybind11 template libKeOps_template_b6ed4ac4c5 in /home/jacob/.cache/pykeops-1.5-cpython-38 ... done.
Done.

pyKeOps with numpy bindings is working!

I still get the CUDA driver error, if you're able to help with that too. nvcc -V returns:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0

So I'm not sure why I'm getting this error. I've followed NVIDIA's instructions for setting up CUDA on WSL (https://docs.nvidia.com/cuda/wsl-user-guide/index.html) and my GPU is CUDA-enabled. It is potentially a PATH problem, but my knowledge is limited.

Either way thank you, best,

Jacob

jeanfeydy commented 3 years ago

Hi @Jacob-Francis ,

Happy to see that you could at least get the CPU version working quickly. This is the first time that I have heard about the WSL (Windows Subsystem for Linux), which could be a simple way of getting KeOps to work on Windows (@bcharlier, @fradav). As far as the CUDA API is concerned, this may be either a PATH issue or a deeper compatibility problem with the WSL interface. To diagnose the problem, could you manually edit the file keops/pykeops/common/gpu_utils.py and un-comment lines 31-32? You can find the folder in which pykeops is installed with:

import pykeops
print(pykeops.__file__)
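A slightly longer sketch of the same lookup, which resolves the file directly and prints the region around the lines mentioned above so that they can be inspected before editing. The helper `locate_in_package` is hypothetical, and the printed range (lines 25 to 40) is only a guess at a useful window, since line numbers may differ between pykeops versions.

```python
import importlib.util
from pathlib import Path


def locate_in_package(package, relative_path):
    """Return the path of `relative_path` inside an installed package,
    or None if the package or the file cannot be found (hypothetical helper)."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py.
    candidate = Path(spec.origin).parent / relative_path
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    target = locate_in_package("pykeops", "common/gpu_utils.py")
    if target is None:
        print("pykeops/common/gpu_utils.py was not found in this environment.")
    else:
        print(target)
        lines = target.read_text().splitlines()
        # Show the neighbourhood of lines 31-32 with their line numbers.
        for number, line in enumerate(lines[24:40], start=25):
            print(f"{number:4d}  {line}")
```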

Best regards, Jean

fradav commented 3 years ago

The GPU layer is available in WSL2 only on the Dev channel of the Windows Insider program. (I'm using it.)

Jacob-Francis commented 3 years ago

Hi @jeanfeydy,

Thank you very much for your fast replies and for your help in getting it working.

I've uncommented those lines and got the following:

[pyKeOps]: cuInit failed with error code 100: no CUDA-capable device is detected
[pyKeOps]: Warning, cuda was detected, but driver API could not be initialized. Switching to cpu only.

However, I'm using a CUDA-enabled GPU (NVIDIA Quadro T1000), which I checked with GPU-Z (I've attached a screenshot). When I run torch.cuda.is_available() from my Windows terminal I get True, but I get False in Ubuntu on WSL.

I set up WSL following NVIDIA's CUDA instructions, so I'm on the Dev channel and WSL2 (@fradav), and I'm not sure why it can't see the GPU.
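For anyone who wants to reproduce the driver check outside of pykeops, here is a minimal sketch that loads the CUDA driver library with ctypes and calls cuInit(0) directly. In the CUDA driver API, a return value of 0 means success and 100 is CUDA_ERROR_NO_DEVICE, which matches the message above. The function name `probe_cuda_driver` is hypothetical, and the library name "libcuda.so" is an assumption about what the dynamic loader can resolve on a given system.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100  # "no CUDA-capable device is detected"


def probe_cuda_driver(library="libcuda.so"):
    """Return None if the driver library cannot be loaded, otherwise the
    integer status returned by cuInit(0) (hypothetical helper)."""
    try:
        driver = ctypes.CDLL(library)
    except OSError:
        # The loader could not find or open the library at all.
        return None
    driver.cuInit.argtypes = [ctypes.c_uint]
    driver.cuInit.restype = ctypes.c_int
    return driver.cuInit(0)


if __name__ == "__main__":
    status = probe_cuda_driver()
    if status is None:
        print("libcuda.so could not be loaded (check LD_LIBRARY_PATH).")
    elif status == CUDA_SUCCESS:
        print("cuInit succeeded: the driver API is usable.")
    elif status == CUDA_ERROR_NO_DEVICE:
        print("cuInit returned 100: the driver loaded but sees no GPU.")
    else:
        print(f"cuInit failed with error code {status}.")
```

The two failure modes point in different directions: a library that cannot be loaded suggests a search path problem, whereas error code 100 means that the driver itself was found but reports no device.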

Thank you again, best,

Jacob

GPU_Z_SS

fradav commented 3 years ago

Perhaps you switched to the Dev channel at a bad time: today, it automatically means Windows 11. NVIDIA may have to release an updated driver for this.

Jacob-Francis commented 3 years ago

Hi @fradav,

Are you not on the Dev channel too, and hence on Windows 11?

Thank you, best,

Jacob

gabrielfougeron commented 3 years ago

Hello everyone,

I too am trying to use pykeops with WSL2. I upgraded to Windows 11 on the Insider Dev channel, but pykeops does not seem to find the driver API for CUDA. Still, in CPU-only mode, the bindings seem to work.

Here are the test results:

Python 3.9.6 (default, Aug 18 2021, 19:38:01)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> import pykeops
[pyKeOps]: Warning, cuda was detected, but driver API could not be initialized. Switching to cpu only.
>>> torch.cuda.is_available()
True
>>> pykeops.test_numpy_bindings()

pyKeOps with numpy bindings is working!

>>> pykeops.clean_pykeops()
Cleaning /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/...
    - /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/build-45f887f4bb has been removed.
    - /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/keops_hash.log has been removed.
    - /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/libKeOpsnumpy0563cd578f has been removed.
    - /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/build-pybind11_template-libKeOps_template_6ddefd0347 has been removed.
    - /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/build-b69b37a37b has been removed.
    - /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/build-pybind11_template-libKeOps_template_53c9569036 has been removed.
    - /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39/libKeOpstorchc5092baaf0 has been removed.
>>> pykeops.test_numpy_bindings()
[pyKeOps] Initializing build folder for dtype=float64 and lang=numpy in /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39 ... done.
[pyKeOps] Compiling libKeOpsnumpy0563cd578f in /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3);
       dtype  : float64
...
[pyKeOps] Compiling pybind11 template libKeOps_template_6ddefd0347 in /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39 ... done.
Done.

pyKeOps with numpy bindings is working!

>>> pykeops.test_torch_bindings()
[pyKeOps] Initializing build folder for dtype=float32 and lang=torch in /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39 ... done.
[pyKeOps] Compiling libKeOpstorchc5092baaf0 in /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3);
       dtype  : float32
...
[pyKeOps] Compiling pybind11 template libKeOps_template_53c9569036 in /home/gabrielfougeron/.cache/pykeops-1.5-cpython-39 ... done.
Done.

pyKeOps with torch bindings is working!

However, PyTorch seems to be able to use the GPU. Am I missing something?

Best,

Gabriel

gabrielfougeron commented 2 years ago

Update: I managed to get KeOps to find the CUDA API by adding the path to libcuda.so to LD_LIBRARY_PATH.

@Jacob-Francis: You might want to try that...
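A minimal sketch of that fix, for readers who are not sure where libcuda.so lives: it searches a few candidate directories and prints the export line to add to the shell configuration. The directory list is an assumption and not taken from this thread (on WSL2 the driver libraries are usually exposed under /usr/lib/wsl/lib), and both helper names are hypothetical.

```python
import os
from pathlib import Path

# Assumed candidate locations; adjust to match the actual installation.
CANDIDATE_DIRS = (
    "/usr/lib/wsl/lib",
    "/usr/local/cuda/lib64",
    "/usr/lib/x86_64-linux-gnu",
)


def find_libcuda_dirs(candidates=CANDIDATE_DIRS):
    """Return the candidate directories that contain a libcuda.so* file."""
    found = []
    for directory in candidates:
        path = Path(directory)
        if path.is_dir() and any(path.glob("libcuda.so*")):
            found.append(str(directory))
    return found


def export_line(directory, current=None):
    """Build the shell line that prepends `directory` to LD_LIBRARY_PATH."""
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    parts = [part for part in current.split(":") if part]
    if directory not in parts:
        parts.insert(0, directory)
    return "export LD_LIBRARY_PATH=" + ":".join(parts)


if __name__ == "__main__":
    directories = find_libcuda_dirs()
    if not directories:
        print("libcuda.so was not found in the candidate directories.")
    for directory in directories:
        print(export_line(directory))
```

The printed line belongs in the shell (for example in ~/.bashrc) rather than in the Python script itself, because the dynamic loader reads LD_LIBRARY_PATH when the process starts.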

Jacob-Francis commented 2 years ago

Hi @gabrielfougeron, I tried your fix and it worked. Thank you!

gabrielfougeron commented 2 years ago

Update: I'm not quite sure what happened but, out of the blue, this very error popped up again. I don't know what to do. EDIT: The problem literally disappeared when I rebooted my computer. I think it has something to do with some sneaky NVIDIA driver update on Windows.