FalkonML / falkon

Large-scale, multi-GPU capable, kernel solver
https://falkonml.github.io/falkon/
MIT License

No module named 'falkon.la_helpers.cuda_la_helpers' #26

Closed. jmkuebler closed this issue 3 years ago

jmkuebler commented 3 years ago

I get the following error when trying to work with GPU (I do not encounter this on CPU). Maybe something went wrong when refactoring?

File "falkon/la_helpers/cuda_trsm.py", line 7, in <module>
    from falkon.la_helpers.cuda_la_helpers import cuda_transpose
ModuleNotFoundError: No module named 'falkon.la_helpers.cuda_la_helpers'

Do you have any clue how I could circumvent this?

Giodiro commented 3 years ago

Hi @jmkuebler, this is likely a problem with the setup process. Could you try uninstalling (pip uninstall falkon) and installing again with python setup.py install?

After that, you should NOT try to import falkon from inside the git directory: Python will pick up the local files before the files you installed (globally or in a virtual environment), and any unit which requires compilation, such as cuda_la_helpers, will not be present.
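A quick way to check which copy Python would pick up is to ask the import system where it finds each module. This is only a minimal sketch: `module_origin` is a hypothetical helper written for this thread, not part of falkon, and it relies only on the standard `importlib.util.find_spec`.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    if spec is None:
        return None
    return spec.origin


# Assumption: run this from the same directory you normally start Python in.
for name in ("falkon", "falkon.la_helpers.cuda_la_helpers"):
    print(name, "->", module_origin(name))
```

If the first line points inside the git checkout and the second prints None, Python is most likely reading the uncompiled source tree rather than the installed package.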

Let me know if you have any more troubles.

jmkuebler commented 3 years ago

Thanks @Giodiro for your reply.

I managed to get a little closer. Now when trying to install it I get errors like:

nvcc fatal : Value 'c++14' is not defined for option 'std'

I am a bit confused because you say only c++11 is required. I am afraid that's not a problem with your package anymore. But maybe you are still able to help me out ...

PS: I also encounter the issue #2 but installing in development mode helps.

Giodiro commented 3 years ago

The issue which is fixed by running in development mode comes from the fact that when you install using python setup.py install or pip install falkon, Python won't be able to see the compiled files if you run it from the falkon directory (but you can normally solve it by moving to another directory :) ). This is common to all Python libraries which have some compilation steps. Another way to solve it is to install in development mode (python setup.py develop) as you did, or to use the --editable option with pip.
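The shadowing effect can be reproduced without falkon at all. The sketch below builds two copies of a toy package (`demo_pkg`, a made-up name) in temporary directories, one standing in for the git checkout and one for the installed copy, and shows that whichever directory comes first on `sys.path` wins.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_pkg(root, label):
    """Create a tiny package named demo_pkg (hypothetical) tagged with `label`."""
    pkg = Path(root) / "demo_pkg"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("ORIGIN = {!r}\n".format(label))


def imported_origin(search_path):
    """Import demo_pkg with `search_path` prepended to sys.path; report which copy loaded."""
    saved = list(sys.path)
    sys.modules.pop("demo_pkg", None)
    sys.path[:0] = search_path
    try:
        importlib.invalidate_caches()
        return importlib.import_module("demo_pkg").ORIGIN
    finally:
        # Restore the interpreter state so the demo has no lasting effect.
        sys.path[:] = saved
        sys.modules.pop("demo_pkg", None)


with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp) / "checkout"      # stands in for the git directory
    site = Path(tmp) / "site-packages"     # stands in for the installed copy
    make_pkg(checkout, "source checkout")
    make_pkg(site, "installed copy")
    # Starting Python inside the checkout puts that directory first on sys.path.
    from_checkout = imported_origin([str(checkout), str(site)])
    # Starting it anywhere else leaves only the installed copy visible.
    from_elsewhere = imported_origin([str(site)])

print(from_checkout, "/", from_elsewhere)
```

In the real case the checkout copy lacks the compiled extension modules, which is why the import of cuda_la_helpers fails there.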

The c++14 issue seems (from googling around) to be related to a requirement for PyTorch extensions (see for example https://github.com/pytorch/pytorch/issues/32135 ). It seems to be caused by old versions of the CUDA compiler (nvcc). It's possible that you need to ask your cluster admins to use a newer version of CUDA :( In any case, could you share which CUDA version you have installed (e.g. the output of nvcc --version) and which version of gcc is on the cluster? Maybe I can fix this by changing a flag in my setup.py.
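If it helps, the two version strings can be collected in one go. This is a rough sketch using only the standard library; `tool_version` is a hypothetical helper, and it assumes that nvcc and gcc are on the PATH of the shell you build from.

```python
import shutil
import subprocess


def tool_version(command):
    """Run `command` (e.g. ["nvcc", "--version"]) and return its output, or None if the tool is not found."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # some tools print their version on stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


for cmd in (["nvcc", "--version"], ["gcc", "--version"]):
    print(" ".join(cmd), "->", tool_version(cmd))
```

A None result means the tool is not visible in the current environment, which on a cluster often means the corresponding module has not been loaded.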

jmkuebler commented 3 years ago

So right now I am using Cuda V8.0.44 and gcc 4.8.5. Then I get

nvcc fatal : Value 'c++14' is not defined for option 'std'

I have also tried with Cuda 10.1 (same gcc); then I get a different error:

129 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!

I guess this goes well beyond your project and is not really an issue with it. I will also check with our admins. But if you have good recommendations, I am very happy to take them.

Giodiro commented 3 years ago

So I'm not sure I can help much, but I have a couple of comments:

jmkuebler commented 3 years ago

So with your message and the support of our in-house IT, I managed to get it installed (at least that's my impression right now). I ended up using Cuda 11.0 and gcc 7.

Thank you @Giodiro very much for your help!