Desilo / liberate-fhe

A Fully Homomorphic Encryption (FHE) library for bridging the gap between theory and practice with a focus on performance and accuracy.
https://docs.desilo.ai/
BSD 3-Clause Clear License

Cuda compatibility #20

Open tguerand opened 3 months ago

tguerand commented 3 months ago

I am trying to install liberate-fhe and I am facing quite a few CUDA-related issues while following https://docs.desilo.ai/liberate-fhe/getting-started/installation

Everything works fine until the "Run CUDA compile script" step.

Is there a specific CUDA version that is needed? My machines are on either CUDA 10.1 or 11.5, and I am reluctant to upgrade, as it could break some other projects. Furthermore, if the CUDA compiler version (nvcc) differs from the runtime version (nvidia-smi), the installation also fails. Is there a fix for this, or is it intended? I thought that if the runtime version is more recent than the compiler version it should work; with torch, for example, I have nvcc 10.1 but torch uses cu118, which works totally fine.

I tried to:

Thanks in advance

hanyul-ryu commented 3 months ago

Hello. Thank you for your interest in Liberate.FHE.

The CUDA version used by Liberate.FHE is tied to the PyTorch version. For the build to work, the CUDA version of the PyTorch you installed must match the CUDA version (nvcc) installed on your system.

Currently, our package build system installs the latest version of PyTorch, so you probably have PyTorch 2.2.x (or 2.1.x) built against CUDA 12.1. To check the torch version:

import torch

print(torch.__version__)
# '2.2.1+cu121'
print(torch.version.cuda)
# '12.1'

So the easiest way is to switch your system CUDA (nvcc) to the CUDA version that torch was built with.
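A quick way to check whether the two match is to compare torch's CUDA build version against the nvcc release string (a minimal sketch; the sample nvcc output below is taken from later in this thread, and on a real system you would read torch.version.cuda and capture nvcc --version with subprocess):

```python
import re

# Sample `nvcc --version` output line (taken from later in this thread);
# on a real system capture it via subprocess.run(["nvcc", "--version"], ...).
nvcc_output = "Cuda compilation tools, release 12.1, V12.1.66"

# The CUDA version torch was built with; on a real system use torch.version.cuda.
torch_cuda = "12.1"

# Extract the "major.minor" release from the nvcc banner and compare.
nvcc_cuda = re.search(r"release (\d+\.\d+)", nvcc_output).group(1)
print(nvcc_cuda == torch_cuda)
# True
```

If this prints False, the extension build is likely to fail or produce a broken module.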


However, if you are reluctant to change the cuda version due to your other projects, there is another method I suggest.

When you clone our repository, there is a pyproject.toml used for the Poetry build; here is how to point it at the PyTorch version you want.

Change the torch dependency from

[tool.poetry.dependencies]
python = ">=3.10,<3.13"
numpy = "^1.23.5"
mpmath = "^1.3.0"
scipy = "^1.10.1"
matplotlib = "^3.7.1"
joblib = "^1.2.0"
torch = "==2.2.1"
tqdm = "^4.66.1"
ninja = "^1.11.1.1"

to

[tool.poetry.dependencies]
python = ">=3.10,<3.13"
numpy = "^1.23.5"
mpmath = "^1.3.0"
scipy = "^1.10.1"
matplotlib = "^3.7.1"
joblib = "^1.2.0"
torch = {url = "https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp310-cp310-linux_x86_64.whl"}
tqdm = "^4.66.1"
ninja = "^1.11.1.1"

The link in this example is the PyTorch 1.11 wheel for CUDA 11.5 and Python 3.10. If there is a specific CUDA or Python version you want, find the corresponding wheel at that link and change the address. When downloading PyTorch manually, you need to match three things: the PyTorch version, the Python version, and the CUDA version.
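For illustration, the wheel address follows a predictable pattern, so it can be assembled from those three versions (a sketch; torch_wheel_url is a hypothetical helper, and you should verify that the resulting file actually exists under https://download.pytorch.org/whl/ before using it):

```python
def torch_wheel_url(torch_version: str, cuda_tag: str, python_tag: str) -> str:
    # Mirrors the pattern of the example URL above; "%2B" is a URL-encoded "+".
    return (
        f"https://download.pytorch.org/whl/{cuda_tag}/"
        f"torch-{torch_version}%2B{cuda_tag}-{python_tag}-{python_tag}"
        f"-linux_x86_64.whl"
    )

# Reproduces the example above: PyTorch 1.11.0, CUDA 11.5, Python 3.10.
print(torch_wheel_url("1.11.0", "cu115", "cp310"))
# https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp310-cp310-linux_x86_64.whl
```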

Note, however, that we have only confirmed it works with CUDA versions 11.7 to 12.1 and PyTorch versions 1.13 to 2.2.1.


There is one more way. This method changes the CUDA version only in the terminal session you run, so it won't conflict with your other projects. In your terminal:

$ export CUDA_HOME=/usr/local/cuda-12.1
$ export PATH=$CUDA_HOME/bin:$PATH
$ export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Then check the nvcc version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

And build our project


You have to build it manually for now, but we have already registered the package on PyPI to make installation even simpler. Please wait a little longer.

For now, that's all I can tell you. If this doesn't resolve your issue or you have additional questions (about anything related to the library, not just installation), please don't hesitate to ask.

Thank you so much.

tguerand commented 2 months ago

Hi again,

Thanks for your help, I can build it with the snippet:

$ export CUDA_HOME=/usr/local/cuda-12.1
$ export PATH=$CUDA_HOME/bin:$PATH
$ export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

and afterwards poetry install, so all the packages are the same as in the initial pyproject.toml file.

I can then run python setup.py, with warnings:

running build_ext
/home/tristan/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:502: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
/home/tristan/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.1
  warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')

poetry build and poetry run python -m pip install . complete with no issues or warnings.

But when importing liberate:

import liberate

I get the following error message:

Traceback (most recent call last):
  File "/home/tristan/test.py", line 1, in <module>
    import liberate
  File "/home/tristan/venv/lib/python3.10/site-packages/liberate/__init__.py", line 1, in <module>
    from . import csprng, fhe, utils
  File "/home/tristan/venv/lib/python3.10/site-packages/liberate/csprng/__init__.py", line 1, in <module>
    from .csprng import Csprng
  File "/home/tristan/venv/lib/python3.10/site-packages/liberate/csprng/csprng.py", line 7, in <module>
    from . import (
ImportError: /home/tristan/venv/lib/python3.10/site-packages/liberate/csprng/chacha20_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1017SymbolicShapeMeta18init_is_contiguousEv

which seems to be a CUDA-related issue.

Do you have a workaround for it?

Thanks

rayhankinan commented 1 month ago

I have those "undefined symbol" issues as well. Did you find a solution to this issue @tguerand?

Edit: I downgraded PyTorch to version 2.1.1 and that removed all of the "undefined symbol" errors. Hope this works for others too.
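The downgrade above can also be made declarative by pinning torch in the pyproject.toml shown earlier in this thread (a sketch; adjust the version to whatever matches your system CUDA):

```toml
[tool.poetry.dependencies]
# Pin torch to the version that resolved the undefined-symbol errors,
# instead of letting the build pull the latest release.
torch = "==2.1.1"
```

Poetry will then resolve that exact version on the next poetry install, so the compiled CUDA extensions and the installed torch stay in sync.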