Luthaf / rascaline

Computing representations for atomistic machine learning
https://luthaf.fr/rascaline/
BSD 3-Clause "New" or "Revised" License

rascaline.torch install requests CUDA_TOOLKIT_ROOT_DIR #305

Open bananenpampe opened 5 months ago

bananenpampe commented 5 months ago

Hello,

I am trying to get this install script to run. On my M1 Mac it runs without problems, but on cosmosrv it fails with this error:

CUDA_TOOLKIT_ROOT_DIR not found or specified

What can be done to fix it?

#!/bin/bash

conda install -c conda-forge rust python=3.10

#purge conda cache
conda clean --all

#purge the pip cache
pip cache purge

# pip installs
pip install cmake numpy
pip install --extra-index-url https://download.pytorch.org/whl/cpu torch==2.3.0

pip install -r requirements.txt

pip install metatensor
pip install metatensor-core
pip install metatensor-operations
pip install metatensor-torch

pip install git+https://github.com/Luthaf/rascaline
pip install git+https://github.com/luthaf/rascaline#subdirectory=python/rascaline-torch

bananenpampe commented 5 months ago

This is the full error message:

  Building wheel for rascaline-torch (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for rascaline-torch (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [102 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/rascaline
      creating build/lib.linux-x86_64-cpython-310/rascaline/torch
      copying rascaline/torch/calculators.py -> build/lib.linux-x86_64-cpython-310/rascaline/torch
      copying rascaline/torch/__init__.py -> build/lib.linux-x86_64-cpython-310/rascaline/torch
      copying rascaline/torch/system.py -> build/lib.linux-x86_64-cpython-310/rascaline/torch
      copying rascaline/torch/calculator_base.py -> build/lib.linux-x86_64-cpython-310/rascaline/torch
      copying rascaline/torch/_c_lib.py -> build/lib.linux-x86_64-cpython-310/rascaline/torch
      copying rascaline/torch/utils.py -> build/lib.linux-x86_64-cpython-310/rascaline/torch
      running egg_info
      writing rascaline_torch.egg-info/PKG-INFO
      writing dependency_links to rascaline_torch.egg-info/dependency_links.txt
      writing requirements to rascaline_torch.egg-info/requires.txt
      writing top-level names to rascaline_torch.egg-info/top_level.txt
      reading manifest file 'rascaline_torch.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      warning: no previously-included files matching '.DS_Store' found anywhere in distribution
      warning: no files found matching 'rascaline-torch.tar.gz'
      adding license file 'LICENSE'
      adding license file 'AUTHORS'
      writing manifest file 'rascaline_torch.egg-info/SOURCES.txt'
      running build_ext
      Not searching for unused variables given on the command line.
      -- Running CMake version 3.29.3
      -- The CXX compiler identification is GNU 7.5.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      CUDA_TOOLKIT_ROOT_DIR not found or specified
      -- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
      CMake Warning at /tmp/pip-build-env-5lqlxsvc/normal/lib/python3.10/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:31 (message):
        Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
        or a Caffe2 dependent library, the next warning / error will give you more
        info.
      Call Stack (most recent call first):
        /tmp/pip-build-env-5lqlxsvc/normal/lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
        /tmp/pip-build-env-5lqlxsvc/normal/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
        CMakeLists.txt:56 (find_package)

      CMake Error at /tmp/pip-build-env-5lqlxsvc/normal/lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:91 (message):
        Your installed Caffe2 version uses CUDA but I cannot find the CUDA
        libraries.  Please set the proper CUDA prefixes and / or install CUDA.
      Call Stack (most recent call first):
        /tmp/pip-build-env-5lqlxsvc/normal/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
        CMakeLists.txt:56 (find_package)

      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):

bananenpampe commented 5 months ago

And this is the output of pip list:

Package               Version      Editable project location
--------------------- ------------ --------------------------
aiohttp               3.9.5
aiosignal             1.3.1
appdirs               1.4.4
ase                   3.22.1
async-timeout         4.0.3
attrs                 23.2.0
certifi               2024.2.2
charset-normalizer    3.3.2
click                 8.1.7
cmake                 3.29.3
contourpy             1.2.1
cycler                0.12.1
docker-pycreds        0.4.0
filelock              3.14.0
fonttools             4.52.4
frozenlist            1.4.1
fsspec                2024.5.0
gitdb                 4.0.11
GitPython             3.1.43
idna                  3.7
ipi                   3.0.0a2      /home/kellner/install/i-pi
Jinja2                3.1.4
kiwisolver            1.4.5
lightning-utilities   0.11.2
MarkupSafe            2.1.5
matplotlib            3.9.0
metatensor            0.2.0
metatensor-core       0.1.8
metatensor-learn      0.2.2
metatensor-operations 0.2.1
metatensor-torch      0.5.1
mpmath                1.3.0
multidict             6.0.5
networkx              3.3
numpy                 1.26.4
packaging             24.0
pathtools             0.1.2
pillow                10.3.0
pip                   24.0
plumed                2.9.0
protobuf              4.25.3
psutil                5.9.8
pyparsing             3.1.2
python-dateutil       2.9.0.post0
pytorch-lightning     2.0.8
PyYAML                6.0.1
rascaline             0.1.0.dev546
requests              2.32.2
scipy                 1.13.1
sentry-sdk            2.3.1
setproctitle          1.3.3
setuptools            70.0.0
six                   1.16.0
smmap                 5.0.1
sympy                 1.12
torch                 2.3.0+cpu
torchmetrics          1.4.0.post0
tqdm                  4.66.4
typing_extensions     4.12.0
urllib3               2.2.1
wandb                 0.15.10
wheel                 0.43.0
wigners               0.3.0
yarl                  1.9.4

Luthaf commented 5 months ago

This is the usual issue with Torch's CMake files on Linux: https://github.com/pytorch/pytorch/issues/78530. They try to find a CUDA compiler even if there is no CUDA code to compile.

The workaround is to build against the CPU version of torch by running pip install --extra-index-url https://download.pytorch.org/whl/cpu rascaline-torch ... (or, equivalently, by exporting PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu).

Luthaf commented 5 months ago

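This happens even if you have already installed the CPU-only version of torch, because pip uses build isolation by default: it re-installs all build dependencies in a fresh virtual environment to build the code. Without the extra index URL, that environment gets the default torch wheel from PyPI, which on Linux is the CUDA-enabled one (hence the /tmp/pip-build-env-... paths in the error above). This means you either need to pass the flag to every installation (which is what the environment variable does) or disable build isolation in pip.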
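
The two options above could be sketched roughly as follows. This is a minimal illustration, not an official install script: the function names install_with_cpu_index and install_without_isolation are hypothetical helpers introduced here, and the second option assumes that the build requirements declared by rascaline-torch (including a CPU-only torch) are already installed in the active environment.

```shell
#!/bin/bash
# Sketch of the two workarounds discussed in this thread.

# Option 1: point every pip invocation, including the isolated build
# environment pip creates internally, at the CPU-only PyTorch wheel index.
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"

# Hypothetical helper: install torch and rascaline-torch with the
# environment variable above in effect.
install_with_cpu_index() {
    pip install torch==2.3.0
    pip install "git+https://github.com/luthaf/rascaline#subdirectory=python/rascaline-torch"
}

# Option 2: disable build isolation so that the build uses the torch that
# is already installed in the current environment.
# Assumption: all build requirements of rascaline-torch are already present.
install_without_isolation() {
    pip install --no-build-isolation \
        "git+https://github.com/luthaf/rascaline#subdirectory=python/rascaline-torch"
}
```

Nothing is installed until one of the functions is called, for example by adding a line with install_with_cpu_index at the end of the script. Only one of the two options should be needed.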