ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

could not find CUDA when compiling LAMMPS #376

Closed: JPDarby closed this issue 2 months ago.

JPDarby commented 2 months ago

Hi, I’m trying to compile LAMMPS with MACE for GPU on a Linux cluster (Constructor Research Platform, which has V100s), but I’m running into errors. I’m blindly following the instructions for CSD3 (https://mace-docs.readthedocs.io/en/latest/guide/lammps.html#instructions-for-gpu), but when I run cmake I get:

-- Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found version "12.3")
CMake Warning at /home/coder/project/libtorch-gpu/share/cmake/Caffe2/public/cuda.cmake:31 (message):
  Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
  or a Caffe2 dependent library, the next warning / error will give you more
  info.
Call Stack (most recent call first):
  /home/coder/project/libtorch-gpu/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
  /home/coder/project/libtorch-gpu/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  Modules/Packages/ML-MACE.cmake:3 (find_package)
  CMakeLists.txt:526 (include)

CMake Error at /home/coder/project/libtorch-gpu/share/cmake/Caffe2/Caffe2Config.cmake:91 (message):
  Your installed Caffe2 version uses CUDA but I cannot find the CUDA
  libraries.  Please set the proper CUDA prefixes and / or install CUDA.
Call Stack (most recent call first):
  /home/coder/project/libtorch-gpu/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  Modules/Packages/ML-MACE.cmake:3 (find_package)
  CMakeLists.txt:526 (include)
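The message says a CUDA 12.3 toolkit was detected, but `CUDA_INCLUDE_DIRS` and `CUDA_CUDART_LIBRARY` (the runtime headers and the cudart library) could not be located. A minimal sketch for checking a candidate toolkit directory by hand: `check_cuda_root` is a hypothetical helper, and it assumes that `include/cuda_runtime.h` plus `libcudart.so` under `lib64/` or `lib/` are representative of what CMake searches for (the exact files FindCUDA probes may differ).

```shell
# Hypothetical helper: report whether a candidate CUDA root contains the
# runtime header and the cudart library. Returns 0 only if both are found.
check_cuda_root() {
  local root=$1 ok=0 lib
  if [ -f "$root/include/cuda_runtime.h" ]; then
    echo "include: $root/include"
  else
    echo "MISSING include/cuda_runtime.h under $root"
    ok=1
  fi
  # libcudart may live in lib64/ or lib/ depending on the install;
  # an unmatched glob stays literal and fails the -e test.
  for lib in "$root"/lib64/libcudart.so* "$root"/lib/libcudart.so*; do
    if [ -e "$lib" ]; then
      echo "cudart: $lib"
      return "$ok"
    fi
  done
  echo "MISSING libcudart.so under $root/lib64 or $root/lib"
  return 1
}
```

For example, `check_cuda_root /opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3` (the directory used later in this thread) should print both paths if that toolkit has the usual layout.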

The output from nvcc --version is:

production-env-fedd6b48a81145c9b537b8b9c260d7df% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52

So CUDA is installed. I downloaded libtorch for CUDA 12.1; could this mismatch be the issue? If so, there doesn’t seem to be a libtorch build for CUDA 12.3…
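Having `nvcc` on the PATH does not guarantee that CMake finds the rest of the toolkit: when `CUDA_TOOLKIT_ROOT_DIR` is not set, FindCUDA infers the prefix from the location of `nvcc` in the system path, so the guess can miss if the `nvcc` on the PATH is a wrapper or symlink living outside the toolkit tree. The sketch below gives a rough stand-in for that guess after resolving symlinks. `cuda_root_from_nvcc` is a hypothetical helper, `readlink -f` assumes GNU coreutils (standard on Linux clusters), and FindCUDA's real logic is not reproduced exactly.

```shell
# Hypothetical helper: guess a toolkit root from nvcc's location, i.e. the
# parent of the bin/ directory holding the real (symlink-resolved) binary.
cuda_root_from_nvcc() {
  local nvcc
  nvcc=$(command -v nvcc) || { echo "nvcc not found on PATH" >&2; return 1; }
  nvcc=$(readlink -f "$nvcc")    # follow symlinks to the real binary
  dirname "$(dirname "$nvcc")"   # .../bin/nvcc -> ...
}
```

If the printed directory has no `include/cuda_runtime.h`, passing the real toolkit root to CMake explicitly is worth trying. Comparing it with `dirname "$(dirname "$(command -v nvcc)")"`, the unresolved guess, shows whether a symlink is involved.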

JPDarby commented 2 months ago

A suggestion from @wcwitt fixed this: “It looks like CMake just isn't finding CUDA, which is hopefully simpler than a LAMMPS or libtorch problem. I would try this to start: https://stackoverflow.com/a/46515110”

Specifically, I added the following to the cmake command:

-D CUDA_TOOLKIT_ROOT_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/ \
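The fix in this thread was to pass `CUDA_TOOLKIT_ROOT_DIR` explicitly. As a sketch, a configure step carrying that flag might be wrapped as below. Assumptions: `configure_lammps_mace` and the `CUDA_ROOT`, `LIBTORCH_DIR`, `LAMMPS_CMAKE_DIR` and `CMAKE` variables are hypothetical names; `PKG_ML-MACE=ON` follows LAMMPS's `PKG_<name>` convention and the `ML-MACE.cmake` module in the log; `CMAKE_PREFIX_PATH` is the usual way to let `find_package(Torch)` see an unpacked libtorch; the remaining flags from the CSD3 instructions are not reproduced here and can be passed as extra arguments.

```shell
# Hypothetical wrapper around the LAMMPS configure step. The environment
# variable names are illustrative, not part of LAMMPS or MACE.
configure_lammps_mace() {
  local cuda_root=${CUDA_ROOT:-/opt/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/}
  local libtorch=${LIBTORCH_DIR:-/home/coder/project/libtorch-gpu}
  local src=${LAMMPS_CMAKE_DIR:-../cmake}   # assumes a build dir inside the LAMMPS tree
  local cmake=${CMAKE:-cmake}               # CMAKE=echo prints the command instead

  # Point FindCUDA at the toolkit directly instead of relying on nvcc's location.
  # Any further flags from the install guide are passed through via "$@".
  "$cmake" \
    -D PKG_ML-MACE=ON \
    -D CMAKE_PREFIX_PATH="$libtorch" \
    -D CUDA_TOOLKIT_ROOT_DIR="$cuda_root" \
    "$@" \
    "$src"
}
```

`CMAKE=echo configure_lammps_mace -D CMAKE_BUILD_TYPE=Release` previews the full command; dropping `CMAKE=echo` runs `cmake` for real.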