NVIDIA / TorchFort

An Online Deep Learning Interface for HPC programs on NVIDIA GPUs
https://nvidia.github.io/TorchFort/

cmake cannot locate MPI fortran (from NVHPC 23.7) #3

Closed TomMelt closed 1 year ago

TomMelt commented 1 year ago

I am trying to install TorchFort dependencies with spack and then build with cmake.

So far I have installed the following dependencies (with spack and using gcc version 12.3.0):

* cmake@3.26.3     ( with options ~doc+ncurses+ownlibs build_system=generic build_type=Release)
* cuda@11.8.0      ( with options ~allow-unsupported-compilers~dev build_system=generic)
* hdf5@1.8.21      ( with options +cxx+fortran+hl~ipo+mpi+shared~szip~threadsafe+tools api=default build_system=cmake build_type=Release generator=make patches=0e20187)
* nvhpc@23.7       ( with options +blas+lapack+mpi build_system=generic install_type=single)
* yaml-cpp@0.7.0   ( with options ~ipo+pic+shared~tests build_system=cmake build_type=Release generator=make)

I have also set up a conda environment containing Python 3.11.4. Inside that environment I pip-installed pybind11 2.11.1 and the packages listed in requirements.txt (using pip install -r requirements.txt).

I used the following bash script to compile my code:

#!/usr/bin/env bash

module load cmake/3.26.3/uf63q cuda/11.8.0/dmxqu hdf5/1.8.21/3bxvx nvhpc/23.7/tdmi4 yaml-cpp/0.7.0/l7fcu

source $HOME/miniconda3/bin/activate torchfort

NVHPC_ROOT="/software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/"
NVHPC_CMAKE_DIR="$NVHPC_ROOT/cmake"

rm -rf build
mkdir build && cd build

export CMAKE_PREFIX_PATH="$CMAKE_PREFIX_PATH:$HOME/miniconda3/envs/torchfort/lib/python3.11/site-packages/pybind11"

cmake -DCMAKE_INSTALL_PREFIX="$HOME/.torchfort" \
    -DNVHPC_CUDA_VERSION=11.8 \
    -DCMAKE_PREFIX_PATH="`python -c 'import torch;print(torch.utils.cmake_prefix_path)'`;${NVHPC_CMAKE_DIR}" \
    ..

I get the following error:

./comp.sh 
-- The CXX compiler identification is NVHPC 23.7.0
-- The Fortran compiler identification is NVHPC 23.7.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvfortran - skipped
-- Found CUDA: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd (found version "11.8") 
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Caffe2: CUDA detected: 11.8
-- Caffe2: CUDA nvcc is: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd/bin/nvcc
-- Caffe2: CUDA toolkit directory: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd
-- Caffe2: Header version is: 11.8
-- /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd/lib64/libnvrtc.so shorthash is 672ee683
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80
CMake Warning at ~/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  ~/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:23 (find_package)

-- Found Torch: ~/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/lib/libtorch.so  
-- Found MPI_CXX: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/openmpi-4.1.5-eq5qt6oay5atbk4jff6f5fg6tfmugwsp/lib/libmpi.so (found version "3.1") 
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_WORKS) 
CMake Error at /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cmake-3.26.3-uf63q4ykrr4cv5ppwkygp6hgjacdbt5i/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
Call Stack (most recent call first):
  /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cmake-3.26.3-uf63q4ykrr4cv5ppwkygp6hgjacdbt5i/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cmake-3.26.3-uf63q4ykrr4cv5ppwkygp6hgjacdbt5i/share/cmake-3.26/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  CMakeLists.txt:24 (find_package)

-- Configuring incomplete, errors occurred!

For some reason cmake can find the MPI_CXX but not MPI_Fortran. Do you have any ideas how to get this working?

romerojosh commented 1 year ago

Hmm, it seems that CMake is picking up an MPI installation in your environment other than the NVHPC one, and that installation may not have Fortran support: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/openmpi-4.1.5-eq5qt6oay5atbk4jff6f5fg6tfmugwsp/lib/libmpi.so (found version "3.1")

I wonder if it might be due to an extraneous call to find_package(MPI REQUIRED) in our CMakeLists.txt here: https://github.com/NVIDIA/TorchFort/blob/e06613d6feccc3d11c166f146abce7abdd85f1b3/CMakeLists.txt#L24

Can you try commenting that out from the CMakeLists.txt file and see if that resolves this issue?
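
An alternative that avoids editing CMakeLists.txt is to point CMake's FindMPI module at the intended MPI wrappers explicitly. The following is a minimal sketch, not a verified fix for this build: MPI_CXX_COMPILER and MPI_Fortran_COMPILER are documented FindMPI hint variables, while the helper name nvhpc_mpi_cmake_args and the wrapper directory used in the example comment are assumptions that need to be adapted to the actual NVHPC install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FindMPI hint flags for a directory that
# contains the MPI compiler wrappers (mpicxx, mpifort).
nvhpc_mpi_cmake_args() {
    local mpi_bin="${1%/}"   # strip one trailing slash, if present
    printf '%s\n' \
        "-DMPI_CXX_COMPILER=${mpi_bin}/mpicxx" \
        "-DMPI_Fortran_COMPILER=${mpi_bin}/mpifort"
}

# Example use (the wrapper directory is an assumed layout, adjust as needed):
#   mapfile -t mpi_args < <(nvhpc_mpi_cmake_args "$NVHPC_ROOT/comm_libs/openmpi/openmpi-3.1.5/bin")
#   cmake "${mpi_args[@]}" ..
```

Passing the wrappers explicitly makes the choice of MPI independent of whatever other MPI installations the loaded modules put on the search path.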

TomMelt commented 1 year ago

I have managed to get past the previous error by

  1. commenting out L24 in CMakeLists.txt as suggested, and
  2. rebuilding hdf5 with nvhpc (previously it was built with gfortran which linked to a separate build of openmpi)

But now it fails at the following point:

-- The CXX compiler identification is NVHPC 23.7.0
-- The Fortran compiler identification is NVHPC 23.7.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvfortran - skipped
-- Found CUDA: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd (found version "11.8") 
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Caffe2: CUDA detected: 11.8
-- Caffe2: CUDA nvcc is: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd/bin/nvcc
-- Caffe2: CUDA toolkit directory: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd
-- Caffe2: Header version is: 11.8
-- /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/cuda-11.8.0-dmxquapj2bbxtifgzf3fwl423bjh3qjd/lib64/libnvrtc.so shorthash is 672ee683
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80
CMake Warning at /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:23 (find_package)

-- Found Torch: /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/lib/libtorch.so  
-- CUDA version selected: 11.8
-- Found MPI_CXX: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_cxx.so (found version "3.1") 
-- Found MPI_Fortran: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Found HDF5: hdf5::hdf5_fortran-shared (found version "1.8.21") found components: Fortran 
-- Found Python: /home/user/miniconda3/envs/torchfort/bin/python3.11 (found suitable version "3.11.4", minimum required is "3.6") found components: Interpreter Development Development.Module Development.Embed 
-- Found pybind11: /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/pybind11/include (found version "2.11.1")
CMake Warning (dev) at /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/pybind11/share/cmake/pybind11/pybind11NewTools.cmake:220 (if):
  Policy CMP0057 is not set: Support new IN_LIST if() operator.  Run "cmake
  --help-policy CMP0057" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  IN_LIST will be interpreted as an operator when the policy is set to NEW.
  Since the policy is not set the OLD behavior will be used.
Call Stack (most recent call first):
  examples/cpp/cart_pole/CMakeLists.txt:19 (pybind11_add_module)
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Error at /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/pybind11/share/cmake/pybind11/pybind11NewTools.cmake:220 (if):
  if given arguments:

    "NOT" "ARG_WITHOUT_SOABI" "AND" "NOT" "WITH_SOABI" "IN_LIST" "ARG_UNPARSED_ARGUMENTS"

  Unknown arguments specified
Call Stack (most recent call first):
  examples/cpp/cart_pole/CMakeLists.txt:19 (pybind11_add_module)

-- Configuring incomplete, errors occurred!

Do you have any ideas why it is failing?

romerojosh commented 1 year ago

Based on the error messages, this error seems to be caused by the CMP0057 CMake policy not being set, so the OLD behavior is used. Can you add the line:

cmake_policy(SET CMP0057 NEW)

to the top of CMakeLists.txt and see if that resolves this issue?
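
A policy default can also be supplied on the command line through CMake's CMAKE_POLICY_DEFAULT_CMP<NNNN> variable, which applies when a policy is otherwise left unset. The sketch below only builds the flag; policy_default_flag is a hypothetical helper name, and whether this alone is sufficient for the pybind11 error above is an assumption that has not been checked against this build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a -D flag that sets the default for a CMake policy.
# Usage: policy_default_flag <policy-id> [NEW|OLD]   (value defaults to NEW)
policy_default_flag() {
    local policy="$1" value="${2:-NEW}"
    # "--" stops printf from treating the leading "-D" as an option.
    printf -- '-DCMAKE_POLICY_DEFAULT_%s=%s\n' "$policy" "$value"
}

# Example use:
#   cmake "$(policy_default_flag CMP0057)" ..
```

This keeps the source tree unmodified, which can be convenient when the same checkout is configured on several machines.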

TomMelt commented 1 year ago

I have managed to build the project by

  1. commenting out L24 in CMakeLists.txt as suggested,
  2. rebuilding hdf5 with nvhpc (previously it was built with gfortran which linked to a separate build of openmpi), and
  3. adding cmake_policy(SET CMP0057 NEW) to the CMakeLists.txt as suggested

I get the following output from CMake:

-- The CXX compiler identification is NVHPC 23.7.0
-- The Fortran compiler identification is NVHPC 23.7.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvfortran - skipped
-- Found CUDA: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7//cuda/11.8/ (found version "11.8") 
-- The CUDA compiler identification is NVIDIA 12.2.91
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/compilers/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Caffe2: CUDA detected: 11.8
-- Caffe2: CUDA nvcc is: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/cuda/11.8/bin/nvcc
-- Caffe2: CUDA toolkit directory: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7//cuda/11.8/
-- Caffe2: Header version is: 11.8
-- /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/cuda/11.8/lib64/libnvrtc.so shorthash is 672ee683
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80
CMake Warning at /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:25 (find_package)

-- Found Torch: /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/lib/libtorch.so  
-- CUDA version selected: 11.8
-- Found MPI_CXX: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_cxx.so (found version "3.1") 
-- Found MPI_Fortran: /software/spack/opt/spack/linux-ubuntu22.04-skylake/gcc-12.3.0/nvhpc-23.7-tdmi4llgnphtlarpvqggtvjukvvnr42w/Linux_x86_64/23.7/comm_libs/openmpi/openmpi-3.1.5/lib/libmpi_usempif08.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Found HDF5: hdf5::hdf5_fortran-shared (found version "1.8.21") found components: Fortran 
-- Found Python: /home/user/miniconda3/envs/torchfort/bin/python3.11 (found suitable version "3.11.4", minimum required is "3.6") found components: Interpreter Development Development.Module Development.Embed 
-- Found pybind11: /home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/pybind11/include (found version "2.11.1")
-- Configuring done (4.5s)
-- Generating done (0.0s)
-- Build files have been written to: /home/user/sync/projects/side/TorchFort/build

However, now I get the following error when trying to compile with make:

[  2%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/distributed.cpp.o
"/home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/include/torch/csrc/profiler/util.h", line 133: error: identifier "__rdtsc" is undefined
    return static_cast<uint64_t>(__rdtsc());
                                 ^

"/home/user/sync/projects/side/TorchFort/src/csrc/distributed.cpp", line 56: warning: statement is unreachable [code_is_unreachable]
    CHECK_MPI(MPI_Comm_rank(mpi_comm, &rank));
    ^

Remark: individual warnings can be suppressed with "--diag_suppress <warning-name>"

"/home/user/sync/projects/side/TorchFort/src/csrc/distributed.cpp", line 57: warning: statement is unreachable [code_is_unreachable]
    CHECK_MPI(MPI_Comm_size(mpi_comm, &size));
    ^

"/home/user/sync/projects/side/TorchFort/src/csrc/distributed.cpp", line 62: warning: statement is unreachable [code_is_unreachable]
    CHECK_MPI(MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, mpi_comm));
    ^

"/home/user/sync/projects/side/TorchFort/src/csrc/distributed.cpp", line 71: warning: statement is unreachable [code_is_unreachable]
    CHECK_MPI(MPI_Comm_rank(mpi_comm, &rank));
    ^

"/home/user/sync/projects/side/TorchFort/src/csrc/distributed.cpp", line 72: warning: statement is unreachable [code_is_unreachable]
    CHECK_MPI(MPI_Comm_size(mpi_comm, &size));
    ^

"/home/user/sync/projects/side/TorchFort/src/csrc/distributed.cpp", line 126: warning: statement is unreachable [code_is_unreachable]
    CHECK_MPI(MPI_Allreduce(MPI_IN_PLACE, &val, 1, MPI_DOUBLE, MPI_SUM, mpi_comm));
    ^

"/home/user/sync/projects/side/TorchFort/src/csrc/distributed.cpp", line 132: warning: statement is unreachable [code_is_unreachable]
    CHECK_MPI(MPI_Allreduce(MPI_IN_PLACE, &val, 1, MPI_FLOAT, MPI_SUM, mpi_comm));
    ^

"/home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/include/c10/util/TypeIndex.h", line 190: error: expression must have a constant value
        string_view name = detail::fully_qualified_type_name_impl<T>();
                           ^
"/home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/include/c10/util/TypeIndex.h", line 95: note: expression cannot be interpreted
        ? (throw std::logic_error("Invalid pattern"), string_view())
           ^
"/home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/include/c10/util/TypeIndex.h", line 122: note: called from:
    return extract(
                  ^
          detected during:
            instantiation of "c10::string_view c10::util::get_fully_qualified_type_name<T>() noexcept [with T=std::string]" at line 561 of "/home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/include/c10/util/typeid.h"
            instantiation of "uint16_t caffe2::TypeMeta::addTypeMetaData<T>() [with T=std::string]" at line 686 of "/home/user/miniconda3/envs/torchfort/lib/python3.11/site-packages/torch/include/c10/util/typeid.h"

There is more of the same output; I have listed only the first 50 lines. Do you have any suggestions?

romerojosh commented 1 year ago

I was looking over our builds and we use the GNU compiler for the C++ files. It appears that nvc++ does not support the __rdtsc() intrinsic which is where this error is coming from.

Can you try adding the flag -DCMAKE_CXX_COMPILER=g++ to your CMake build line to use the GNU compiler for the C++ files?

If it works, you should see CMake report a line similar to:

-- The CXX compiler identification is GNU 9.4.0

TomMelt commented 1 year ago

Thanks, I managed to get a bit further by using -DCMAKE_CXX_COMPILER=g++ as suggested.

I then hit issue #6, but I have resolved that and submitted PR #5.

However, when I run make I still get errors. I am worried it may have something to do with mixing compilers (nvfortran and g++). Have you had this issue?

$ make   
[  2%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/distributed.cpp.o
[  5%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/logging.cpp.o
[  8%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/model_state.cpp.o
[ 10%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/model_wrapper.cpp.o
[ 13%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/model_pack.cpp.o
[ 16%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/param_map.cpp.o
[ 18%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/setup.cpp.o
[ 21%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/torchfort.cpp.o
[ 24%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/utils.cpp.o
[ 27%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/losses/l1_loss.cpp.o
[ 29%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/losses/mse_loss.cpp.o
[ 32%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/lr_schedulers/cosine_annealing_lr.cpp.o
[ 35%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/lr_schedulers/multistep_lr.cpp.o
[ 37%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/lr_schedulers/polynomial_lr.cpp.o
[ 40%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/lr_schedulers/scheduler_setup.cpp.o
[ 43%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/lr_schedulers/step_lr.cpp.o
[ 45%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/models/mlp_model.cpp.o
[ 48%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/rl/rl.cpp.o
[ 51%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/rl/utils.cpp.o
[ 54%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/rl/ddpg.cpp.o
[ 56%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/rl/td3.cpp.o
[ 59%] Building CXX object CMakeFiles/torchfort.dir/src/csrc/rl/sac.cpp.o
[ 62%] Linking CXX shared library lib/libtorchfort.so
[ 62%] Built target torchfort
[ 64%] Building Fortran object CMakeFiles/torchfort_fort.dir/src/fsrc/torchfort_m.F90.o
[ 67%] Linking Fortran shared library lib/libtorchfort_fort.so
[ 67%] Built target torchfort_fort
[ 70%] Building Fortran object examples/fortran/simulation/CMakeFiles/train.dir/simulation.f90.o
NVFORTRAN-F-0004-Unable to open MODULE file hdf5.mod (/home/user/sync/projects/side/TorchFort/examples/fortran/simulation/simulation.f90: 119)
NVFORTRAN/x86-64 Linux 23.7-0: compilation aborted
make[2]: *** [examples/fortran/simulation/CMakeFiles/train.dir/build.make:88: examples/fortran/simulation/CMakeFiles/train.dir/simulation.f90.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:179: examples/fortran/simulation/CMakeFiles/train.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

romerojosh commented 1 year ago

It looks like the compilation is failing to find hdf5.mod, which should be in your installed HDF5 include directory. Can you check that your HDF5 include directory has this module installed? Running make VERBOSE=1 prints the full compile line, so you can check whether the include directory containing that module is being passed to the compiler.
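
The check described above can be partly automated. The following is a minimal sketch, assuming the compile line uses -I flags and that directory names contain no whitespace; list_include_dirs and dirs_with_module are hypothetical helper names, not part of TorchFort or CMake.

```shell
#!/usr/bin/env bash
# Read compiler command lines on stdin and print every -I directory,
# one per line. Handles both "-I/dir" and "-I /dir".
list_include_dirs() {
    tr -s ' \t' '\n' | awk '
        prev_I      { print; prev_I = 0; next }   # token following a bare -I
        $0 == "-I"  { prev_I = 1; next }
        /^-I./      { print substr($0, 3) }       # attached form: -I/dir
    '
}

# Print only those include directories from stdin that contain the module file.
dirs_with_module() {
    local mod="$1" dir
    list_include_dirs | while IFS= read -r dir; do
        if [ -f "$dir/$mod" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example use from the build directory:
#   make VERBOSE=1 2>&1 | grep simulation.f90 | dirs_with_module hdf5.mod
```

If the last command prints nothing, none of the include directories on the compile line contains hdf5.mod, which would match the NVFORTRAN-F-0004 error above.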

TomMelt commented 1 year ago

I forgot that spack puts the hdf5.mod files in a different location (pathtolib/static/ and pathtolib/shared/ instead of just pathtolib/). I have moved the shared libs to the main folder and now it builds.
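
Instead of moving files inside the spack install tree, the module directory can be located and passed to the Fortran compiler as an extra include path. This is a minimal sketch under stated assumptions: find_hdf5_mod_dir is a hypothetical helper, HDF5_ROOT in the example comment is assumed to point at the HDF5 install prefix, and when several copies of hdf5.mod exist the first one in sorted order is taken (shared/ sorts before static/).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory of the first hdf5.mod found
# under an HDF5 install prefix; return 1 if none is found.
find_hdf5_mod_dir() {
    local prefix="$1" mod
    mod="$(find "$prefix" -type f -name hdf5.mod 2>/dev/null | sort | head -n 1)"
    [ -n "$mod" ] || return 1
    dirname "$mod"
}

# Example use (HDF5_ROOT is an assumed variable holding the install prefix):
#   cmake -DCMAKE_Fortran_FLAGS="-I$(find_hdf5_mod_dir "$HDF5_ROOT")" ..
```

Leaving the spack-managed directory untouched means a later reinstall or upgrade of hdf5 does not silently undo the fix.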

I will now close this issue. Thanks for your help.