amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

MPI problems installing on Rocky Linux v8.4 #137

Closed jpearl01 closed 2 years ago

jpearl01 commented 2 years ago

Hello, I'm running into an issue trying to compile the MPI version of raxml-ng on Rocky Linux v8.4. I'm assuming I'm just missing some package, but I haven't figured out which one it would be. For anyone else trying to install on Rocky, I've installed these packages: flex-2.6.1-9.el8.x86_64 flex-devel-2.6.1-9.el8.x86_64 bison-devel-3.0.4-10.el8.x86_64 bison-3.0.4-10.el8.x86_64 gtest-1.8.0-5.el8.x86_64 gtest-devel-1.8.0-5.el8.x86_64 clang-tools-extra.x86_64

So far, those have allowed me to run the cmake command mostly without error, except for this troublesome MPI failure:

$ cmake -DUSE_MPI=ON ..
-- Compiler: GNU 8.4.1 => /usr/bin/c++
-- Building RELEASE
-- Using flags: -std=c++11 -Wall -Wextra -D_RAXML_PTHREADS -pthread -D_RAXML_TERRAPHAST
-- Building dependencies in: /opt/raxml-ng/build/localdeps
-- pll-modules static build enabled
-- SSE enabled. To disable it, run cmake with -DENABLE_SSE=false
-- AVX enabled. To disable it, run cmake with -DENABLE_AVX=false
-- AVX2 enabled. To disable it, run cmake with -DENABLE_AVX2=false
-- Libpll static build enabled
-- pll_static;pll_static;pll_static;pll_static;pll_static;pll_static;pll_static
-- Will compile pll-module optimize
-- Will compile pll-module algorithm
-- Will compile pll-module binary
-- Will compile pll-module msa
-- Will compile pll-module tree
-- Will compile pll-module util
-- clang-tidy found: /usr/bin/clang-tidy
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
    Reason given by package: MPI component 'Fortran' was requested, but language Fortran is not enabled.

-- Building tests
-- Configuring done
-- Generating done
-- Build files have been written to: /opt/raxml-ng/build

I've tried installing openmpi-4.0.5-3.el8.x86_64 openmpi-devel-4.0.5-3.el8.x86_64 mpich-3.3.2-9.el8.x86_64 mpich-devel-3.3.2-9.el8.x86_64

But no dice. Trying things like:

dnf whatprovides MPI_C_HEADER_DIR
Last metadata expiration check: 2:23:56 ago on Tue 01 Feb 2022 08:48:03 AM EST.
Error: No Matches found

fails as above. Any ideas how to get around this?
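
For what it's worth, CMake's FindMPI locates MPI through the compiler wrappers (mpicc/mpicxx) on PATH rather than through anything `dnf whatprovides` could match by a CMake variable name, so a quick sanity check, as a minimal portable-shell sketch, is:

```shell
#!/bin/sh
# Report whether the MPI compiler wrappers that CMake's FindMPI
# looks for are visible on PATH; on RHEL-family systems they only
# show up after an MPI environment module has been loaded.
status=""
for w in mpicc mpicxx; do
    if path=$(command -v "$w" 2>/dev/null); then
        status="${status}${w}: ${path}
"
    else
        status="${status}${w}: not on PATH
"
    fi
done
printf '%s' "$status"
```

If both wrappers report "not on PATH", CMake has no way to find MPI no matter which packages are installed.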

Thanks

amkozlov commented 2 years ago

Hi,

since Rocky is a RHEL derivative, you probably have to load the MPI module first, e.g.:

module load mpi/openmpi-x86_64
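
On a stock Rocky/RHEL 8 box without OpenHPC, the `module` command itself comes from the environment-modules package, and the EL8 openmpi-devel package is what registers the mpi/openmpi-x86_64 modulefile. A sketch of the full sequence (package and module names assumed from the EL8 repos; adjust to your setup):

```shell
# assumes a stock Rocky/RHEL 8 system with the standard EL8 repos
sudo dnf install -y environment-modules openmpi openmpi-devel

# make the `module` command available in the current shell
source /etc/profile.d/modules.sh

module avail                    # mpi/openmpi-x86_64 should now be listed
module load mpi/openmpi-x86_64  # puts mpicc/mpicxx on PATH for CMake
```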

jpearl01 commented 2 years ago

Thanks for the fast reply, and the good hint.

For others going down the Rocky/CentOS/RHEL v8 path: I was able to get this to compile, but it was a little tricky. Simply loading the mpi module didn't work, but with a couple of steps I could see that there WAS an MPI module available. Even though I had openmpi.x86_64 installed, for some reason the module command didn't see it (I know literally nothing about module; I'm coming from CentOS 6, where we had raxml-mpi and, as far as I know, no modules at all). Looking at module avail, there were no MPI modules I could see:

# module load mpi/openmpi-x86_64
Lmod has detected the following error: The following module(s) are unknown: "mpi/openmpi-x86_64"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore-cache load "mpi/openmpi-x86_64"

Also make sure that all modulefiles written in TCL start with the string #%Module

# module avail

-------------------------------------------- /opt/ohpc/admin/modulefiles ---------------------------------------------
   spack/0.15.0

--------------------------------------------- /opt/ohpc/pub/modulefiles ----------------------------------------------
   EasyBuild/4.3.4    cmake/3.19.4    hwloc/2.1.0         magpie/2.5    prun/2.1     valgrind/3.16.1
   autotools          gnu9/9.3.0      libfabric/1.12.1    os            ucx/1.9.0

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

However, there was an openmpi module around:

# module spider

------------------------------------------------------------------------------------------------------------------
The following is a list of the modules and extensions currently available:
------------------------------------------------------------------------------------------------------------------
  EasyBuild: EasyBuild/4.3.4
    Software build and installation framework

  autotools: autotools
    Developer utilities

  cmake: cmake/3.19.4
    CMake is an open-source, cross-platform family of tools designed to build, test and package software.

  gnu9: gnu9/9.3.0
    GNU Compiler Family (C/C++/Fortran for x86_64)

  hwloc: hwloc/2.1.0
    Portable Hardware Locality

  libfabric: libfabric/1.12.1
    Development files for the libfabric library

  magpie: magpie/2.5
    Scripts for running Big Data in HPC environments

  mpich: mpich/3.3.2-ofi
    MPICH MPI implementation

  openmpi4: openmpi4/4.0.5
    A powerful implementation of MPI/SHMEM

(truncated here)

But it wouldn't let me load it until I had loaded gnu9/9.3.0 first:

# module spider mpi

------------------------------------------------------------------------------------------------------------------
  mpich: mpich/3.3.2-ofi
------------------------------------------------------------------------------------------------------------------
    Description:
      MPICH MPI implementation

    You will need to load all module(s) on any one of the lines below before the "mpich/3.3.2-ofi" module is available to load.

      gnu9/9.3.0

    Help:

      This module loads the mpich library built with the gnu9 toolchain.

      Version 3.3.2

Once I had loaded gnu9/9.3.0, the openmpi module became available:

# module load gnu9/9.3.0
# module avail

------------------------------------------- /opt/ohpc/pub/moduledeps/gnu9 --------------------------------------------
   mpich/3.3.2-ofi    openmpi4/4.0.5

-------------------------------------------- /opt/ohpc/admin/modulefiles ---------------------------------------------
   spack/0.15.0

--------------------------------------------- /opt/ohpc/pub/modulefiles ----------------------------------------------
   EasyBuild/4.3.4    cmake/3.19.4        hwloc/2.1.0         magpie/2.5    prun/2.1     valgrind/3.16.1
   autotools          gnu9/9.3.0   (L)    libfabric/1.12.1    os            ucx/1.9.0

  Where:
   L:  Module is loaded

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

# module load openmpi4/4.0.5
# cmake -DUSE_MPI=ON ..
-- Compiler: GNU 8.4.1 => /usr/bin/c++
-- Building RELEASE
-- Using flags: -std=c++11 -Wall -Wextra -D_RAXML_PTHREADS -pthread -D_RAXML_TERRAPHAST
-- Building dependencies in: /opt/raxml-ng/build/localdeps
-- pll-modules static build enabled
-- SSE enabled. To disable it, run cmake with -DENABLE_SSE=false
-- AVX enabled. To disable it, run cmake with -DENABLE_AVX=false
-- AVX2 enabled. To disable it, run cmake with -DENABLE_AVX2=false
-- Libpll static build enabled
-- pll_static;pll_static;pll_static;pll_static;pll_static;pll_static;pll_static;pll_static
-- Will compile pll-module optimize
-- Will compile pll-module algorithm
-- Will compile pll-module binary
-- Will compile pll-module msa
-- Will compile pll-module tree
-- Will compile pll-module util
-- clang-tidy found: /usr/bin/clang-tidy
-- Found MPI_C: /opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.5/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.5/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Building tests
-- Configuring done
-- Generating done
-- Build files have been written to: /opt/raxml-ng/build

At that point I was able to successfully compile. However, I'm writing all of this out because I think it was using a module from a different repository than 'normal': I'm running on a cluster and had installed SLURM via the ohpc repository. Considering the paths used, I'm pretty sure the modules in play came from an openmpi installation from that repo. Here are the relevant packages from that repository:

openmpi4-gnu9-ohpc-4.0.5-4.1.ohpc.2.1.x86_64  
mpich-ofi-gnu9-ohpc-3.3.2-10.1.ohpc.2.0.x86_64 
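
To sum up the working sequence (paths and versions as in my logs above):

```shell
# OpenHPC nests MPI builds under the compiler toolchain, so the
# compiler module must be loaded before the MPI modules become visible
module load gnu9/9.3.0
module load openmpi4/4.0.5

# then MPI is discoverable and raxml-ng configures cleanly
cd /opt/raxml-ng/build
cmake -DUSE_MPI=ON ..
make
```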

Sorry for the long post, but I figured someone else might find this helpful if they go the route I did. Thanks again for the help.