E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

Configuration issue with cray-mpich and CMake 3.22 or higher #517

Open dqwu opened 1 year ago

dqwu commented 1 year ago

[Summary] This appears to be tied to CMake 3.22 or higher: it is not reproducible with 3.21.6, but it is reproducible with 3.22.0 and with the latest release, 3.26.4.

  1. With the Cray wrappers (cc, CC, ftn) and "-DCMAKE_SYSTEM_NAME=Catamount", CMake fails with an error (Could NOT find MPI). The error is not reproducible if CMAKE_SYSTEM_NAME is not set.
  2. With the non-Cray MPI wrappers (mpicc, mpicxx, mpifort), CMake hangs during configuration, whether or not CMAKE_SYSTEM_NAME is set to Catamount.

Reproducible on E3SM machines that provide Cray MPICH, including Perlmutter, Crusher/Frontier, and Sunspot. (Note: the first script below sets the CMake system name to Catamount.)
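Since the behavior splits at CMake 3.22, it can help to check which side of that boundary the loaded cmake module falls on before trying to reproduce. The following is a minimal sketch, not part of SCORPIO; the helper names `version_ge` and `affected_by_issue` are hypothetical, and it assumes GNU `sort -V` is available (as it typically is on Linux systems).

```shell
# Return 0 (true) if version $1 is >= version $2.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 (true) if the given CMake version is in the range where
# this issue was observed (3.22.0 or higher, per the summary above).
affected_by_issue() {
    version_ge "$1" "3.22.0"
}

# Report on the cmake currently found in PATH, if any.
if command -v cmake >/dev/null 2>&1; then
    # The first line of "cmake --version" reads "cmake version X.Y.Z".
    ver=$(cmake --version | awk 'NR==1 {print $3}')
    if affected_by_issue "$ver"; then
        echo "cmake $ver: in the affected range (>= 3.22.0)"
    else
        echo "cmake $ver: older than 3.22.0, issue not expected"
    fi
fi
```

The check only compares version numbers; it does not test the MPI wrappers or CMAKE_SYSTEM_NAME, which also determine whether either symptom appears.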

[Steps to reproduce the CMake error] On Perlmutter/Crusher/Frontier, run the commands below:

module purge
module load PrgEnv-gnu
module load cmake

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

CC=cc CXX=CC FC=ftn \
cmake -Wno-dev \
-DCMAKE_SYSTEM_NAME=Catamount \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 \
-DPIO_USE_MALLOC=ON \
..

Error on Perlmutter with default cmake/3.24.3

...
-- ===== Configuring SCORPIO File info tool... =====
-- Could NOT find MPI_C (missing: MPI_C_WORKS) 
-- Found MPI_CXX: /opt/cray/pe/craype/2.7.20/bin/CC (found version "3.1") 
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_WORKS) 
CMake Error at /global/common/software/nersc/pm-2022q4/spack/linux-sles15-zen/cmake-3.24.3-k5msymx/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_Fortran_FOUND) (found version
  "3.1")
Call Stack (most recent call first):
  /global/common/software/nersc/pm-2022q4/spack/linux-sles15-zen/cmake-3.24.3-k5msymx/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /global/common/software/nersc/pm-2022q4/spack/linux-sles15-zen/cmake-3.24.3-k5msymx/share/cmake-3.24/Modules/FindMPI.cmake:1835 (find_package_handle_standard_args)
  tools/spio_finfo/CMakeLists.txt:21 (find_package)

Error on Frontier with default cmake/3.23.2

...
-- ===== Configuring SCORPIO File info tool... =====
-- Could NOT find MPI_C (missing: MPI_C_WORKS) 
-- Found MPI_CXX: /opt/cray/pe/craype/2.7.19/bin/CC (found version "3.1") 
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_WORKS) 
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/cmake-3.23.2-4r4mpiba7cwdw2hlakh5i7tchi64s3qd/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_Fortran_FOUND) (found version
  "3.1")
Call Stack (most recent call first):
  /autofs/nccs-svm1_sw/frontier/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/cmake-3.23.2-4r4mpiba7cwdw2hlakh5i7tchi64s3qd/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /autofs/nccs-svm1_sw/frontier/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/cmake-3.23.2-4r4mpiba7cwdw2hlakh5i7tchi64s3qd/share/cmake-3.23/Modules/FindMPI.cmake:1830 (find_package_handle_standard_args)
  tools/spio_finfo/CMakeLists.txt:21 (find_package)
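The messages above only say that MPI_C_WORKS and MPI_Fortran_WORKS are missing; the compiler output from the failed probes is usually recorded in CMake's configure logs inside the build directory. A small sketch for pulling out the MPI-related lines follows. The function name `show_mpi_probe_log` is hypothetical, and the log locations are an assumption based on CMake's usual behavior (CMakeFiles/CMakeError.log up to 3.25, CMakeFiles/CMakeConfigureLog.yaml from 3.26), not something confirmed in this issue.

```shell
# Print MPI-related lines from CMake's configure logs in a build directory.
# Assumption: CMake up to 3.25 writes CMakeFiles/CMakeError.log, and
# CMake 3.26 or newer writes CMakeFiles/CMakeConfigureLog.yaml.
# Returns 0 if at least one matching line was found, 1 otherwise.
show_mpi_probe_log() {
    build_dir=${1:-.}
    found=1
    for log in "$build_dir/CMakeFiles/CMakeError.log" \
               "$build_dir/CMakeFiles/CMakeConfigureLog.yaml"; do
        if [ -f "$log" ]; then
            echo "== $log"
            # -n prints line numbers, -i ignores case.
            if grep -n -i 'mpi' "$log"; then
                found=0
            fi
        fi
    done
    return "$found"
}
```

Run from the build directory as `show_mpi_probe_log .` after the failed configure; the matching line numbers show where to look in the full log for the compiler and linker errors behind the failed checks.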

[Steps to reproduce the hanging issue during CMake configuration] On Perlmutter/Crusher/Frontier, run the commands below:

module purge
module load PrgEnv-gnu
module load cmake

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

CC=mpicc CXX=mpicxx FC=mpifort \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 \
-DPIO_USE_MALLOC=ON \
..

CMake output:

...
-- Looking for gettimeofday - found
-- ===== Configuring SCORPIO C library... =====
// Hanging here

Note: The same hang occurs with "-DCMAKE_SYSTEM_NAME=Catamount".
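When reproducing the hang in a script or batch job, it can be convenient to bound the configure step so that it fails instead of blocking indefinitely. The sketch below is an illustration only: `configure_with_timeout` is a hypothetical helper, the 300-second limit in the usage comment is an arbitrary choice, and it assumes the GNU coreutils `timeout` command, which exits with status 124 when the limit is reached.

```shell
# Run a command under a time limit and report if the limit was hit.
# Usage: configure_with_timeout SECONDS COMMAND [ARGS...]
# Assumption: GNU coreutils "timeout" is available; it returns 124
# when the command is killed for exceeding the limit.
configure_with_timeout() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "command timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example (not executed here), using the configure line from the steps above:
#   CC=mpicc CXX=mpicxx FC=mpifort \
#   configure_with_timeout 300 cmake -Wno-dev \
#       -DWITH_NETCDF=OFF \
#       -DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 \
#       -DPIO_USE_MALLOC=ON \
#       ..
```

This does not fix the hang; it only turns it into a non-zero exit status that a script can detect.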

[About configuration of SCORPIO in E3SM builds] The Cray wrappers (cc, CC, ftn) are always used on Perlmutter/Crusher/Frontier, so E3SM builds do not hit the hang seen with the regular MPI wrappers.

Perlmutter uses "<OS>Linux</OS>", so "-DCMAKE_SYSTEM_NAME=Catamount" is not set and it can use the newer cmake/3.24.3.

Crusher and Frontier use "<OS>CNL</OS>", so "-DCMAKE_SYSTEM_NAME=Catamount" is set in CNL.cmake and both have to use the older cmake/3.21.3.

dqwu commented 1 year ago

@jayeshkrishna FYI, E3SM developers have decided to remove the CMake macro file used on old Cray supercomputers running the Catamount OS; see https://github.com/E3SM-Project/E3SM/pull/5745