Open dqwu opened 1 year ago
@jayeshkrishna FYI, E3SM developers have decided to remove the CMake macro file used for old Cray supercomputers running the Catamount OS; see https://github.com/E3SM-Project/E3SM/pull/5745
This issue can also be reproduced on Frontier without using SCORPIO:
module load PrgEnv-gnu
module load cmake/3.27.9
mkdir src1
mkdir src2
cat <<EOF >> CMakeLists.txt
project (MY_PROJECT C)
message(STATUS "Configuring src1")
add_subdirectory(src1)
message(STATUS "Configuring src2")
add_subdirectory(src2)
EOF
cd src1
mkdir src1_subdir1
mkdir src1_subdir2
cat <<EOF >> CMakeLists.txt
add_subdirectory(src1_subdir1)
add_subdirectory(src1_subdir2)
EOF
cd src1_subdir1
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir1")
find_package(MPI REQUIRED COMPONENTS C)
EOF
cd ../src1_subdir2
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir2")
find_package(MPI REQUIRED COMPONENTS C)
EOF
cd ../../src2
mkdir src2_subdir
cat <<EOF >> CMakeLists.txt
add_subdirectory(src2_subdir)
EOF
cd src2_subdir
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src2_subdir")
find_package(MPI REQUIRED COMPONENTS C)
EOF
cd ../..
mkdir build
cd build
CC=cc \
cmake -Wno-dev \
-DCMAKE_SYSTEM_NAME=Catamount \
..
CMake errors:
-- The C compiler identification is GNU 12.3.0
-- Cray Programming Environment 2.7.31.11 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.31.11/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring src1
-- Configuring src1_subdir1
-- Found MPI_C: /opt/cray/pe/craype/2.7.31.11/bin/cc (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: C
-- Configuring src1_subdir1
-- Configuring src1_subdir1
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir1
-- Could NOT find MPI_C (missing: MPI_C_WORKS)
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find MPI (missing: MPI_C_FOUND C)
Call Stack (most recent call first):
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
src1/src1_subdir1/CMakeLists.txt:2 (find_package)
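One visible symptom in the output above is that the "Configuring ..." status lines repeat, even though each CMakeLists.txt in the reproducer contains exactly one message(STATUS ...) call. A minimal sketch for counting these repeats is shown below. The helper name count_configure_lines is hypothetical, and the sample input is just the "Configuring" lines copied from the output above; in practice the real cmake output would be piped in instead.

```shell
# Count how often each "-- Configuring ..." status line appears in CMake output
# read from stdin. In a healthy configure of the reproducer, each count is 1.
count_configure_lines() {
    grep -- '^-- Configuring ' | sort | uniq -c | sort -rn
}

# Sample input: the "Configuring" lines from the failing run above.
count_configure_lines <<'EOF'
-- Configuring src1
-- Configuring src1_subdir1
-- Configuring src1_subdir1
-- Configuring src1_subdir1
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir1
EOF
```

Any count greater than 1 suggests that a directory was processed more than once before the MPI detection failed.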
Even when CMAKE_SYSTEM_NAME is not set to Catamount, CMake 3.22 or higher also fails on Frontier with PrgEnv-cray if the craype-accel-amd-gfx90a and rocm/5.4.0 modules are loaded and -fopenmp is set in LDFLAGS:
module load PrgEnv-cray
module load craype-accel-amd-gfx90a rocm/5.4.0
module load cmake/3.27.9
mkdir src1
mkdir src2
cat <<EOF >> CMakeLists.txt
project (MY_PROJECT C)
message(STATUS "Configuring src1")
add_subdirectory(src1)
message(STATUS "Configuring src2")
add_subdirectory(src2)
EOF
cd src1
mkdir src1_subdir1
mkdir src1_subdir2
cat <<EOF >> CMakeLists.txt
add_subdirectory(src1_subdir1)
add_subdirectory(src1_subdir2)
EOF
cd src1_subdir1
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir1")
find_package(MPI REQUIRED COMPONENTS C)
EOF
cd ../src1_subdir2
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir2")
find_package(MPI REQUIRED COMPONENTS C)
EOF
cd ../../src2
mkdir src2_subdir
cat <<EOF >> CMakeLists.txt
add_subdirectory(src2_subdir)
EOF
cd src2_subdir
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src2_subdir")
find_package(MPI REQUIRED COMPONENTS C)
EOF
cd ../..
mkdir build
cd build
CC=cc \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
..
CMake errors:
-- The C compiler identification is Clang 17.0.3
-- Cray Programming Environment 2.7.31.11 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.31.11/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring src1
-- Configuring src1_subdir1
-- Found MPI_C: /opt/cray/pe/craype/2.7.31.11/bin/cc (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: C
-- Configuring src1_subdir1
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir1
-- Could NOT find MPI_C (missing: MPI_C_WORKS)
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find MPI (missing: MPI_C_FOUND C)
Call Stack (most recent call first):
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
src1/src1_subdir1/CMakeLists.txt:2 (find_package)
@jayeshkrishna A related issue has been created for CMake developers: https://discourse.cmake.org/t/regression-cmake-3-22-fails-to-find-mpi-on-cray-systems-reproducible-on-frontier-supercomputer/13045
[Summary] This appears to be a regression in CMake 3.22 or higher on Cray systems: it is not reproducible with 3.21.6, but is reproducible with 3.22.0 and with the latest release (3.31.1).
It is reproducible on some E3SM machines that provide Cray MPICH, including Perlmutter, Crusher/Frontier, and Sunspot/Aurora.
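To check whether a given CMake version falls in the range reported above, a version comparison can be sketched as follows. This is only an illustration: the function name cmake_version_affected is hypothetical, it relies on the -V (version sort) option of GNU sort, and it encodes only the "3.22.0 or higher" boundary from this summary. Only the versions named in this issue were actually tested.

```shell
# Succeeds (exit status 0) if the given version is 3.22.0 or newer,
# i.e. in the range this issue reports as affected.
# Assumes GNU sort, which supports version sorting via -V.
cmake_version_affected() {
    first=$(printf '%s\n%s\n' "3.22.0" "$1" | sort -V | head -n 1)
    [ "$first" = "3.22.0" ]
}

# Versions mentioned in this issue.
for v in 3.21.6 3.22.0 3.27.9 3.31.1; do
    if cmake_version_affected "$v"; then
        echo "$v: in the reported affected range"
    else
        echo "$v: below the reported affected range"
    fi
done
```

The version of the loaded CMake module can be read from the first line of `cmake --version` and passed to the function.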
[Steps to reproduce the CMake error of case 1] On Frontier, run the commands below:
CMake output:
[Steps to reproduce the hanging issue of case 2] On Frontier, run the commands below:
CMake output:
[Steps to reproduce the CMake error of case 3] On Frontier, run the commands below:
CMake output:
[Comments] E3SM previously set CMAKE_SYSTEM_NAME to Catamount on Crusher/Frontier, but this is no longer the case.
E3SM uses non-Cray MPI wrappers (e.g., mpicxx) on Frontier to enable the use of hipcc.
For some E3SM cases running on Frontier GPU nodes, the modules craype-accel-amd-gfx90a and rocm are loaded, often with the -fopenmp flag added to the build configuration.
PR #439 moved MPI detection from the root level to subprojects, which is now affected by this regression in CMake (version 3.22.0 or higher). A confirmed workaround is to add a redundant find_package(MPI) call at the root level.
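Applied to the small reproducer used in this issue (not to SCORPIO itself), the workaround could look like the sketch below, written in the same heredoc style as the reproducer. The REQUIRED COMPONENTS C arguments are an assumption that mirrors the find_package calls in the subdirectories; the comment above only says that a redundant find_package(MPI) at the root level is sufficient. Run it from the reproducer's top-level directory; it overwrites the root CMakeLists.txt.

```shell
# Rewrite the reproducer's root CMakeLists.txt with the workaround applied.
# Uses '>' (overwrite) rather than '>>' so rerunning does not duplicate content.
cat <<'EOF' > CMakeLists.txt
project (MY_PROJECT C)
# Workaround from the issue: detect MPI once at the root level,
# before descending into any subdirectory.
find_package(MPI REQUIRED COMPONENTS C)
message(STATUS "Configuring src1")
add_subdirectory(src1)
message(STATUS "Configuring src2")
add_subdirectory(src2)
EOF
```

The find_package calls in the subdirectories stay as they are; the root-level call is simply an extra, earlier detection.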
A related issue has been reported to the CMake developers: https://discourse.cmake.org/t/regression-cmake-3-22-fails-to-find-mpi-on-cray-systems-reproducible-on-frontier-supercomputer/13045
A possible, still unconfirmed, CMake change that might have caused this regression: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6264