E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

Cray Clang linker error on Frontier when building SCORPIO with openmp flag and ADIOS2 lib #528

Open dqwu opened 1 year ago

dqwu commented 1 year ago

This issue was initially reproduced on Frontier when building a SCREAM ne1024 F case with ADIOS support.

machine: frontier-scream-gpu
compiler: crayclang-scream
LND_NTHRDS: set to a value larger than 1
modules loaded: craype-accel-amd-gfx90a rocm/5.4.0 and others
CMAKE_CXX_FLAGS passed to SCORPIO: -fopenmp and others
WITH_ADIOS2 passed to SCORPIO: ON

The linker command fails on spio_finfo.exe. One possible workaround is to turn off the CMake option PIO_ENABLE_TOOLS.

This issue can also be reproduced on Frontier with a standalone SCORPIO build.

module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio
git checkout a8d5e37

mkdir build
cd build

ADIOS2_DIR=/ccs/proj/cli115/software/adios/adios2-2.8.3-pr3345/crayclang/15.0.0 \
CC=cc CXX=CC FC=ftn \
cmake -Wno-dev \
-DWITH_ADIOS2=ON \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
-DCMAKE_CXX_FLAGS="-fopenmp" \
-DPIO_USE_MALLOC=ON \
..

make

Linker error:

/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/bin/llvm-link: error: linked module is broken!
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [tools/spio_finfo/CMakeFiles/spio_finfo.exe.dir/build.make:209: tools/spio_finfo/spio_finfo.exe] Error 1
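
The PIO_ENABLE_TOOLS workaround mentioned above can be sketched against the standalone build as follows. This is only an illustration: it reuses the paths and flags from the reproduction steps, adds `-DPIO_ENABLE_TOOLS=OFF`, and by default just prints the configure command. The `RUN` switch and the `SCORPIO_CONFIGURE` variable are assumptions introduced for this sketch and are not part of SCORPIO.

```shell
# Sketch: standalone SCORPIO configure from this issue, plus
# -DPIO_ENABLE_TOOLS=OFF so that tools/spio_finfo is not built.
# Intended to be run from scorpio/build on Frontier.
# Prints the command by default; set RUN=1 to actually configure and build.

# Path and compiler wrappers copied from the reproduction steps.
export ADIOS2_DIR=/ccs/proj/cli115/software/adios/adios2-2.8.3-pr3345/crayclang/15.0.0
export CC=cc CXX=CC FC=ftn

# Collect the cmake arguments as positional parameters.
set -- \
  -Wno-dev \
  -DWITH_ADIOS2=ON \
  -DWITH_NETCDF=OFF \
  -DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
  -DCMAKE_CXX_FLAGS=-fopenmp \
  -DPIO_USE_MALLOC=ON \
  -DPIO_ENABLE_TOOLS=OFF \
  ..

# Hypothetical helper variable, used only to show the final command.
SCORPIO_CONFIGURE="cmake $*"
echo "$SCORPIO_CONFIGURE"

if [ "${RUN:-0}" = "1" ]; then
  cmake "$@" && make
fi
```

With the tools disabled, the spio_finfo.exe link step is skipped, so this sidesteps the failure rather than fixing the underlying mismatch between the SCORPIO and ADIOS2 builds.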

Note: this issue is not reproducible if we configure and build the ADIOS2 lib with the following settings:

[modules]: load craype-accel-amd-gfx90a and rocm/5.4.0
[CXXFLAGS]: add -fopenmp flag

jayeshkrishna commented 1 year ago

As a workaround, @dqwu has rebuilt ADIOS2 using the following modules and flags:

[modules]: load craype-accel-amd-gfx90a and rocm/5.4.0
[CXXFLAGS]: add -fopenmp flag
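
A minimal sketch of such an ADIOS2 rebuild is given below. Only the two modules and the `-fopenmp` flag come from this thread. The source and install locations (`ADIOS2_SRC`, `ADIOS2_PREFIX`), the use of the `cc`/`CC`/`ftn` wrappers, and the `RUN` switch are assumptions made for illustration. The script relies on CMake picking up `CXXFLAGS` from the environment on the first configure of a fresh build directory.

```shell
# Sketch: rebuild ADIOS2 with the same modules and OpenMP flag as SCORPIO.
# Prints the settings by default; set RUN=1 to configure, build and install.

# Hypothetical locations; adjust for your own checkout and install prefix.
ADIOS2_SRC="${ADIOS2_SRC:-$HOME/ADIOS2}"
ADIOS2_PREFIX="${ADIOS2_PREFIX:-$HOME/software/adios2-openmp}"

# From the issue: add -fopenmp to the C++ flags used for ADIOS2.
CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }-fopenmp"
export CXXFLAGS

echo "CXXFLAGS=$CXXFLAGS"
echo "install prefix: $ADIOS2_PREFIX"

if [ "${RUN:-0}" = "1" ]; then
  # Assumes the 'module' command is available in this shell (as on Frontier).
  module load craype-accel-amd-gfx90a rocm/5.4.0
  # Compiler wrappers are an assumption, mirrored from the SCORPIO build.
  CC=cc CXX=CC FC=ftn cmake -S "$ADIOS2_SRC" -B "$ADIOS2_SRC/build" \
    -DCMAKE_INSTALL_PREFIX="$ADIOS2_PREFIX" &&
    cmake --build "$ADIOS2_SRC/build" --target install
fi
```

After installing, point `ADIOS2_DIR` at the new prefix when configuring SCORPIO.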

dqwu commented 8 months ago

@jayeshkrishna Recently, SCREAM developers made some changes for machine frontier-scream-gpu and compiler crayclang-scream:

Accordingly, ADIOS2 libs on Frontier need to be rebuilt with the same settings.

Note: when rebuilding ADIOS2 we also need to override the compilers used by the mpicc/mpicxx wrappers via MPICH_CC and MPICH_CXX. Otherwise, we have confirmed linking errors when building SCORPIO for SCREAM:

ld.lld: error: undefined symbol: __cray_sset_detect
>>> referenced by cm_util.c:65 (ADIOS2-2.9.1/thirdparty/EVPath/EVPath/cm_util.c:65)
>>>               cm_util.c.o:(CMtrace_init) in archive adios2/2.9.1/cray-mpich-8.1.26/crayclang-scream-14.0.0/lib64/libadios2_evpath.a
>>> referenced by evp.c:1057 (ADIOS2-2.9.1/thirdparty/EVPath/EVPath/evp.c:1057)

Updated workaround to rebuild ADIOS2 lib on Frontier for SCREAM:

[modules]: load craype-accel-amd-gfx90a and rocm/5.1.0
[wrappers]: mpicc/mpicxx/ftn
[C compiler]: set MPICH_CC=/opt/rocm-5.1.0/bin/hipcc
[C++ compiler]: set MPICH_CXX=/opt/rocm-5.1.0/bin/hipcc
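
The updated workaround can be sketched as a short script. The module names, wrapper names, and hipcc path are taken from the settings above. `ADIOS2_SRC`, `ADIOS2_PREFIX`, and the `RUN` switch are hypothetical names used only for this illustration, and the cmake invocation is an assumption about how the rebuild is driven.

```shell
# Sketch: rebuild ADIOS2 for SCREAM on Frontier with hipcc behind the MPI wrappers.
# Prints the settings by default; set RUN=1 to configure, build and install.

# Path from the issue: make mpicc/mpicxx invoke hipcc as the underlying compiler.
ROCM_BIN=/opt/rocm-5.1.0/bin
export MPICH_CC="$ROCM_BIN/hipcc"
export MPICH_CXX="$ROCM_BIN/hipcc"

# Hypothetical locations; adjust for your own checkout and install prefix.
ADIOS2_SRC="${ADIOS2_SRC:-$HOME/ADIOS2}"
ADIOS2_PREFIX="${ADIOS2_PREFIX:-$HOME/software/adios2-scream}"

echo "MPICH_CC=$MPICH_CC"
echo "MPICH_CXX=$MPICH_CXX"

if [ "${RUN:-0}" = "1" ]; then
  # Assumes the 'module' command is available in this shell (as on Frontier).
  module load craype-accel-amd-gfx90a rocm/5.1.0
  # Wrappers as listed in the workaround: mpicc / mpicxx / ftn.
  CC=mpicc CXX=mpicxx FC=ftn cmake -S "$ADIOS2_SRC" -B "$ADIOS2_SRC/build" \
    -DCMAKE_INSTALL_PREFIX="$ADIOS2_PREFIX" &&
    cmake --build "$ADIOS2_SRC/build" --target install
fi
```

Building with hipcc behind the wrappers avoids the dependency on `__cray_sset_detect` seen in the error output above, which is presumably a Cray compiler runtime symbol pulled in when the C sources are compiled with the Cray compiler.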