E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

Frontier: Configure issue with ROCM 5.4.3 #512

Closed sarats closed 1 year ago

sarats commented 1 year ago

With the new ROCm 5.4.3 module, the link step pulls in libhsa-runtime64.so.1, which has an undefined reference to `std::condition_variable::wait(std::unique_lock<std::mutex>&)@GLIBCXX_3.4.30'.

What's the best way to handle this? Add -lstdc++ to FFLAGS, or something else? And what's a good way to pass such a flag only to the Scorpio build in CIME?

gmake: *** [/lustre/orion/cli115/scratch/sarat/repos/e3sm-frontier/cime/scripts/rocm543/Tools/Makefile:764: /lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/pio/pio2/Makefile] Error 1
ERROR: CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/cmake-3.23.2-4r4mpiba7cwdw2hlakh5i7tchi64s3qd/share/cmake-3.23/Modules/CMakeTestFortranCompiler.cmake:61 (message):
  The Fortran compiler

    "/opt/cray/pe/craype/2.7.19/bin/ftn"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/pio/pio2/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_36618/fast && gmake[1]: Entering directory '/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/pio/pio2/CMakeFiles/CMakeTmp'
    /usr/bin/gmake  -f CMakeFiles/cmTC_36618.dir/build.make CMakeFiles/cmTC_36618.dir/build
    gmake[2]: Entering directory '/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/pio/pio2/CMakeFiles/CMakeTmp'
    Building Fortran object CMakeFiles/cmTC_36618.dir/testFortranCompiler.f.o
    /opt/cray/pe/craype/2.7.19/bin/ftn   -f free -N 255 -h byteswapio -em -M1077 -O3 -hipa0 -hzero -em -ef -hnoacc -DTIMING -DCNL  -DLINUX -DFORTRANUNDERSCORE -DNO_R16 -DCPRCRAY -DNDEBUG -DHAVE_MPI -DMCT_INTERFACE -DPIO2 -DHAVE_SLASHPROC -D_PNETCDF -DATM_PRESENT -DICE_PRESENT -DLND_PRESENT -DOCN_PRESENT -DROF_PRESENT -DGLC_PRESENT -DWAV_PRESENT -DESP_PRESENT -DMED_PRESENT -DPIO2  -I. -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/include -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/finclude -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.1/crayclang/14.0/include -I/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/include -I/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0/include -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/include  -em -J. -c /lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/pio/pio2/CMakeFiles/CMakeTmp/testFortranCompiler.f -o CMakeFiles/cmTC_36618.dir/testFortranCompiler.f.o
    Linking Fortran executable cmTC_36618
    /autofs/nccs-svm1_sw/frontier/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/cmake-3.23.2-4r4mpiba7cwdw2hlakh5i7tchi64s3qd/bin/cmake -E cmake_link_script CMakeFiles/cmTC_36618.dir/link.txt --verbose=1
    /opt/cray/pe/craype/2.7.19/bin/ftn -Wl,--allow-multiple-definition  -f free -N 255 -h byteswapio -em -M1077 -O3 -hipa0 -hzero -em -ef -hnoacc -DTIMING -DCNL  -DLINUX -DFORTRANUNDERSCORE -DNO_R16 -DCPRCRAY -DNDEBUG -DHAVE_MPI -DMCT_INTERFACE -DPIO2 -DHAVE_SLASHPROC -D_PNETCDF -DATM_PRESENT -DICE_PRESENT -DLND_PRESENT -DOCN_PRESENT -DROF_PRESENT -DGLC_PRESENT -DWAV_PRESENT -DESP_PRESENT -DMED_PRESENT -DPIO2  -I. -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/include -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/finclude -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.1/crayclang/14.0/include -I/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/include -I/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0/include -I/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/include  CMakeFiles/cmTC_36618.dir/testFortranCompiler.f.o -o cmTC_36618
    /opt/cray/pe/cce/15.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: /opt/rocm-5.4.3/lib/libhsa-runtime64.so.1: undefined reference to `std::condition_variable::wait(std::unique_lock<std::mutex>&)@GLIBCXX_3.4.30'
    gmake[2]: *** [CMakeFiles/cmTC_36618.dir/build.make:99: cmTC_36618] Error 1
    gmake[2]: Leaving directory '/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/pio/pio2/CMakeFiles/CMakeTmp'
    gmake[1]: *** [Makefile:127: cmTC_36618/fast] Error 2
    gmake[1]: Leaving directory '/lustre/orion/cli115/proj-shared/sarat/e3sm_scratch/rocm543/bld/crayclanggpu/mpich/nodebug/nothreads/mct/pio/pio2/CMakeFiles/CMakeTmp'
sarats commented 1 year ago

FWIW, this issue wasn't present with rocm 5.4.0.

jayeshkrishna commented 1 year ago

Can you add a reference to the version of the code that you are trying to compile (branch, master, ...)? Is ftn (with these modules) able to compile a simple hello-world program?

sarats commented 1 year ago

This branch: https://github.com/E3SM-Project/E3SM/tree/sarats/machines/frontier

E3SM tests build and run with just one module change (using rocm 5.4.0 instead).

sarats commented 1 year ago

The linking issue is present with a simple standalone program as well. Will report to OLCF/HPE.

PaulMullowney commented 1 year ago

Hi, did you get a resolution to this issue from OLCF/HPE?

sarats commented 1 year ago

@PaulMullowney FYI, two workarounds from Cray/HPE. I used the first method.

AMD started building with GCC 12.2.0, which brings in a GLIBCXX symbol that isn't in CCE's default GCC toolchain.
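For anyone who wants to confirm the mismatch on their own system, here is a minimal sketch that scans a libstdc++ shared object for `GLIBCXX_*` version tags and checks whether the one named in the linker error (`GLIBCXX_3.4.30`, which as far as I know first appeared with GCC 12.1's libstdc++) is present. The function names are hypothetical, and this is a plain byte scan rather than a real ELF parse, so treat a hit as a strong hint and not as proof that the symbol version is actually defined.

```python
import re

# Version tags look like GLIBCXX_3.4.30 and are stored as plain strings
# in the library's string table, so a byte-level regex can find them.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob: bytes) -> set:
    """Return all GLIBCXX version tags found in the raw bytes, as int tuples."""
    return {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in _TAG.finditer(blob)
    }


def provides(blob: bytes, required: str) -> bool:
    """True if the bytes contain the exact tag GLIBCXX_<required>."""
    want = tuple(int(part) for part in required.split("."))
    return want in glibcxx_versions(blob)


def check_library(path: str, required: str = "3.4.30") -> bool:
    """Read a shared library from disk and test it for the required tag."""
    with open(path, "rb") as fh:
        return provides(fh.read(), required)
```

Calling `check_library()` on the `libstdc++.so.6` that the CCE link line resolves to, and again on one from a GCC 12 installation, should show which of the two carries the tag. The path to each library is site-specific and is not given in this thread.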

PaulMullowney commented 1 year ago

@sarats Thanks for the pointers! The second workaround was easy to test and worked for me. Implementing the first will require a little more work.