ndkeen closed 11 months ago
After enabling the writing of core files, I get this backtrace:
#0 0x000015223f894cdb in raise () from /lib64/libc.so.6
#1 0x000015223f896375 in abort () from /lib64/libc.so.6
#2 0x000015223f8dab07 in __libc_message () from /lib64/libc.so.6
#3 0x000015223f8e2b8a in malloc_printerr () from /lib64/libc.so.6
#4 0x000015223f8e2e5c in munmap_chunk () from /lib64/libc.so.6
#5 0x0000152241249633 in f90_dealloc03a_i8 () from /opt/AMD/aocc-compiler-3.2.0/bin/../lib/libflang.so
#6 0x0000000003797f65 in mpas_dmpar::mpas_dmpar_destroy_communication_list ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/framework/mpas_dmpar.f90:6013
#7 0x00000000037a8de8 in mpas_dmpar::mpas_dmpar_exch_group_destroy_buffers ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/framework/mpas_dmpar.f90:8198
#8 0x00000000037a1b05 in mpas_dmpar::mpas_dmpar_exch_group_full_halo_exch ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/framework/mpas_dmpar.f90:6961
#9 0x00000000037a1f13 in mpas_dmpar::mpas_dmpar_field_halo_exch ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/framework/mpas_dmpar.f90:7016
#10 0x000000000382aeb4 in mpas_stream_manager::exch_all_halos ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/framework/mpas_stream_manager.f90:4739
#11 0x0000000003827fbd in mpas_stream_manager::read_stream ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/framework/mpas_stream_manager.f90:4023
#12 0x0000000003824c74 in mpas_stream_manager::mpas_stream_mgr_read ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/framework/mpas_stream_manager.f90:3546
#13 0x000000000373ec42 in seaice_core::seaice_core_init ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/bld/cmake-bld/core_seaice/model_forward/mpas_seaice_core.f90:111
#14 0x0000000002fb764d in ice_comp_mct::ice_init_mct ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang.20230123_125148_wf4qjc/mpas-seaice/driver/ice_comp_mct.f90:621
#15 0x000000000063d79a in component_mod::component_init_cc () at /global/cfs/cdirs/e3sm/ndk/repos/me11-jan12/driver-mct/main/component_mod.F90:257
#16 0x000000000060cfeb in cime_comp_mod::cime_init () at /global/cfs/cdirs/e3sm/ndk/repos/me11-jan12/driver-mct/main/cime_comp_mod.F90:1464
#17 0x000000000063b271 in cime_driver () at /global/cfs/cdirs/e3sm/ndk/repos/me11-jan12/driver-mct/main/cime_driver.F90:122
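The trace above was read from a core file. How core files were enabled for this run is not stated; as a minimal sketch, assuming a Linux system where the core-file size limit is the only thing suppressing them, the soft limit can be raised to the hard limit before the executable is launched. `raise_core_limit` and `run_with_core_dumps` are hypothetical helper names, not part of E3SM or CIME.

```python
import resource
import subprocess
import sys


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    Returns the resulting (soft, hard) pair. An unprivileged process may
    raise its soft limit up to, but not beyond, the hard limit.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(command):
    """Run `command` (a list of arguments) with the raised limit.

    Child processes inherit the resource limits of the parent, so the
    limit set here applies to the launched program as well.
    """
    raise_core_limit()
    return subprocess.run(command).returncode


if __name__ == "__main__":
    # Demo only: launch a trivial child process with the raised limit.
    print(raise_core_limit())
    print(run_with_core_dumps([sys.executable, "-c", "pass"]))
```

A core file written this way can then be opened with `gdb <executable> <core>` and inspected with `bt`. Whether the raised limit reaches every rank on the compute nodes depends on the batch system and launcher, so treat that part as an assumption to verify.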
A similar stack for SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang:
#0 0x0000151ad9aaacdb in raise () from /lib64/libc.so.6
#1 0x0000151ad9aac375 in abort () from /lib64/libc.so.6
#2 0x0000151ad9af0b07 in __libc_message () from /lib64/libc.so.6
#3 0x0000151ad9af8b8a in malloc_printerr () from /lib64/libc.so.6
#4 0x0000151ad9afa94c in _int_free () from /lib64/libc.so.6
#5 0x0000151adb45f633 in f90_dealloc03a_i8 () from /opt/AMD/aocc-compiler-3.2.0/bin/../lib/libflang.so
#6 0x0000000000d0e435 in mpas_dmpar::mpas_dmpar_destroy_communication_list ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/framework/mpas_dmpar.f90:6013
#7 0x0000000000d1f2a2 in mpas_dmpar::mpas_dmpar_exch_group_destroy_buffers ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/framework/mpas_dmpar.f90:8197
#8 0x0000000000d17fd5 in mpas_dmpar::mpas_dmpar_exch_group_full_halo_exch ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/framework/mpas_dmpar.f90:6961
#9 0x0000000000d183e3 in mpas_dmpar::mpas_dmpar_field_halo_exch ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/framework/mpas_dmpar.f90:7016
#10 0x0000000000da1384 in mpas_stream_manager::exch_all_halos ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/framework/mpas_stream_manager.f90:4739
#11 0x0000000000d9e48d in mpas_stream_manager::read_stream ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/framework/mpas_stream_manager.f90:4023
#12 0x0000000000d9b144 in mpas_stream_manager::mpas_stream_mgr_read ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/framework/mpas_stream_manager.f90:3546
#13 0x0000000000cb5112 in seaice_core::seaice_core_init ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/bld/cmake-bld/core_seaice/model_forward/mpas_seaice_core.f90:111
#14 0x000000000052db1d in ice_comp_mct::ice_init_mct ()
at /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/me11-jan12/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.r00/mpas-seaice/driver/ice_comp_mct.f90:621
#15 0x00000000003c822a in component_mod::component_init_cc () at /global/cfs/cdirs/e3sm/ndk/repos/me11-jan12/driver-mct/main/component_mod.F90:257
#16 0x0000000000397a7b in cime_comp_mod::cime_init () at /global/cfs/cdirs/e3sm/ndk/repos/me11-jan12/driver-mct/main/cime_comp_mod.F90:1464
#17 0x00000000003c5d01 in cime_driver () at /global/cfs/cdirs/e3sm/ndk/repos/me11-jan12/driver-mct/main/cime_driver.F90:122
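To check how similar the two stacks really are, the frame lists can be compared mechanically. The sketch below is a hypothetical helper: `frame_functions` and `differing_frames` are illustrative names, and the regular expression assumes gdb's `#N 0xADDR in function ()` frame format as printed above. It is fed frames #3 to #6 of each trace, copied from above.

```python
import re

# Matches gdb frame lines such as:
#   "#4 0x000015223f8e2e5c in munmap_chunk () from /lib64/libc.so.6"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(")


def frame_functions(backtrace):
    """Map frame number -> function name for every frame line found."""
    frames = {}
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames[int(match.group(1))] = match.group(2)
    return frames


def differing_frames(trace_a, trace_b):
    """List (frame number, name in a, name in b) where the names differ."""
    a, b = frame_functions(trace_a), frame_functions(trace_b)
    return [
        (n, a.get(n), b.get(n))
        for n in sorted(set(a) | set(b))
        if a.get(n) != b.get(n)
    ]


# Frames #3-#6 of the F2010 trace (first backtrace above).
TRACE_F2010 = """\
#3 0x000015223f8e2b8a in malloc_printerr () from /lib64/libc.so.6
#4 0x000015223f8e2e5c in munmap_chunk () from /lib64/libc.so.6
#5 0x0000152241249633 in f90_dealloc03a_i8 () from /opt/AMD/aocc-compiler-3.2.0/bin/../lib/libflang.so
#6 0x0000000003797f65 in mpas_dmpar::mpas_dmpar_destroy_communication_list ()
"""

# Frames #3-#6 of the DTESTM trace (second backtrace above).
TRACE_DTESTM = """\
#3 0x0000151ad9af8b8a in malloc_printerr () from /lib64/libc.so.6
#4 0x0000151ad9afa94c in _int_free () from /lib64/libc.so.6
#5 0x0000151adb45f633 in f90_dealloc03a_i8 () from /opt/AMD/aocc-compiler-3.2.0/bin/../lib/libflang.so
#6 0x0000000000d0e435 in mpas_dmpar::mpas_dmpar_destroy_communication_list ()
"""

print(differing_frames(TRACE_F2010, TRACE_DTESTM))
# prints [(4, 'munmap_chunk', '_int_free')]
```

On these excerpts only frame #4 differs (`munmap_chunk` versus `_int_free`), so both tests abort inside glibc while freeing memory for the same deallocation in mpas_dmpar_destroy_communication_list. The sketch compares function names only, so it does not report that frame #7 is at mpas_dmpar.f90:8198 in the first trace and mpas_dmpar.f90:8197 in the second.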
Using master from July, I get a build error that looks like a problem with the compiler itself (the Fortran frontend segfaults). Adding it here for now; I will come back to it later.
cd /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/cpl && python3 /pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/Tools/e3sm_compile_wrap.py /opt/cray/pe/craype/2.7.19/bin/ftn -DCPRAMD -DFORTRANUNDERSCORE -DHAVE_MPI -DLinux -DMCT_INTERFACE -DNO_R16 -DYAKL_DEBUG -D_PNETCDF -I/global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/components/cmake/cpl/. -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/amdclang/mpich/debug/nothreads/mct/include -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/amdclang/mpich/debug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.3/aocc/3.0/include -I/opt/cray/pe/parallel-netcdf/1.12.3.3/aocc/3.0/include -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/mpas-framework/src -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/cpl -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/atm -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/lnd -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/ice -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/ocn -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/rof -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/glc -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/wav -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/iac -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/cmake-bld/cmake/esp -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/SourceMods/src.drv -I/global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/driver-mct/main -I/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-jul18/SMS_D_Ld1.T62_oEC60to30v3.DTESTM.pm-cpu_amdclang.gh4963/bld/lnd/obj -O0 -g -Mflushz -Mfreeform -DUSE_CONTIGUOUS= -c /global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/driver-mct/main/prep_glc_mod.F90 -o CMakeFiles/e3sm.exe.dir/global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/driver-mct/main/prep_glc_mod.F90.o
/global/common/software/nersc/pm-2022q4/spack/linux-sles15-zen/cmake-3.24.3-k5msymx/bin/cmake -E touch cmake/cpl/CMakeFiles/e3sm.exe.dir/global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/driver-mct/main/cplcomp_exchange_mod.F90.o.provides.build
/global/common/software/nersc/pm-2022q4/spack/linux-sles15-zen/cmake-3.24.3-k5msymx/bin/cmake -E touch cmake/cpl/CMakeFiles/e3sm.exe.dir/global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/driver-mct/main/prep_iac_mod.F90.o.provides.build
/global/common/software/nersc/pm-2022q4/spack/linux-sles15-zen/cmake-3.24.3-k5msymx/bin/cmake -E touch cmake/cpl/CMakeFiles/e3sm.exe.dir/global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/driver-mct/main/prep_rof_mod.F90.o.provides.build
clang-13: error: unable to execute command: Segmentation fault
clang-13: error: Fortran frontend to LLVM command failed due to signal (use -v to see invocation)
AMD clang version 13.0.0 (CLANG: AOCC_3.2.0-Build#128 2021_11_12) (based on LLVM Mirror.Version.13.0.0)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/AMD/aocc-compiler-3.2.0/include/../bin
clang-13: note: diagnostic msg: Error generating preprocessed source(s).
Target CMakeFiles/e3sm.exe.dir/global/cfs/cdirs/e3sm/ndk/repos/nexty-jul18/driver-mct/main/prep_glc_mod.F90.o built in 1.925062 seconds
With master as of Nov 29th, this no longer fails. It may have been another issue fixed by the newer AMD compiler version introduced in https://github.com/E3SM-Project/E3SM/pull/6003
Using the AMD compiler on pm-cpu with SMS_D.ne4pg2_oQU480.F2010.pm-cpu_amdclang, I see the following error with a DEBUG attempt:
Note: to compile with AMD, we still need this work-around: https://github.com/E3SM-Project/E3SM/issues/4949
When I tried this again with Jan 2023 master, I now see a file, run/log.seaice.0046.err, that I don't think was there before.