E3SM-Project / scream

Fork of E3SM used to develop exascale global atmosphere model written in C++
https://e3sm-project.github.io/scream/

Build error with new test `SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu` on gcp12 #3036

Open ndkeen opened 1 day ago

ndkeen commented 1 day ago

This may just require some machine configs, since this is the first time the test has been tried here.

SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu

In the e3sm build log, I do see:

No macro file found: /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/cmake_macros/gcp12.cmake
CMake Error at cmake/build_eamxx.cmake:34 (include):
  include could not find requested file:

    /home/ndk/E3SM/components/eamxx/cmake/machine-files/gcp12.cmake
Call Stack (most recent call first):
  CMakeLists.txt:125 (build_eamxx)

but as there are so many errors/warnings, I'm not sure if it's the actual issue. For example, from:

SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeError.log

Determining if the Fortran sgemm exists failed with the following output:
Change Dir: /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_42ddf/fast && make  -f CMakeFiles/cmTC_42ddf.dir/build.make CMakeFiles/cmTC_42ddf.dir/build
make[1]: Entering directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2'
Building Fortran object CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/openmpi-4.1.4-lg57hjqli32cbgtyryq7cw6omdxfjtzy/bin/mpif90    -c /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2/testFortranCompiler.f -o CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o
Linking Fortran executable cmTC_42ddf
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/cmake-3.25.1-7z33y6jx4xrph64rva2louj3r3s6oaae/bin/cmake -E cmake_link_script CMakeFiles/cmTC_42ddf.dir/link.txt --verbose=1
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/openmpi-4.1.4-lg57hjqli32cbgtyryq7cw6omdxfjtzy/bin/mpif90 CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o -o cmTC_42ddf 
CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o: In function `MAIN__':
testFortranCompiler.f:(.text+0xa): undefined reference to `sgemm_'
collect2: error: ld returned 1 exit status
make[1]: *** [cmTC_42ddf] Error 1
make[1]: Leaving directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2'
gmake: *** [cmTC_42ddf/fast] Error 2

Determining if the MPICH_VERSION exist failed with the following output:
Change Dir: /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_33807/fast && make  -f CMakeFiles/cmTC_33807.dir/build.make CMakeFiles/cmTC_33807.dir/build
make[1]: Entering directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj'
Building C object CMakeFiles/cmTC_33807.dir/CheckSymbolExists.c.o
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/openmpi-4.1.4-lg57hjqli32cbgtyryq7cw6omdxfjtzy/bin/mpicc   -mcmodel=medium  -o CMakeFiles/cmTC_33807.dir/CheckSymbolExists.c.o -c /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c
/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c: In function 'main':
/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c:8:19: error: 'MPICH_VERSION' undeclared (first use in this function); did you mean 'MPI_VERSION'?
    8 |   return ((int*)(&MPICH_VERSION))[argc];
      |                   ^~~~~~~~~~~~~
      |                   MPI_VERSION
/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c:8:19: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [CMakeFiles/cmTC_33807.dir/CheckSymbolExists.c.o] Error 1
make[1]: Leaving directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj'
ndkeen commented 1 day ago

I think this is simply a case of needing a gcp12.cmake, which can be created by renaming gcp.cmake. When I try that, it builds, but I get a runtime error:

42: e3sm.exe: /home/ndk/E3SM/components/homme/src/share/cxx/GllFvRemapImpl.cpp:832: void Homme::GllFvRemapImpl::remap_tracer_dyn_to_fv_phys(int, int, const CPhys3T&, const Phys3T&): Assertion `qs_fv.extent_int(0) >= nelemd && qs_fv.extent_int(1) >= nf2 && qs_fv.extent_int(2) >= nq && qs_fv.extent_int(3) % packn == 0' failed.
42:
42: Program received signal SIGABRT: Process abort signal.

i.e., mv components/eamxx/cmake/machine-files/gcp.cmake components/eamxx/cmake/machine-files/gcp12.cmake

ambrad commented 22 hours ago

This error suggests that the test somehow has inconsistent compile-time and run-time sizes, either in the number of tracers or the number of levels. You could put a printf right above that assert that prints out all the numbers being used in it.
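
A minimal sketch of that kind of debug print, in case it helps. This is not the HOMME code: `check_qs_fv_extents` is a hypothetical helper, and the `extents` array stands in for the four `qs_fv.extent_int(i)` calls so the snippet compiles without Kokkos. The condition mirrors the one in the failing assertion.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the check in remap_tracer_dyn_to_fv_phys.
// extents[i] plays the role of qs_fv.extent_int(i) in the real code.
bool check_qs_fv_extents(const std::array<int, 4>& extents,
                         int nelemd, int nf2, int nq, int packn) {
  // Print every value the assertion depends on, so a size mismatch is
  // visible in the log before the process aborts.
  std::fprintf(stderr,
               "qs_fv extents: %d %d %d %d | nelemd=%d nf2=%d nq=%d packn=%d\n",
               extents[0], extents[1], extents[2], extents[3],
               nelemd, nf2, nq, packn);
  std::fflush(stderr);  // flush so the line is not lost on SIGABRT

  // Same condition as the assertion reported in the runtime error.
  return extents[0] >= nelemd && extents[1] >= nf2 &&
         extents[2] >= nq && extents[3] % packn == 0;
}
```

In the actual source, the equivalent would be a single `fprintf` placed directly above the existing assert, passing `qs_fv.extent_int(0)` through `qs_fv.extent_int(3)` along with `nelemd`, `nf2`, `nq`, and `packn`. Whichever term is violated should then point to whether the tracer count or the level count is inconsistent.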

ndkeen commented 20 minutes ago

Thanks. I tacked this onto an unrelated PR, but it only addresses the name of that cmake file. I don't see anything obvious in that file that's different from the others. I was just thinking of making progress here to avoid the build error; I can then open another issue for the runtime error.