E3SM-Project / Omega

Next generation ocean model within E3SM
https://docs.e3sm.org/Omega/omega

CTests are *very* slow on Perlmutter-CPU with Intel #119

Open xylar opened 1 month ago

xylar commented 1 month ago

I'm running the CTests on Perlmutter-CPU with the Intel compiler, using build and job scripts I'll provide below. The run times are much longer than I'd expect:

Test project /global/u2/x/xylar/e3sm_work/polaris/main/build_omega/build_pm-cpu_intel
      Start  1: DATA_TYPES_TEST
 1/22 Test  #1: DATA_TYPES_TEST ..................   Passed   22.45 sec
      Start  2: MACHINE_ENV_TEST
 2/22 Test  #2: MACHINE_ENV_TEST .................   Passed   29.32 sec
      Start  3: BROADCAST_TEST
 3/22 Test  #3: BROADCAST_TEST ...................   Passed   17.37 sec
      Start  4: LOGGING_TEST
 4/22 Test  #4: LOGGING_TEST .....................   Passed   64.15 sec
      Start  5: DECOMP_TEST
 5/22 Test  #5: DECOMP_TEST ......................   Passed    4.63 sec
      Start  6: HALO_TEST
 6/22 Test  #6: HALO_TEST ........................   Passed   44.09 sec
      Start  7: HORZMESH_TEST
 7/22 Test  #7: HORZMESH_TEST ....................   Passed    1.31 sec
      Start  8: HORZOPERATORS_PLANE_TEST
 8/22 Test  #8: HORZOPERATORS_PLANE_TEST .........   Passed    1.16 sec
      Start  9: HORZOPERATORS_SPHERE_TEST
 9/22 Test  #9: HORZOPERATORS_SPHERE_TEST ........   Passed    2.25 sec
      Start 10: AUXVARS_PLANE_TEST
10/22 Test #10: AUXVARS_PLANE_TEST ...............   Passed    9.78 sec
      Start 11: AUXVARS_SPHERE_TEST
11/22 Test #11: AUXVARS_SPHERE_TEST ..............   Passed   14.10 sec
      Start 12: AUXSTATE_TEST
12/22 Test #12: AUXSTATE_TEST ....................   Passed   15.72 sec
      Start 13: IO_TEST
13/22 Test #13: IO_TEST ..........................   Passed    9.47 sec
      Start 14: CONFIG_TEST
14/22 Test #14: CONFIG_TEST ......................   Passed   83.94 sec
      Start 15: METADATA_TEST
15/22 Test #15: METADATA_TEST ....................   Passed   54.79 sec
      Start 16: IOFIELD_TEST
16/22 Test #16: IOFIELD_TEST .....................   Passed    4.74 sec
      Start 17: TEND_PLANE_TEST
17/22 Test #17: TEND_PLANE_TEST ..................   Passed   53.28 sec
      Start 18: TEND_SPHERE_TEST
18/22 Test #18: TEND_SPHERE_TEST .................   Passed   64.03 sec
      Start 19: STATE_TEST
19/22 Test #19: STATE_TEST .......................   Passed   26.14 sec
      Start 20: TIMEMGR_TEST
20/22 Test #20: TIMEMGR_TEST .....................   Passed   12.55 sec
      Start 21: REDUCTIONS_TEST
21/22 Test #21: REDUCTIONS_TEST ..................   Passed   28.09 sec
      Start 22: KOKKOS_TEST
22/22 Test #22: KOKKOS_TEST ......................   Passed   83.84 sec

100% tests passed, 0 tests failed out of 22

Total Test time (real) = 648.60 sec

On other machines and with other compilers, I think these tests run much faster.
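To see where the time goes, it may help to rank the tests by wall-clock time. The following is a minimal sketch, not part of Omega or CTest: `RESULT_LINE` and `slowest_tests` are hypothetical names, and the regular expression assumes result lines shaped like the ones in the log above.

```python
import re

# Matches ctest result lines such as:
#   " 4/22 Test  #4: LOGGING_TEST .....................   Passed   64.15 sec"
# Assumption: the line layout follows the log pasted in this issue.
RESULT_LINE = re.compile(
    r"Test\s+#\d+:\s+(?P<name>\S+)\s+\.+\s*(?P<status>.+?)\s+(?P<secs>[\d.]+)\s+sec"
)


def slowest_tests(ctest_output, top=5):
    """Return the `top` slowest (name, seconds) pairs found in ctest output."""
    timings = []
    for line in ctest_output.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            timings.append((match.group("name"), float(match.group("secs"))))
    # Sort by elapsed seconds, slowest first.
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top]


# A few lines copied from the log above, used as sample input.
sample = """\
 4/22 Test  #4: LOGGING_TEST .....................   Passed   64.15 sec
 7/22 Test  #7: HORZMESH_TEST ....................   Passed    1.31 sec
14/22 Test #14: CONFIG_TEST ......................   Passed   83.94 sec
22/22 Test #22: KOKKOS_TEST ......................   Passed   83.84 sec
"""

for name, secs in slowest_tests(sample, top=3):
    print(f"{name:<20} {secs:8.2f} s")
```

Applied to the full log, this kind of ranking puts CONFIG_TEST (83.94 sec) and KOKKOS_TEST (83.84 sec) at the top, which could be a reasonable place to start comparing against timings from other machines.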

grnydawn commented 1 month ago

I also experienced this issue on Perlmutter-GPU. I plan to debug this performance issue after I finish working on the E3SM machine files for Frontier.