NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Update Orion modulefiles to Rocky 9 #1159

Closed: RussTreadon-NOAA closed this issue 3 months ago

RussTreadon-NOAA commented 3 months ago

Received the following from RDHPCS Management:

Orion’s Operating System (OS) and software stack is scheduled to be upgraded during a two day downtime, starting on Wednesday, June 12th and going through Thursday, June 13th. The OS on Orion will be upgraded from CentOS 7 to Rocky 9, another derivative of Red Hat Linux.

This issue documents updating modulefiles/EVA/orion.lua and modulefiles/GDAS/orion.intel.lua for Rocky 9.

RussTreadon-NOAA commented 3 months ago

@DavidHuber-NOAA notes that spack-stack #981 addresses the Orion Rocky 9 update from the module perspective.

RussTreadon-NOAA commented 3 months ago

As a test, clone g-w develop at 5af325a6 on Orion following the Rocky 9 upgrade. This snapshot of g-w develop uses GDASApp at 368c9c5. Copy GDASApp modulefiles/GDAS/hercules.intel.lua to orion.intel.lua, build GDASApp, and run test_gdasapp. 37 of 48 tests pass.

77% tests passed, 11 tests failed out of 48

Label Time Summary:
gdas-utils    =  11.54 sec*proc (11 tests)
script        =  11.54 sec*proc (11 tests)

Total Test time (real) = 1321.78 sec

The following tests FAILED:
        1843 - test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS (Failed)
        1844 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
        1845 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
        1846 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
        1847 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN (Failed)
        1848 - test_gdasapp_soca_copy_scratch (Failed)
        1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
        1850 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
        1851 - test_gdasapp_soca_socahybridweights (Failed)
        1852 - test_gdasapp_soca_incr_handler (Failed)
        1853 - test_gdasapp_soca_ens_handler (Failed)

All failures except test_gdasapp_soca_copy_scratch are due to the following sbatch error:

sbatch: error: invalid partition specified: hercules
sbatch: error: Batch job submission failed: Invalid partition name specified

test/soca/gw/CMakeLists.txt sets the variables MACHINE and PARTITION by checking for machine-specific directories:

# Identify machine
set(MACHINE "container")
IF (IS_DIRECTORY /work2)
  IF (IS_DIRECTORY /apps/other)
    set(MACHINE "hercules")
    set(PARTITION "hercules")
  ELSE()
    set(MACHINE "orion")
    set(PARTITION "orion")
  ENDIF()
ENDIF()
IF (IS_DIRECTORY /scratch2/NCEPDEV/)
  set(MACHINE "hera")
  set(PARTITION "hera")
ENDIF()

IF (IS_DIRECTORY /lfs/h2/)
   set(MACHINE "wcoss2")
ENDIF()

Following the Rocky 9 upgrade, directory /apps/other also exists on Orion, so both MACHINE and PARTITION are set to hercules. I do not know whether any directory unique to Orion or to Hercules remains after the upgrade that we could use to distinguish between the two machines.
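To make the misidentification concrete, here is a minimal bash sketch that mirrors the directory checks in the CMake snippet above. The function name detect_machine and its root-prefix argument are hypothetical (not part of GDASApp); the prefix only exists so the checks can run against a throwaway directory tree instead of the real filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the IS_DIRECTORY checks in test/soca/gw/CMakeLists.txt.
# $1 is a root prefix for testing; an empty prefix checks the real filesystem.
detect_machine() {
  local root="${1:-}"
  local machine="container" partition=""
  if [[ -d "${root}/work2" ]]; then
    if [[ -d "${root}/apps/other" ]]; then
      machine="hercules"; partition="hercules"
    else
      machine="orion"; partition="orion"
    fi
  fi
  if [[ -d "${root}/scratch2/NCEPDEV" ]]; then
    machine="hera"; partition="hera"
  fi
  if [[ -d "${root}/lfs/h2" ]]; then
    machine="wcoss2"   # PARTITION is not set for wcoss2
  fi
  printf 'machine=%s partition=%s\n' "$machine" "$partition"
}

# Simulate Orion before and after the upgrade in a temporary tree.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/work2"
detect_machine "$demo_root"   # prints: machine=orion partition=orion
mkdir -p "$demo_root/apps/other"
detect_machine "$demo_root"   # prints: machine=hercules partition=hercules
rm -rf "$demo_root"
```

Because the Hercules branch is chosen purely on the presence of /apps/other, any Orion node where that directory now exists is classified as Hercules, which produces the invalid hercules partition seen in the sbatch error.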

Test test_gdasapp_soca_copy_scratch failed because the expected directory

/work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd/build/gdas/test/soca/gw/testrun/testjjobs/RUNDIRS/gdas_test/gdasocnanal_12/

was not present. The directory is most likely missing because the soca tests that run before this one failed.

FYI @guillaumevernieres: we need to figure out how to distinguish between Orion and Hercules following the Rocky 9 upgrade.

RussTreadon-NOAA commented 3 months ago

build.sh already sets BUILD_TARGET. Pass it to CMake as MACHINE and set PARTITION from MACHINE, via the following changes:

build.sh

@@ -87,7 +87,7 @@ case ${BUILD_TARGET} in
     ;;
 esac

-CMAKE_OPTS+=" -DCLONE_JCSDADATA=$CLONE_JCSDADATA"
+CMAKE_OPTS+=" -DCLONE_JCSDADATA=$CLONE_JCSDADATA -DMACHINE=$BUILD_TARGET"

 BUILD_DIR=${BUILD_DIR:-$dir_root/build}
 if [[ $CLEAN_BUILD == 'YES' ]]; then

test/soca/gw/CMakeLists.txt

@@ -10,25 +10,14 @@ add_test(NAME test_gdasapp_soca_prep
       ENVIRONMENT "PYTHONPATH=${PROJECT_BINARY_DIR}/ush:${PROJECT_SOURCE_DIR}/../../ush/python/wxflow/src:$ENV{PYTHONPATH}")

 # Identify machine
-set(MACHINE "container")
-IF (IS_DIRECTORY /work2)
-  IF (IS_DIRECTORY /apps/other)
-    set(MACHINE "hercules")
-    set(PARTITION "hercules")
-  ELSE()
-    set(MACHINE "orion")
-    set(PARTITION "orion")
-  ENDIF()
-ENDIF()
-IF (IS_DIRECTORY /scratch2/NCEPDEV/)
-  set(MACHINE "hera")
+if (MACHINE STREQUAL "hercules")
+  set(PARTITION "hercules")
+ELSEIF (MACHINE STREQUAL "orion")
+  set(PARTITION "orion")
+ELSEIF (MACHINE STREQUAL "hera")
   set(PARTITION "hera")
 ENDIF()

-IF (IS_DIRECTORY /lfs/h2/)
-   set(MACHINE "wcoss2")
-ENDIF()
-
 # Clean-up
 add_test(NAME test_gdasapp_soca_run_clean
   COMMAND  ${CMAKE_COMMAND} -E remove_directory ${PROJECT_BINARY_DIR}/test/soca/gw/testrun/testjjobs)
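A minimal bash sketch of the resulting flow, assuming the changes above: build.sh forwards BUILD_TARGET as -DMACHINE, and PARTITION is looked up from the machine name alone. The function partition_for is hypothetical (the real lookup is the CMake if/ELSEIF chain), and the CLONE_JCSDADATA default used here is an assumption for the demo only.

```shell
#!/usr/bin/env bash
# Hypothetical mirror of the new CMake logic: PARTITION depends only on MACHINE.
partition_for() {
  case "$1" in
    hercules|orion|hera) echo "$1" ;;  # partition name equals machine name
    *) echo "" ;;                      # e.g. wcoss2: no PARTITION is set
  esac
}

BUILD_TARGET="${BUILD_TARGET:-orion}"     # normally set by build.sh
CLONE_JCSDADATA="${CLONE_JCSDADATA:-NO}"  # assumed default, demo only
CMAKE_OPTS+=" -DCLONE_JCSDADATA=$CLONE_JCSDADATA -DMACHINE=$BUILD_TARGET"

echo "CMAKE_OPTS:${CMAKE_OPTS}"
echo "PARTITION=$(partition_for "$BUILD_TARGET")"
```

The design point is that PARTITION no longer depends on filesystem layout, which is what the Rocky 9 upgrade changed. One side effect visible in the diff: the old `set(MACHINE "container")` default is removed, so configuring without `-DMACHINE` leaves PARTITION unset (and MACHINE too, unless it is set elsewhere).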

A hack to g-w workflow/hosts.py is also needed. g-w issue #2695 reports a bug in hosts.py following the Orion Rocky 9 upgrade. The hack forces machine=ORION whenever hosts.py runs; it is required for the GDASApp ctests that run g-w jobs.

With the hosts.py hack and the above local GDASApp changes in place, build GDASApp inside g-w on Orion and run the ctests. All 48 tests pass.

Test project /work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd/build
      Start 1489: test_gdasapp_util_coding_norms
 1/48 Test #1489: test_gdasapp_util_coding_norms ........................   Passed    4.56 sec
      Start 1490: test_gdasapp_util_ioda_example
 2/48 Test #1490: test_gdasapp_util_ioda_example ........................   Passed   10.26 sec

...

      Start 1869: test_gdasapp_atm_jjob_ens_final
47/48 Test #1869: test_gdasapp_atm_jjob_ens_final .......................   Passed   42.23 sec
      Start 1870: test_gdasapp_aero_gen_3dvar_yaml
48/48 Test #1870: test_gdasapp_aero_gen_3dvar_yaml ......................   Passed    0.51 sec

100% tests passed, 0 tests failed out of 48

Label Time Summary:
gdas-utils    =  23.59 sec*proc (11 tests)
script        =  23.59 sec*proc (11 tests)

Total Test time (real) = 1622.32 sec

The above changes are in /work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd:

        modified:   build.sh
        modified:   modulefiles/GDAS/orion.intel.lua
        modified:   test/soca/gw/CMakeLists.txt

Note that orion.intel.lua has been updated to prepend the spack-stack 1.6.0 Rocky 9 module tree:

prepend_path("MODULEPATH", '/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/modulefiles/Core')
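For readers less familiar with Lmod, the line above puts the Rocky 9 spack-stack Core directory at the front of MODULEPATH, so it is searched before other module trees; interactively, `module use <dir>` does the same. The bash sketch below only approximates that effect on the environment variable. The function prepend_modulepath is hypothetical, and real Lmod also does its own bookkeeping (e.g., reference counting), which this sketch ignores.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of prepend_path("MODULEPATH", dir):
# place dir at the front of the colon-separated search list.
prepend_modulepath() {
  local dir="$1"
  export MODULEPATH="${dir}${MODULEPATH:+:${MODULEPATH}}"
}

ROCKY9_CORE=/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/modulefiles/Core
prepend_modulepath "$ROCKY9_CORE"
echo "$MODULEPATH" | tr ':' '\n' | head -n 1   # prints the Rocky 9 Core path
```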
RussTreadon-NOAA commented 3 months ago

Changes to enable GDASApp to build and run on Orion following the Rocky 9 upgrade will be committed to RussTreadon-NOAA:feature/orion_rocky9.