Closed RussTreadon-NOAA closed 3 months ago
@DavidHuber-NOAA notes that spack-stack #981 addresses the Orion Rocky 9 update from the module perspective.
As a test, clone g-w develop at 5af325a6 on Orion following the Rocky 9 upgrade. This snapshot of g-w develop uses GDASApp at 368c9c5. Copy GDASApp modulefiles/GDAS/hercules.intel.lua to orion.intel.lua. Build GDASApp. Run test_gdasapp. 37 out of 48 tests pass.
77% tests passed, 11 tests failed out of 48
Label Time Summary:
gdas-utils = 11.54 sec*proc (11 tests)
script = 11.54 sec*proc (11 tests)
Total Test time (real) = 1321.78 sec
The following tests FAILED:
1843 - test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS (Failed)
1844 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1845 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1846 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1847 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN (Failed)
1848 - test_gdasapp_soca_copy_scratch (Failed)
1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1850 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1851 - test_gdasapp_soca_socahybridweights (Failed)
1852 - test_gdasapp_soca_incr_handler (Failed)
1853 - test_gdasapp_soca_ens_handler (Failed)
All failures except test_gdasapp_soca_copy_scratch are due to
sbatch: error: invalid partition specified: hercules
sbatch: error: Batch job submission failed: Invalid partition name specified
test/soca/gw/CMakeLists.txt sets variable MACHINE via
# Identify machine
set(MACHINE "container")
IF (IS_DIRECTORY /work2)
  IF (IS_DIRECTORY /apps/other)
    set(MACHINE "hercules")
    set(PARTITION "hercules")
  ELSE()
    set(MACHINE "orion")
    set(PARTITION "orion")
  ENDIF()
ENDIF()
IF (IS_DIRECTORY /scratch2/NCEPDEV/)
  set(MACHINE "hera")
  set(PARTITION "hera")
ENDIF()
IF (IS_DIRECTORY /lfs/h2/)
  set(MACHINE "wcoss2")
ENDIF()
Directory /apps/other exists on Orion following the Rocky 9 upgrade. Thus, we wind up with MACHINE and PARTITION set to hercules. I do not know if any directories unique to Orion or to Hercules remain after the Rocky 9 upgrade which we can use to distinguish between the machines.
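The misclassification can be reproduced in a small sketch. The Python below is a hypothetical transliteration of the CMake checks above, not code from GDASApp. The names `detect_machine` and `existing_dirs` are illustrative, and the two directory sets are assumptions based only on what this comment describes (that /apps/other now exists on Orion).

```python
def detect_machine(existing_dirs):
    """Mimic the directory checks in test/soca/gw/CMakeLists.txt.

    existing_dirs: set of directory paths assumed to exist on the host.
    Returns (machine, partition); partition is None where the CMake
    code leaves PARTITION unset.
    """
    machine, partition = "container", None
    if "/work2" in existing_dirs:
        if "/apps/other" in existing_dirs:
            machine, partition = "hercules", "hercules"
        else:
            machine, partition = "orion", "orion"
    if "/scratch2/NCEPDEV/" in existing_dirs:
        machine, partition = "hera", "hera"
    if "/lfs/h2/" in existing_dirs:
        machine = "wcoss2"
    return machine, partition


# Assumed layouts: only the presence of /apps/other differs.
orion_before_upgrade = {"/work2"}
orion_after_upgrade = {"/work2", "/apps/other"}

print(detect_machine(orion_before_upgrade))  # ('orion', 'orion')
print(detect_machine(orion_after_upgrade))   # ('hercules', 'hercules')
```

Once both directories exist on both machines, no ordering of these two checks can tell Orion from Hercules, which is why the jobs are submitted to the hercules partition.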
Test test_gdasapp_soca_copy_scratch failed because an expected directory
/work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd/build/gdas/test/soca/gw/testrun/testjjobs/RUNDIRS/gdas_test/gdasocnanal_12/
is not present. The absence of this directory is likely due to the soca tests that failed before this test ran.
FYI @guillaumevernieres - we need to figure out how to distinguish between Orion and Hercules following the Rocky 9 upgrade.
build.sh sets BUILD_TARGET. Use this to set MACHINE and PARTITION via the following changes
build.sh
@@ -87,7 +87,7 @@ case ${BUILD_TARGET} in
;;
esac
-CMAKE_OPTS+=" -DCLONE_JCSDADATA=$CLONE_JCSDADATA"
+CMAKE_OPTS+=" -DCLONE_JCSDADATA=$CLONE_JCSDADATA -DMACHINE=$BUILD_TARGET"
BUILD_DIR=${BUILD_DIR:-$dir_root/build}
if [[ $CLEAN_BUILD == 'YES' ]]; then
test/soca/gw/CMakeLists.txt
@@ -10,25 +10,14 @@ add_test(NAME test_gdasapp_soca_prep
ENVIRONMENT "PYTHONPATH=${PROJECT_BINARY_DIR}/ush:${PROJECT_SOURCE_DIR}/../../ush/python/wxflow/src:$ENV{PYTHONPATH}")
# Identify machine
-set(MACHINE "container")
-IF (IS_DIRECTORY /work2)
- IF (IS_DIRECTORY /apps/other)
- set(MACHINE "hercules")
- set(PARTITION "hercules")
- ELSE()
- set(MACHINE "orion")
- set(PARTITION "orion")
- ENDIF()
-ENDIF()
-IF (IS_DIRECTORY /scratch2/NCEPDEV/)
- set(MACHINE "hera")
+if (MACHINE STREQUAL "hercules")
+ set(PARTITION "hercules")
+ELSEIF (MACHINE STREQUAL "orion")
+ set(PARTITION "orion")
+ELSEIF (MACHINE STREQUAL "hera")
set(PARTITION "hera")
ENDIF()
-IF (IS_DIRECTORY /lfs/h2/)
- set(MACHINE "wcoss2")
-ENDIF()
-
# Clean-up
add_test(NAME test_gdasapp_soca_run_clean
COMMAND ${CMAKE_COMMAND} -E remove_directory ${PROJECT_BINARY_DIR}/test/soca/gw/testrun/testjjobs)
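The effect of the two diffs above can be summarized in a short sketch: the machine name is passed in explicitly rather than inferred from the filesystem, and the partition becomes a plain lookup. This is an illustrative Python rendering, not project code. `partition_for`, `cmake_opts`, and the default value of `clone_jcsdadata` are hypothetical.

```python
# Hypothetical mirror of the if/ELSEIF chain in the proposed CMakeLists.txt.
PARTITION_BY_MACHINE = {
    "hercules": "hercules",
    "orion": "orion",
    "hera": "hera",
}


def partition_for(machine):
    """Return the partition for an explicitly supplied machine name.

    Returns None for machines the diff does not map (for example wcoss2),
    which corresponds to PARTITION being left unset.
    """
    return PARTITION_BY_MACHINE.get(machine)


def cmake_opts(build_target, clone_jcsdadata="NO"):
    """Build the option string the way the patched build.sh line does.

    The default for clone_jcsdadata is an assumption for illustration.
    """
    return f" -DCLONE_JCSDADATA={clone_jcsdadata} -DMACHINE={build_target}"


print(cmake_opts("orion"))
print(partition_for("orion"))
```

One consequence visible in the diff: the earlier default of "container" is no longer set in this file, so MACHINE is defined only when it is passed on the cmake command line, as build.sh now does.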
We also need to add a hack to g-w workflow/hosts.py. g-w issue #2695 reports a bug in hosts.py following the Orion Rocky 9 upgrade. The hack forces machine=ORION when hosts.py is executed. This hack is required for GDASApp ctests which run g-w jobs.
Build GDASApp inside g-w on Orion with the hosts.py hack and the above GDASApp local changes in place. Run ctests. 48 out of 48 tests pass.
Test project /work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd/build
Start 1489: test_gdasapp_util_coding_norms
1/48 Test #1489: test_gdasapp_util_coding_norms ........................ Passed 4.56 sec
Start 1490: test_gdasapp_util_ioda_example
2/48 Test #1490: test_gdasapp_util_ioda_example ........................ Passed 10.26 sec
...
Start 1869: test_gdasapp_atm_jjob_ens_final
47/48 Test #1869: test_gdasapp_atm_jjob_ens_final ....................... Passed 42.23 sec
Start 1870: test_gdasapp_aero_gen_3dvar_yaml
48/48 Test #1870: test_gdasapp_aero_gen_3dvar_yaml ...................... Passed 0.51 sec
100% tests passed, 0 tests failed out of 48
Label Time Summary:
gdas-utils = 23.59 sec*proc (11 tests)
script = 23.59 sec*proc (11 tests)
Total Test time (real) = 1622.32 sec
The above changes are in /work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd
modified: build.sh
modified: modulefiles/GDAS/orion.intel.lua
modified: test/soca/gw/CMakeLists.txt
Note that orion.intel.lua has been updated to specify
prepend_path("MODULEPATH", '/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/modulefiles/Core')
Changes to enable GDASApp to build and run on Orion following the Rocky 9 upgrade will be committed to RussTreadon-NOAA:feature/orion_rocky9.
Received the following from RDHPCS Management
This issue is opened to document the updating of modulefiles/EVA/orion.lua and modulefiles/GDAS/orion.intel.lua to Rocky 9.