NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Marine DA tests are failing after the Rocky 9 upgrade #1178

Closed: guillaumevernieres closed this issue 1 week ago

guillaumevernieres commented 2 weeks ago
As a test, clone g-w `develop` at [5af325a6](https://github.com/NOAA-EMC/global-workflow/commit/5af325a6a4e0a14d180514a418603ca79fada487) on Orion following the Rocky 9 upgrade. This snapshot of g-w `develop` uses GDASApp at 368c9c5. Copy GDASApp `modulefiles/GDAS/hercules.intel.lua` to `orion.intel.lua`. Build GDASApp. Run `test_gdasapp`. 37 out of 48 tests pass.
77% tests passed, 11 tests failed out of 48

Label Time Summary:
gdas-utils    =  11.54 sec*proc (11 tests)
script        =  11.54 sec*proc (11 tests)

Total Test time (real) = 1321.78 sec

The following tests FAILED:
        1843 - test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS (Failed)
        1844 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
        1845 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
        1846 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
        1847 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN (Failed)
        1848 - test_gdasapp_soca_copy_scratch (Failed)
        1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
        1850 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
        1851 - test_gdasapp_soca_socahybridweights (Failed)
        1852 - test_gdasapp_soca_incr_handler (Failed)
        1853 - test_gdasapp_soca_ens_handler (Failed)

All failures except test_gdasapp_soca_copy_scratch are due to

sbatch: error: invalid partition specified: hercules
sbatch: error: Batch job submission failed: Invalid partition name specified

test/soca/gw/CMakeLists.txt sets variable MACHINE via

# Identify machine
set(MACHINE "container")
IF (IS_DIRECTORY /work2)
  IF (IS_DIRECTORY /apps/other)
    set(MACHINE "hercules")
    set(PARTITION "hercules")
  ELSE()
    set(MACHINE "orion")
    set(PARTITION "orion")
  ENDIF()
ENDIF()
IF (IS_DIRECTORY /scratch2/NCEPDEV/)
  set(MACHINE "hera")
  set(PARTITION "hera")
ENDIF()

IF (IS_DIRECTORY /lfs/h2/)
  set(MACHINE "wcoss2")
ENDIF()

Directory `/apps/other` exists on Orion following the Rocky 9 upgrade, so the inner check now succeeds there and both `MACHINE` and `PARTITION` are set to `hercules`. I do not know whether any directory remains, after the Rocky 9 upgrade, that is unique to Orion or to Hercules and that we could use to distinguish between the machines.
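
To make the failure mode concrete, the directory checks in the CMake snippet can be mirrored in a few lines of Python and run against a simulated filesystem. This is only a sketch: `detect_machine`, `fake_fs`, and `detect_machine_by_hostname` are hypothetical helpers, not code from GDASApp or g-w. The "before upgrade" layout for Orion (has `/work2`, lacks `/apps/other`) is inferred from the ELSE branch of the snippet. The hostname-based alternative assumes that node hostnames begin with the machine name, which has not been verified on either system, and the example hostnames are made up.

```python
from typing import Callable, Optional, Tuple

Result = Tuple[str, Optional[str]]  # (MACHINE, PARTITION)


def detect_machine(is_dir: Callable[[str], bool]) -> Result:
    """Mirror the directory checks in test/soca/gw/CMakeLists.txt.

    PARTITION is None wherever the CMake snippet never sets it.
    """
    machine, partition = "container", None
    if is_dir("/work2"):
        if is_dir("/apps/other"):
            machine, partition = "hercules", "hercules"
        else:
            machine, partition = "orion", "orion"
    if is_dir("/scratch2/NCEPDEV/"):
        machine, partition = "hera", "hera"
    if is_dir("/lfs/h2/"):
        machine = "wcoss2"  # the snippet leaves PARTITION untouched here
    return machine, partition


def fake_fs(*existing: str) -> Callable[[str], bool]:
    """Return an is_dir predicate for a simulated set of directories."""
    present = set(existing)
    return lambda path: path in present


def detect_machine_by_hostname(hostname: str, fallback: Result) -> Result:
    """Hypothetical alternative: key on the hostname prefix.

    ASSUMPTION: hostnames start with 'orion' or 'hercules'.
    """
    name = hostname.lower()
    for machine in ("hercules", "orion"):
        if name.startswith(machine):
            return machine, machine
    return fallback


# Orion as implied by the snippet's ELSE branch (before the upgrade).
orion_before = detect_machine(fake_fs("/work2"))
# Orion after the upgrade: /apps/other now exists, as reported above.
orion_after = detect_machine(fake_fs("/work2", "/apps/other"))

print(orion_before)  # ('orion', 'orion')
print(orion_after)   # ('hercules', 'hercules')

# Made-up hostname, for illustration only.
print(detect_machine_by_hostname("orion-login-1", orion_after))  # ('orion', 'orion')
```

With the post-upgrade layout, the directory test alone cannot tell the two machines apart, which is consistent with the `invalid partition specified: hercules` errors from `sbatch`. In real use the hostname would come from something like `socket.gethostname()`; whether that value is reliable on compute nodes as well as login nodes would need to be checked.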

Test test_gdasapp_soca_copy_scratch failed due to an expected directory

/work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd/build/gdas/test/soca/gw/testrun/testjjobs/RUNDIRS/gdas_test/gdasocnanal_12/

not being present. The absence of this directory is likely due to the soca tests that failed before this test ran.

FYI @guillaumevernieres - we need to figure out how to distinguish between Orion and Hercules following the Rocky 9 upgrade.

Originally posted by @RussTreadon-NOAA in https://github.com/NOAA-EMC/GDASApp/issues/1159#issuecomment-2173861260

RussTreadon-NOAA commented 2 weeks ago

A similar path problem was found in g-w `workflow/hosts.py`. See g-w issue #2695 for details.