NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Update GDASapp hash to move JCB into GDASapp #2665

Status: Closed (danholdaway closed this 2 weeks ago)

danholdaway commented 3 weeks ago

Description

Keeping JCB in global-workflow leads to situations where the CI cannot adequately test the code. This PR moves JCB into GDASapp. It also bumps the GDASapp hash to the head of feature/move_jcb, which at the time of writing is GDASapp develop plus the absorption of JCB.

Once this is ready to merge, I suggest we first merge the upstream GDASapp changes into GDASapp develop so that the final GDASapp hash can point to develop.

Note that I also pulled in the changes from https://github.com/NOAA-EMC/global-workflow/pull/2641 so that this PR follows the testing @RussTreadon-NOAA has done.

How has this been tested?

Hera: GDASapp ctests

RussTreadon-NOAA commented 3 weeks ago

g-w develop currently points at gsi_utils.fd @ d940406. We should either stick with this GSI-utils hash or jump to the current head of GSI-utils, 4332814.

gsi_utils.fd @ bb03e17 is two commits behind gsi_utils.fd @ d940406.

RussTreadon-NOAA commented 3 weeks ago

Orion test

Installed danholdaway:feature/move_jcb at 4d3150ca on Orion, built the apps, and ran the GDASApp ctests. All tests pass except one:

98% tests passed, 1 tests failed out of 47

Label Time Summary:
gdas-utils    =  15.66 sec*proc (9 tests)
script        =  15.66 sec*proc (9 tests)

Total Test time (real) = 4102.87 sec

The following tests FAILED:
        1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)

This failure also occurs with GDASApp develop and has been reported in GDASApp issue #1153. GDASApp PR #1157 temporarily removed this test from the list of active GDASApp ctests.
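To pull the failing test names out of a ctest summary like the one above (for example, to check them against known failures such as the one tracked in GDASApp issue #1153), a minimal Python sketch might look like the following. It assumes the summary layout shown above, where each failure appears under `The following tests FAILED:` as `<index> - <name> (<reason>)`; the helper name `failed_ctests` is hypothetical and is not part of GDASApp or global-workflow.

```python
import re

# Assumed ctest failure-line layout, e.g.
# "        1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)"
_FAIL_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(([^)]*)\)\s*$")


def failed_ctests(summary: str) -> list[tuple[int, str, str]]:
    """Return (index, name, reason) for each test listed after
    'The following tests FAILED:' in a ctest summary (hypothetical helper)."""
    failures = []
    in_block = False
    for line in summary.splitlines():
        if line.strip() == "The following tests FAILED:":
            in_block = True
            continue
        if not in_block:
            continue
        match = _FAIL_LINE.match(line)
        if match:
            failures.append((int(match.group(1)), match.group(2), match.group(3)))
        elif line.strip():
            # The first non-empty line that is not a failure entry ends the block.
            break
    return failures


if __name__ == "__main__":
    sample = """98% tests passed, 1 tests failed out of 47

The following tests FAILED:
        1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
"""
    for index, name, reason in failed_ctests(sample):
        print(index, name, reason)
```

The parser stops at the first unrelated line so that trailing ctest messages are not mistaken for additional failures; an empty list means no failure block was found.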

Enabled C96C48_ufs_hybatmDA CI on Orion and ran CI for this setup. All DA jobs ran successfully to completion; only the gfsfcst and downstream jobs remain to be run.

RussTreadon-NOAA commented 3 weeks ago

Closing the loop on the Orion CI tests: all jobs for C96C48_ufs_hybatmDA CI are now complete and successful.

Orion-login-2:/work2/noaa/stmp/rtreadon/EXPDIR/prmove$ rocotostat -d prmove.db -w prmove.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Jun 07 2024 13:56:28    Jun 07 2024 14:20:39
202402240000        Done    Jun 07 2024 13:56:28    Jun 07 2024 19:24:25
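
A check like "all cycles are Done" can be scripted against `rocotostat -s` summary output. The sketch below is a minimal Python illustration, assuming the column layout shown above (a header row, then one row per cycle with a 12-digit YYYYMMDDHHMM cycle in the first column and its state in the second). The helpers `cycle_states` and `all_cycles_done` are hypothetical and are not part of Rocoto or global-workflow.

```python
def cycle_states(summary: str) -> dict[str, str]:
    """Map each cycle to its state, assuming the rocotostat -s layout
    shown above (hypothetical helper)."""
    states = {}
    for line in summary.splitlines():
        fields = line.split()
        # Cycle rows start with a 12-digit YYYYMMDDHHMM stamp;
        # this skips the CYCLE header row and any shell prompt lines.
        if len(fields) >= 2 and len(fields[0]) == 12 and fields[0].isdigit():
            states[fields[0]] = fields[1]
    return states


def all_cycles_done(summary: str) -> bool:
    """True only if at least one cycle is listed and every cycle is 'Done'."""
    states = cycle_states(summary)
    return bool(states) and all(state == "Done" for state in states.values())


if __name__ == "__main__":
    sample = """   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Jun 07 2024 13:56:28    Jun 07 2024 14:20:39
202402240000        Done    Jun 07 2024 13:56:28    Jun 07 2024 19:24:25
"""
    print(all_cycles_done(sample))  # prints True
```

Requiring at least one cycle row avoids reporting success when the output is empty or the layout differs from what is assumed here.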

WalterKolczynski-NOAA commented 2 weeks ago

@danholdaway The gdas conflict that arose after Russ's PR was merged needs to be resolved.

danholdaway commented 2 weeks ago

@WalterKolczynski-NOAA all up to date.

WalterKolczynski-NOAA commented 2 weeks ago

@danholdaway thanks

emcbot commented 2 weeks ago

Experiment C96_atm3DVar FAILED on Hera in /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/C96_atm3DVar_eebeb4b2

emcbot commented 2 weeks ago

Experiment C96C48_hybatmDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/C96C48_hybatmDA_eebeb4b2

emcbot commented 2 weeks ago

Experiment C96_atmaerosnowDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/C96_atmaerosnowDA_eebeb4b2

emcbot commented 2 weeks ago

Experiment C48_S2SWA_gefs FAILED on Hera in /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/C48_S2SWA_gefs_eebeb4b2

emcbot commented 2 weeks ago

Experiment C48_ATM FAILED on Hera in /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/C48_ATM_eebeb4b2

emcbot commented 2 weeks ago

Experiment C48_S2SW FAILED on Hera in /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/C48_S2SW_eebeb4b2

emcbot commented 2 weeks ago

Experiment C48mx500_3DVarAOWCDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/C48mx500_3DVarAOWCDA_eebeb4b2

DavidHuber-NOAA commented 2 weeks ago

On manual inspection, all tests passed on Hera. I think the Java controller crashed at some point.

TerrenceMcGuinness-NOAA commented 2 weeks ago

@DavidHuber-NOAA Yes, I noticed that one view showed all of the experiments running in the pipeline, but the controller had given up because it went down and came back up. I was just confirming that they all passed too.

Confirmation:

Terry.McGuinness (hfe09) utils (develop) $ ~/bin/check_expdir.sh /scratch1/NCEPDEV/global/CI/2665/RUNTESTS/EXPDIR
C48_ATM_4de31999
DONE
C48_S2SWA_gefs_4de31999
DONE
C48_S2SW_4de31999
DONE
C48mx500_3DVarAOWCDA_4de31999
DONE
C96C48_hybatmDA_4de31999
DONE
C96_atm3DVar_4de31999
DONE
C96_atmaerosnowDA_4de31999
DONE