Closed danholdaway closed 2 weeks ago
Orion test
Install feature/rename_atm at 2ec1767 inside g-w develop at c44d0ac8. Initial run of test_gdasapp yielded the following ctest failures
64% tests passed, 17 tests failed out of 47
Label Time Summary:
gdas-utils = 4.10 sec*proc (9 tests)
script = 4.10 sec*proc (9 tests)
Total Test time (real) = 1370.19 sec
The following tests FAILED:
1756 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1757 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1758 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1761 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1762 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1764 - test_gdasapp_soca_socahybridweights (Failed)
1765 - test_gdasapp_soca_incr_handler (Failed)
1766 - test_gdasapp_soca_ens_handler (Failed)
1775 - test_gdasapp_atm_jjob_var_init (Failed)
1776 - test_gdasapp_atm_jjob_var_run (Failed)
1777 - test_gdasapp_atm_jjob_var_inc (Failed)
1778 - test_gdasapp_atm_jjob_var_final (Failed)
1779 - test_gdasapp_atm_jjob_ens_init (Failed)
1780 - test_gdasapp_atm_jjob_ens_run (Failed)
1781 - test_gdasapp_atm_jjob_ens_inc (Failed)
1782 - test_gdasapp_atm_jjob_ens_final (Failed)
Need to update g-w sorc/jcb to be consistent with jcb-algorithms and jcb-gdas used in sorc/gdas.cd/parm. Update working copy of sorc/jcb to jcb branch feature/rename_atm at d167c6c. Rerun test_gdasapp. More ctests pass. Failures are limited to soca tests
81% tests passed, 9 tests failed out of 47
Label Time Summary:
gdas-utils = 4.66 sec*proc (9 tests)
script = 4.66 sec*proc (9 tests)
Total Test time (real) = 1787.51 sec
The following tests FAILED:
1756 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1757 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1758 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1761 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1762 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1764 - test_gdasapp_soca_socahybridweights (Failed)
1765 - test_gdasapp_soca_incr_handler (Failed)
1766 - test_gdasapp_soca_ens_handler (Failed)
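To see at a glance which components a failure list touches, the "(Failed)" lines can be grouped by the token that follows test_gdasapp_. The following is only a sketch: the helper name group_failures and the regular expression are illustrative assumptions, not part of GDASApp or ctest, and the sample text is a short excerpt in the same format as the lists above.

```python
import re
from collections import defaultdict

# Matches lines such as "1765 - test_gdasapp_soca_incr_handler (Failed)".
# Group 1 is the full test name, group 2 the component (soca, atm, ...).
FAILED_LINE = re.compile(
    r"^\s*\d+\s+-\s+(test_gdasapp_([A-Za-z0-9]+)_\S+)\s+\(Failed\)\s*$"
)


def group_failures(ctest_summary):
    """Return {component: [test names]} for every '(Failed)' line."""
    groups = defaultdict(list)
    for line in ctest_summary.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            groups[match.group(2)].append(match.group(1))
    return dict(groups)


# Short excerpt in the same format as the ctest output quoted in this thread.
summary = """\
1765 - test_gdasapp_soca_incr_handler (Failed)
1766 - test_gdasapp_soca_ens_handler (Failed)
1775 - test_gdasapp_atm_jjob_var_init (Failed)
"""

groups = group_failures(summary)
for component, names in groups.items():
    print(component, len(names))
```

Applied to the second failure list, a grouping like this would report only the soca component.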
Some of these failures may flip to Passed if the initial test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP failure is resolved.
Took a peek at /work2/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/build/gdas/test/soca/gw/testrun/JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP.out.
The traceback flagged gdas.t12z.insitu_surface_trkob.2018041512.nc4 as a missing file. This, however, may not be the cause of the failure. The traceback ends with
File "/work2/noaa/da/rtreadon/git/global-workflow/rename_atm/parm/gdas/jcb-algorithms/3dfgat.yaml.j2", line 45, in top-level template code
{% include observation_from_jcb + '.yaml.j2' %}
File "/work2/noaa/da/python/opt/core/miniconda3/4.6.14/envs/gdasapp/lib/python3.7/site-packages/jinja2/loaders.py", line 218, in get_source
raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: insitu_profile_bathy.yaml.j2
+ JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP[1]: postamble JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP 1717527206 1
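The TemplateNotFound error above means an observation name was requested for which no matching .yaml.j2 file could be found by the template loader. A quick pre-check along these lines could list such names before the job runs. This is a sketch under assumptions: missing_templates is a hypothetical helper, the observation names are placeholders, and the temporary directory stands in for the real template directory, whose layout is not shown in this thread.

```python
import tempfile
from pathlib import Path


def missing_templates(observations, template_dir, suffix=".yaml.j2"):
    """Return the observation names with no '<name><suffix>' file in template_dir."""
    template_dir = Path(template_dir)
    return [
        name
        for name in observations
        if not (template_dir / f"{name}{suffix}").is_file()
    ]


# Self-contained demonstration: the temporary directory stands in for the
# real template directory, and both observation names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "insitu_surface_trkob.yaml.j2").write_text("# placeholder\n")
    result = missing_templates(
        ["insitu_surface_trkob", "insitu_profile_bathy"], tmp
    )

print(result)
```

Any name returned by the check would trigger the same error when the include statement is rendered.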
Thanks for testing @RussTreadon-NOAA. Passing tests require https://github.com/danholdaway/global-workflow/tree/feature/rename_atm branch of global-workflow. You should be able to switch branch and run the tests again without a rebuild since it's just a change to jcb and config that matters.
Ah I see you did that. Perhaps the failure is because of more observations added to obs_list without those YAMLs going to JCB as well. Let me check.
@RussTreadon-NOAA I fixed that failure by adding the insitu YAML files to JCB. There should be zero downstream impact on the other tests, which should all pass again.
Automated Global-Workflow GDASApp Testing Results: Machine: orion
Start: Thu Jun 6 13:20:48 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Thu Jun 6 14:10:32 CDT 2024
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Thu Jun 6 14:25:32 CDT 2024
Tests: 64% tests passed, 17 tests failed out of 47
1842 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1843 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1844 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1847 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1848 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1850 - test_gdasapp_soca_socahybridweights (Failed)
1851 - test_gdasapp_soca_incr_handler (Failed)
1852 - test_gdasapp_soca_ens_handler (Failed)
1861 - test_gdasapp_atm_jjob_var_init (Failed)
1862 - test_gdasapp_atm_jjob_var_run (Failed)
1863 - test_gdasapp_atm_jjob_var_inc (Failed)
1864 - test_gdasapp_atm_jjob_var_final (Failed)
1865 - test_gdasapp_atm_jjob_ens_init (Failed)
1866 - test_gdasapp_atm_jjob_ens_run (Failed)
1867 - test_gdasapp_atm_jjob_ens_inc (Failed)
1868 - test_gdasapp_atm_jjob_ens_final (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1144/global-workflow/sorc/gdas.cd/build/log.ctest
Updated branches and submodules in /work2/noaa/da/rtreadon/git/global-workflow/rename_atm. (Note: I am working in a locally modified copy of g-w develop.)
45 out of 47 tests pass
96% tests passed, 2 tests failed out of 47
Label Time Summary:
gdas-utils = 9.85 sec*proc (9 tests)
script = 9.85 sec*proc (9 tests)
Total Test time (real) = 1953.63 sec
The following tests FAILED:
1836 - test_gdasapp_fv3jedi_fv3inc (Failed)
1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
A rerun of test_gdasapp_fv3jedi_fv3inc passed
(gdasapp) Orion-login-2:/work2/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/build$ ctest -R test_gdasapp_fv3jedi_fv3inc
Test project /work2/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/build
Start 1836: test_gdasapp_fv3jedi_fv3inc
1/1 Test #1836: test_gdasapp_fv3jedi_fv3inc ...... Passed 5.43 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 5.98 sec
I cannot explain why the first run failed. The test includes a reference check. Is it possible that fv3jedi_fv3inc test results are not bitwise identical from one run to the next? What do you think @DavidNew-NOAA?
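One way to probe the bitwise-reproducibility question is to hash the output of two separate runs and compare the digests. The sketch below uses only the Python standard library; the function names and file names are illustrative, and the tiny binary files stand in for real test output. Note that this compares raw bytes, so files that embed timestamps or other run metadata would differ even when the data fields agree.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitwise_identical(path_a, path_b):
    """True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


# Stand-in files; in practice these would be the outputs of two test runs.
with tempfile.TemporaryDirectory() as tmp:
    run1 = Path(tmp) / "run1.bin"
    run2 = Path(tmp) / "run2.bin"
    run3 = Path(tmp) / "run3.bin"
    run1.write_bytes(b"\x00\x01\x02")
    run2.write_bytes(b"\x00\x01\x02")
    run3.write_bytes(b"\x00\x01\x03")
    same = bitwise_identical(run1, run2)
    different = bitwise_identical(run1, run3)

print(same, different)
```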
A check of the log file for test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY shows
2024-06-06 20:49:14:INFO:Loading input YAML from preevayamls/eva_insitu_profile_tesac_salinity_2018041512.yaml
Traceback (most recent call last):
File "/work2/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/scripts/exgdas_global_marine_analysis_vrfy.py", line 187, in <module>
marine_eva_post.marine_eva_post(infile, 'evayamls', diagdir)
File "/work2/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/ush/eva/marine_eva_post.py", line 39, in marine_eva_post
layer['vmin'] = vminmax[variable]['vmin']
KeyError: 'salinity'
+ JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY[1]: postamble JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY 1717706755 1
+ preamble.sh[70]: set +x
End JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY at 20:49:15 with error code 1 (time elapsed: 00:03:20)
+ Unknown[1]: postamble slurm_script 1717706753 1
My working copy of feature/rename_atm may not be consistent with the current state of g-w develop and/or other repositories.
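The KeyError in the traceback above is the ordinary result of indexing a dictionary with a key it does not contain: the range table had no entry for 'salinity'. The sketch below reproduces that behaviour and shows one way to make a missing entry explicit. The table contents and the helper lookup_range are hypothetical; the real table used by marine_eva_post.py is not shown in this thread.

```python
# Hypothetical plotting ranges; the real table used by marine_eva_post.py
# is not shown in this thread.
vminmax = {"temperature": {"vmin": -2.0, "vmax": 2.0}}


def lookup_range(variable, table):
    """Return (vmin, vmax) for variable, or None if the table has no entry."""
    entry = table.get(variable)
    if entry is None:
        return None
    return entry["vmin"], entry["vmax"]


# Direct indexing reproduces the kind of failure seen in the log.
try:
    vminmax["salinity"]["vmin"]
    raised = False
except KeyError as err:
    raised = True
    missing_key = err.args[0]

print(raised, lookup_range("salinity", vminmax))
```

Whether a missing variable should be skipped or treated as a configuration error is a design choice for the script's maintainers; the sketch only shows how to detect the case.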
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Thu Jun 6 18:24:35 UTC 2024 on hfe06
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Thu Jun 6 19:11:22 UTC 2024
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Thu Jun 6 21:10:13 UTC 2024
Tests: 64% tests passed, 17 tests failed out of 47
1841 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1842 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1843 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1846 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1847 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1848 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1849 - test_gdasapp_soca_socahybridweights (Failed)
1850 - test_gdasapp_soca_incr_handler (Failed)
1851 - test_gdasapp_soca_ens_handler (Failed)
1860 - test_gdasapp_atm_jjob_var_init (Failed)
1861 - test_gdasapp_atm_jjob_var_run (Failed)
1862 - test_gdasapp_atm_jjob_var_inc (Failed)
1863 - test_gdasapp_atm_jjob_var_final (Failed)
1864 - test_gdasapp_atm_jjob_ens_init (Failed)
1867 - test_gdasapp_atm_jjob_ens_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1144/global-workflow/sorc/gdas.cd/build/log.ctest
@RussTreadon-NOAA There shouldn't be a reference check of the input/output state for the fv3inc tests. However, there were some significant updates to Global Workflow on Wednesday, and I was getting some similar errors yesterday until I updated all my repos.
@RussTreadon-NOAA Sorry, I take that back. There is a reference check for test_gdasapp_fv3jedi_fv3inc. I was thinking of test_gdasapp_atm_jjob_var_inc and test_gdasapp_atm_jjob_ens_inc. Eventually I wish to add reference checks for those two jobs which will make test_gdasapp_fv3jedi_fv3inc somewhat redundant. Anyways, I'm not sure about whether things are bitwise identical from one run to the next. I would hope they are on the same machine.
Orion tests
Install forked g-w feature/rename_atm at 09ec021 on Orion. This fork clones GDASApp feature/rename_atm into g-w sorc/gdas.cd. Update cloned sorc/gdas.cd to current head, 72beb13, of GDASApp feature/rename_atm.
Run GDASApp ctests. 47 out of 47 tests pass
Test project /work2/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/build
Start 1489: test_gdasapp_util_coding_norms
1/47 Test #1489: test_gdasapp_util_coding_norms ........................ Passed 8.84 sec
...
Start 1869: test_gdasapp_aero_gen_3dvar_yaml
47/47 Test #1869: test_gdasapp_aero_gen_3dvar_yaml ...................... Passed 0.26 sec
100% tests passed, 0 tests failed out of 47
Label Time Summary:
gdas-utils = 16.49 sec*proc (9 tests)
script = 16.49 sec*proc (9 tests)
Total Test time (real) = 1521.82 sec
Run g-w CI C96C48_ufs_hybatmDA. The following jobs failed
202402240000 gdasatmanlvar 18266389 DEAD 1 2 102.0
202402240000 gfsatmanlvar 18266345 DEAD 1 2 66.0
due to
6: GSI grid: number of processor in layout does not match number in communicator
12: GSI grid: number of processor in layout does not match number in communicator
14: GSI grid: number of processor in layout does not match number in communicator
18: GSI grid: number of processor in layout does not match number in communicator
Examination of the input yaml found
saber central block:
saber block name: gsi static covariance
read:
gsi akbk: ./fv3jedi/akbk.nc4
gsi error covariance file: /work/noaa/stmp/rtreadon/ORION/RUNDIRS/prename/gfsatmanl_00/berror/gsi-coeffs-gfs-global.nc4
gsi berror namelist file: /work/noaa/stmp/rtreadon/ORION/RUNDIRS/prename/gfsatmanl_00/berror/gfs_gsi_global.nml
processor layout x direction: 12
processor layout y direction: 8
debugging mode: false
saber outer blocks:
- saber block name: gsi interpolation to model grid
gsi akbk: ./fv3jedi/akbk.nc4
gsi error covariance file: /work/noaa/stmp/rtreadon/ORION/RUNDIRS/prename/gfsatmanl_00/berror/gsi-coeffs-gfs-global.nc4
gsi berror namelist file: /work/noaa/stmp/rtreadon/ORION/RUNDIRS/prename/gfsatmanl_00/berror/gfs_gsi_global.nml
processor layout x direction: 12
processor layout y direction: 12
debugging mode: false
The [12,8] layout is correct. The [12,12] layout is wrong. Trace this to a typo in parm/jcb-gdas/model/atmosphere/atmosphere_background_error_hybrid_gsibec_bump.yaml.j2
gsi berror namelist file: {{atmosphere_gsibec_path}}/gfs_gsi_global.nml
processor layout x direction: {{atmosphere_layout_gsib_x}}
- processor layout y direction: {{atmosphere_layout_gsib_x}}
+ processor layout y direction: {{atmosphere_layout_gsib_y}}
Correct typo in working copy. GDASApp-based DA jobs successfully ran to completion. All jobs now complete
(gdasapp) Orion-login-4:/work2/noaa/stmp/rtreadon/EXPDIR/prename$ rocotostat -d prename.db -w prename.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202402231800 Done Jun 10 2024 14:40:15 Jun 10 2024 15:00:30
202402240000 Done Jun 10 2024 14:40:15 Jun 10 2024 18:01:42
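The layout mismatch described above could be caught before job submission by collecting every processor layout in the rendered YAML and checking that x times y equals the number of MPI tasks. The sketch below works on an already parsed YAML tree (plain dicts and lists). The helper names are illustrative, the nesting of the sample config is an assumption because the indentation of the quoted YAML was lost, and the task count of 96 is inferred from the [12,8] layout being the correct one.

```python
X_KEY = "processor layout x direction"
Y_KEY = "processor layout y direction"


def find_layouts(node, found=None):
    """Recursively collect every (x, y) processor layout in a parsed YAML tree."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if X_KEY in node and Y_KEY in node:
            found.append((node[X_KEY], node[Y_KEY]))
        for value in node.values():
            find_layouts(value, found)
    elif isinstance(node, list):
        for item in node:
            find_layouts(item, found)
    return found


def mismatched_layouts(config, ntasks):
    """Return the layouts whose x * y product differs from ntasks."""
    return [xy for xy in find_layouts(config) if xy[0] * xy[1] != ntasks]


# Assumed nesting; only the keys relevant to the check are reproduced.
config = {
    "saber central block": {
        "saber block name": "gsi static covariance",
        "read": {X_KEY: 12, Y_KEY: 8},
    },
    "saber outer blocks": [
        {
            "saber block name": "gsi interpolation to model grid",
            X_KEY: 12,
            Y_KEY: 12,
        }
    ],
}

# 96 tasks is an assumption derived from the correct [12, 8] layout.
print(mismatched_layouts(config, ntasks=96))
```

With the typo in place the check flags the [12,12] layout; after the one-line fix to the template both blocks would carry [12,8] and the list would be empty.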
Thanks @RussTreadon-NOAA, feel free to push the required changes to the PRs
Need to update parm/jcb-gdas hash to a5d0277. Should also bring feature/rename_atm up to date with current head of GDASApp develop.
Automated GDASApp Testing Results: Machine: hera
Start: Tue Jun 18 19:46:12 UTC 2024 on hfe03
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Tue Jun 18 20:40:01 UTC 2024
---------------------------------------------------
Tests: *SUCCESS*
Tests: Completed at Tue Jun 18 20:41:58 UTC 2024
Tests: 100% tests passed, 0 tests failed out of 24
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Tue Jun 18 19:52:55 UTC 2024 on hfe03
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Tue Jun 18 20:47:51 UTC 2024
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Tue Jun 18 21:09:36 UTC 2024
Tests: 67% tests passed, 16 tests failed out of 48
1843 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1844 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1845 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1848 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1850 - test_gdasapp_soca_socahybridweights (Failed)
1851 - test_gdasapp_soca_incr_handler (Failed)
1852 - test_gdasapp_soca_ens_handler (Failed)
1861 - test_gdasapp_atm_jjob_var_init (Failed)
1862 - test_gdasapp_atm_jjob_var_run (Failed)
1863 - test_gdasapp_atm_jjob_var_inc (Failed)
1864 - test_gdasapp_atm_jjob_var_final (Failed)
1865 - test_gdasapp_atm_jjob_ens_init (Failed)
1866 - test_gdasapp_atm_jjob_ens_run (Failed)
1867 - test_gdasapp_atm_jjob_ens_inc (Failed)
1868 - test_gdasapp_atm_jjob_ens_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/role.jedipara/CI/GDASApp/workflow/PR/1144/global-workflow/sorc/gdas.cd/build/log.ctest
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Tue Jun 18 22:05:37 UTC 2024 on hfe04
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Tue Jun 18 22:48:47 UTC 2024
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Tue Jun 18 23:09:55 UTC 2024
Tests: 67% tests passed, 16 tests failed out of 48
1843 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1844 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1845 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1848 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1850 - test_gdasapp_soca_socahybridweights (Failed)
1851 - test_gdasapp_soca_incr_handler (Failed)
1852 - test_gdasapp_soca_ens_handler (Failed)
1861 - test_gdasapp_atm_jjob_var_init (Failed)
1862 - test_gdasapp_atm_jjob_var_run (Failed)
1863 - test_gdasapp_atm_jjob_var_inc (Failed)
1864 - test_gdasapp_atm_jjob_var_final (Failed)
1865 - test_gdasapp_atm_jjob_ens_init (Failed)
1866 - test_gdasapp_atm_jjob_ens_run (Failed)
1867 - test_gdasapp_atm_jjob_ens_inc (Failed)
1868 - test_gdasapp_atm_jjob_ens_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/role.jedipara/CI/GDASApp/workflow/PR/1144/global-workflow/sorc/gdas.cd/build/log.ctest
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Wed Jun 19 00:38:56 UTC 2024 on hfe04
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Wed Jun 19 01:24:03 UTC 2024
---------------------------------------------------
Tests: *SUCCESS*
Tests: Completed at Wed Jun 19 01:49:26 UTC 2024
Tests: 100% tests passed, 0 tests failed out of 48
@CoryMartin-NOAA, @guillaumevernieres, and @DavidNew-NOAA - the changes in this PR have been tested via g-w PR #2700 with acceptable results. I'd like to merge this PR in GDASApp develop. Any objections?
@danholdaway has three JCB PRs related to GDASApp PR #1144 (this PR):
Each of the jcb PRs has been approved by Cory, David, and Russ. These jcb PRs should also be merged into their respective develop. Are we OK with doing so?
@RussTreadon-NOAA No objection here
Once approved I will merge the JCB PRs and then we can merge this after updating the hashes.
A G-W PR will follow the approval and merge of this PR.
G-W branch required for testing: https://github.com/danholdaway/global-workflow/tree/feature/rename_atm