The merger of the master into release/gfsda.v16.x will be done in @RussTreadon-NOAA's forked copy of release/gfsda.v16.x. At the start of this work, the authoritative release/gfsda.v16.x and the forked copy are at 3898ab4. The authoritative master is at 2f28fbf.
Comments regarding the merge of master into release/gfsda.v16.x
git merge of the master at 2f28fbf into the forked release/gfsda.v16.x at 3898ab4 yielded conflicts, including:

- modulefile.ProdGSI.wcoss, removed from master

After resolving the 42 conflicts, the working copy of the merged gfsda.v16.x, the authoritative gfsda.v16.x, and the authoritative master were built on Venus. The 2021101900 gdas case was run with each executable via a standalone rungsi script.
The J table at the start of the first outer loop shows that the merged gfsda.v16.x produces J terms comparable to those of the authoritative gfsda.v16.x. Listed below is the total J Global from each of these two executables for this case:
2.1684929235179136E+06
2.1684929235178004E+06
The first 13 digits of J Global are identical. Differences in the last few digits of penalty terms were observed when hpc-stack was merged into the master. The authoritative gfsda.v16.x is not built with hpc-stack. The merged gfsda.v16.x is built with hpc-stack.
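The "first 13 digits are identical" comparison can be made mechanical. The sketch below is illustrative Python, not part of the GSI tooling, and the function name `leading_digits_agree` is hypothetical. It writes both values with 17 significant digits (the precision printed in the J table) and counts how many leading digits agree.

```python
def leading_digits_agree(a: float, b: float, ndigits: int = 17) -> int:
    """Count the leading significant digits shared by a and b.

    Both values are written in scientific notation with `ndigits`
    significant digits. Values with different signs or exponents are
    treated as sharing no leading digits.
    """
    mant_a, exp_a = f"{a:.{ndigits - 1}e}".split("e")
    mant_b, exp_b = f"{b:.{ndigits - 1}e}".split("e")
    if exp_a != exp_b or (a < 0) != (b < 0):
        return 0
    digits_a = mant_a.lstrip("-").replace(".", "")
    digits_b = mant_b.lstrip("-").replace(".", "")
    count = 0
    for da, db in zip(digits_a, digits_b):
        if da != db:
            break
        count += 1
    return count


# Total J Global values listed above.
print(leading_digits_agree(2.1684929235179136e06, 2.1684929235178004e06))
```

The comparison is done on decimal digits rather than on a relative tolerance because the claim in the text is stated in printed digits.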
The J table at the start of the first outer loop shows that the merged gfsda.v16.x produces J terms identical to those from the master except for ozone and radiances.
Haixia explained the differences in the ozone penalty. The authoritative gfsda.v16.x contains additional qc for ompstc8:
```fortran
! Check scan position errors in ompstc8
if(obstype == "ompstc8") then
   if(data(ifovn,i) .eq. 1 .or. data(ifovn,i) .eq. 2 .or. &
      data(ifovn,i) .eq. 3 .or. data(ifovn,i) .eq. 4 .or. &
      data(ifovn,i) .eq. 35) then
      if(abs(data(ilate,i)) > 50.)then
         luse(i) = .false.
      endif
   endif
endif
```
This qc is absent from the master. A check of fort.206 confirms that, when the same global_ozinfo.txt is used, the merged gfsda.v16.x and master ozone penalties differ only for ompstc8.
Differences in the radiances are due to the master and the merged gfsda.v16.x using different Rcov files. The merged gfsda.v16.x uses the Rcov files documented in #233. These Rcov files are associated with release/gfsda.v16.1.5, with the changes noted in #233. The master Rcov files pre-date release/gfsda.v16.1.5. When the same global_satinfo.txt is used, the two fort.207 files are identical apart from metop-b_iasi and n20_cris-fsr.
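A fort.207 comparison like this one amounts to attributing each differing line to a satellite/sensor pair. The sketch below is hypothetical Python, not an existing GSI utility, and it assumes (unverified) that the satellite and sensor names appear as whitespace-separated tokens on each fort.207 line. It also counts differing lines that match none of the listed pairs, which is what "limited to" requires to be zero.

```python
from itertools import zip_longest
from typing import Dict, Iterable, Set, Tuple


def differing_sensors(
    lines_a: Iterable[str],
    lines_b: Iterable[str],
    sensors: Dict[str, Tuple[str, str]],
) -> Tuple[Set[str], int]:
    """Attribute line-by-line differences between two text files to sensors.

    `sensors` maps a label to a (satellite, sensor) token pair. A differing
    line is attributed to a label when both tokens appear on it (ASSUMED
    file layout). Returns the labels hit and the number of differing lines
    that match no listed pair.
    """
    hits: Set[str] = set()
    unexplained = 0
    # Lines are compared position by position; a missing line counts as "".
    for la, lb in zip_longest(lines_a, lines_b, fillvalue=""):
        if la == lb:
            continue
        tokens = set(la.split()) | set(lb.split())
        matched = {
            label
            for label, (sat, sensor) in sensors.items()
            if sat in tokens and sensor in tokens
        }
        if matched:
            hits |= matched
        else:
            unexplained += 1
    return hits, unexplained
```

For example, passing the two files' lines with `{"metop-b_iasi": ("metop-b", "iasi"), "n20_cris-fsr": ("n20", "cris-fsr")}` and getting back both labels with `unexplained == 0` would correspond to the finding above. Position-by-position comparison assumes both runs write the same rows in the same order; if the row counts differ, a `difflib`-based alignment would be more appropriate.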
One item of note is that the authoritative master contains optimizations not present in the authoritative gfsda.v16.x. The merged gfsda.v16.x contains these optimizations. The wall-time reduction from the optimizations is sizeable:
3398.008674 seconds
4037.728002 seconds
3353.212421 seconds
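The size of the reduction follows from simple arithmetic once each time is tied to an executable. The times above are not labeled, so the Python sketch below ASSUMES they are listed in the order the executables were introduced earlier (merged gfsda.v16.x, authoritative gfsda.v16.x, authoritative master). The slowest time belonging to the only build without the optimizations is consistent with that reading, but it is an inference.

```python
def percent_reduction(baseline_s: float, new_s: float) -> float:
    """Percentage of the baseline wall time saved by the new run."""
    return 100.0 * (baseline_s - new_s) / baseline_s


# ASSUMED labels: the three times above, in the order the executables
# were introduced (merged, authoritative gfsda.v16.x, authoritative master).
wall_times_s = {
    "merged gfsda.v16.x": 3398.008674,
    "authoritative gfsda.v16.x": 4037.728002,
    "authoritative master": 3353.212421,
}

baseline = wall_times_s["authoritative gfsda.v16.x"]
for name in ("merged gfsda.v16.x", "authoritative master"):
    saved = percent_reduction(baseline, wall_times_s[name])
    print(f"{name}: {saved:.1f}% less wall time than the baseline")
```

Under that assumption, the merged build uses roughly 16% less wall time than the authoritative gfsda.v16.x build, and the master roughly 17% less.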
Given the above results, the working copy of the merged gfsda.v16.x will be committed to @RussTreadon-NOAA's forked copy of release/gfsda.v16.x.
Regression testing
Built the authoritative master at 2f28fbf and the forked release/gfsda.v16.x at 81ff90c on WCOSS_D (Mars), then ran the standard suite of regression tests, with the results shown below:
```
[emc.glopara@m71a3 build]$ ctest -j 19
Test project /gpfs/dell2/emc/modeling/noscrub/emc.glopara/git/gsi/master/build
      Start 1: global_T62
      Start 2: global_T62_ozonly
      Start 3: global_4dvar_T62
      Start 4: global_4denvar_T126
      Start 5: global_fv3_4denvar_T126
      Start 6: global_fv3_4denvar_C192
      Start 7: global_lanczos_T62
      Start 8: arw_netcdf
      Start 9: arw_binary
      Start 10: nmm_binary
      Start 11: nmm_netcdf
      Start 12: nmmb_nems_4denvar
      Start 13: hwrf_nmm_d2
      Start 14: hwrf_nmm_d3
      Start 15: rtma
      Start 16: global_enkf_T62
      Start 17: netcdf_fv3_regional
      Start 18: global_C96_fv3aero
      Start 19: global_C96_fv3aerorad
 1/19 Test #8: arw_netcdf ....................... Passed 244.28 sec
 2/19 Test #2: global_T62_ozonly ................ Passed 364.72 sec
 3/19 Test #18: global_C96_fv3aero ............... Passed 366.80 sec
 4/19 Test #17: netcdf_fv3_regional .............. Passed 484.43 sec
 5/19 Test #11: nmm_netcdf ....................... Passed 484.54 sec
 6/19 Test #9: arw_binary ....................... Passed 484.64 sec
 7/19 Test #16: global_enkf_T62 .................. Passed 727.60 sec
 8/19 Test #13: hwrf_nmm_d2 ...................... Passed 849.94 sec
 9/19 Test #10: nmm_binary ....................... Passed 853.87 sec
10/19 Test #14: hwrf_nmm_d3 ...................... Passed 854.93 sec
11/19 Test #3: global_4dvar_T62 ................. Passed 1204.33 sec
12/19 Test #15: rtma ............................. Passed 1451.49 sec
13/19 Test #12: nmmb_nems_4denvar ................ Passed 1477.54 sec
14/19 Test #7: global_lanczos_T62 ............... Passed 1924.02 sec
15/19 Test #4: global_4denvar_T126 .............. Passed 2284.32 sec
16/19 Test #1: global_T62 ....................... Passed 3244.07 sec
17/19 Test #5: global_fv3_4denvar_T126 ..........***Failed 3365.40 sec
18/19 Test #6: global_fv3_4denvar_C192 .......... Passed 3524.82 sec
19/19 Test #19: global_C96_fv3aerorad ............***Failed 4206.32 sec

89% tests passed, 2 tests failed out of 19

Total Test time (real) = 4206.38 sec

The following tests FAILED:
	  5 - global_fv3_4denvar_T126 (Failed)
	 19 - global_C96_fv3aerorad (Failed)
Errors while running CTest
```
Check regression_results.txt for failed tests.
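When a suite like this is rerun often, the failing tests can be pulled out of the CTest summary automatically. The Python sketch below is hypothetical (the names `parse_ctest` and `failed_tests` are illustrative), and its regular expression is written against the summary-line layout shown in the log above. It handles only the Passed and Failed statuses; other CTest outcomes are skipped.

```python
import re
from typing import List, NamedTuple


class CtestResult(NamedTuple):
    number: int
    name: str
    status: str
    seconds: float


# Matches summary lines such as
# "17/19 Test #5: global_fv3_4denvar_T126 ..........***Failed 3365.40 sec".
# Only "Passed" and "Failed" are recognized.
_RESULT = re.compile(
    r"Test\s+#(\d+):\s+(\S+)\s+\.*\s*(?:\*\*\*)?(Passed|Failed)\s+([\d.]+)\s+sec"
)


def parse_ctest(log: str) -> List[CtestResult]:
    """Extract per-test results from CTest console output."""
    results = []
    for line in log.splitlines():
        m = _RESULT.search(line)
        if m:
            results.append(
                CtestResult(int(m.group(1)), m.group(2), m.group(3), float(m.group(4)))
            )
    return results


def failed_tests(results: List[CtestResult]) -> List[str]:
    """Names of tests whose status is Failed, in log order."""
    return [r.name for r in results if r.status == "Failed"]
```

Applied to the log above, `failed_tests(parse_ctest(log))` would list global_fv3_4denvar_T126 and global_C96_fv3aerorad; the reason for each failure still has to be read from regression_results.txt.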
global_C96_fv3aerorad failed because the job wall time exceeded the specified limit of 1200 seconds:
```
The runtime for global_C96_fv3aerorad_loproc_updat is 1277.273419 seconds. This has exceeded maximum allowable operational time of 1200 seconds,
resulting in Failure of max-time in the regression test.
```
The loproc_contrl (master) test also had a wall time, 1278.910497 seconds, greater than 1200 seconds. This is not a fatal failure.
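Putting the two reported times side by side against the 1200 second limit shows how close they are. The Python sketch below only uses numbers from the text; the contrl run name is an assumption that follows the *_loproc_contrl pattern used in the other regression message. Both runs overrun by roughly 77 to 79 seconds, and the updat run is about 1.6 seconds faster than the master run, which suggests the failure reflects the limit itself rather than a slowdown introduced by the merge.

```python
from typing import Dict

MAX_TIME_S = 1200.0  # maximum allowable time quoted in the regression message

# Wall times reported above; the contrl name is ASSUMED to follow the
# *_loproc_contrl naming used elsewhere in the regression messages.
runtimes_s: Dict[str, float] = {
    "global_C96_fv3aerorad_loproc_updat": 1277.273419,
    "global_C96_fv3aerorad_loproc_contrl": 1278.910497,
}


def over_limit(runtimes: Dict[str, float], limit: float = MAX_TIME_S) -> Dict[str, float]:
    """Map each run that exceeded `limit` to its overrun in seconds."""
    return {name: t - limit for name, t in runtimes.items() if t > limit}


for name, excess in sorted(over_limit(runtimes_s).items()):
    print(f"{name}: {excess:.1f} s over the limit")

# Negative means the updat (merged) run was faster than the contrl (master) run.
delta = (
    runtimes_s["global_C96_fv3aerorad_loproc_updat"]
    - runtimes_s["global_C96_fv3aerorad_loproc_contrl"]
)
print(f"updat - contrl: {delta:.2f} s")
```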
global_fv3_4denvar_T126 failed with non-reproducible results between the two global_gsi.x executables:
```
The results between the two runs are nonreproducible,
thus the regression test has Failed on cost for global_fv3_4denvar_T126_loproc_updat and global_fv3_4denvar_T126_loproc_contrl analyses.
```
The differences are real and explainable.
release/gfsda.v16.x includes the correlated error changes described in issue #233. This set of changes is not in the authoritative master. Comparison of the contrl and updat initial penalties shows all penalty terms to be identical to the 17 printed digits except radiances. A diff of the contrl and updat fort.207 files shows differences limited to metop-b iasi and n20 cris-fsr, the satellite/sensor combinations for which correlated error is applied.
It should be noted that the loproc and hiproc runs for the contrl are reproducible. The same is true for the loproc and hiproc runs of the updat.
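The reasoning above separates two possible causes of the "nonreproducible" failure: run-to-run nondeterminism within one executable versus a genuine code difference between contrl and updat. A hypothetical Python sketch of that logic follows; the penalty-term dictionaries stand in for values read from the J tables, and the names `differing_terms` and `diagnose` are illustrative, not GSI tools.

```python
from typing import Dict, List, Tuple


def differing_terms(a: Dict[str, float], b: Dict[str, float]) -> List[str]:
    """Names of penalty terms that are not bit-identical between two runs."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def diagnose(
    contrl_lo: Dict[str, float],
    contrl_hi: Dict[str, float],
    updat_lo: Dict[str, float],
    updat_hi: Dict[str, float],
) -> Tuple[bool, List[str]]:
    """Return (each executable reproduces itself, terms differing between executables).

    If each executable agrees with itself across the loproc and hiproc runs,
    the remaining contrl/updat differences point to code or input changes
    rather than nondeterminism.
    """
    reproducible = not differing_terms(contrl_lo, contrl_hi) and not differing_terms(
        updat_lo, updat_hi
    )
    return reproducible, differing_terms(contrl_lo, updat_lo)
```

For this test, the text corresponds to `diagnose` returning `True` together with only the radiance term, which is the signature of the #233 correlated error changes rather than a reproducibility bug.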
Branch release/gfsda.v16.x does not contain several important updates in the master. This issue is opened to document the merger of the authoritative master into the authoritative release/gfsda.v16.x.