Closed Junjun-NOAA closed 1 month ago
mpas-jedi test:
90% tests passed, 6 tests failed out of 59
Label Time Summary:
executable = 92.81 sec*proc (13 tests)
mpasjedi = 712.40 sec*proc (59 tests)
mpi = 705.89 sec*proc (58 tests)
script = 619.59 sec*proc (46 tests)
Total Test time (real) = 112.89 sec
The following tests FAILED:
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
43 - test_mpasjedi_4denvar_VarBC (Failed)
44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
48 - test_mpasjedi_4dfgat_append_obs (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
fv3-jedi test:
92% tests passed, 10 tests failed out of 127
Label Time Summary:
fv3-jedi = 1025.78 sec*proc (126 tests)
fv3jedi = 1032.39 sec*proc (127 tests)
mpi = 1020.63 sec*proc (115 tests)
script = 1032.39 sec*proc (127 tests)
Total Test time (real) = 155.42 sec
The following tests FAILED:
70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
98 - fv3jedi_test_tier1_4denvar (Failed)
99 - fv3jedi_test_tier1_4denvar_seq (Failed)
109 - fv3jedi_test_tier1_diffstates_gfs (Failed)
111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
112 - fv3jedi_test_tier1_addincrement_gfs (Failed)
125 - fv3jedi_test_tier1_eda_3dvar_control_pert (Failed)
NOTE: The following fix files were added per the need of this PR and corresponding links under fix/ were updated.
fv3-jedi-data_2085be5_20241008
ioda-data_20241011
mpas-jedi-data/testinput_tier_1/obs
VEGPARM.TBL.20241011
NoahmpTable.TBL
Fix file changes were sync'ed on Jet/Hera/Orion/Hercules and archived to HPSS
@Junjun-NOAA For your failed mpas-jedi tests, I guess they may be related to the CRTM source code version. Could you try updating sorc/crtm to mpasjedi.v3.0.1 in a different local copy and rerunning the mpas-jedi tests? If they pass, please post your results and run directory here. Thanks!
@Junjun-NOAA For the failed fv3-jedi cases, you can try the following steps:
1) update all the remaining submodules to match jedi-bundle
2) rerun the ctests using my latest commit which updated fv3-jedi-data
If 2) does not resolve all failures, then
3) update the sorc/crtm module and see whether it helps.
Thanks!
rrfs-test
all passed on Hera
Test project /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/rrfs-test
Start 1: rrfs_fv3jedi_hyb_2022052619
Start 2: rrfs_fv3jedi_letkf_2022052619
Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
Start 4: rrfs_mpasjedi_2024052700_getkf_observer
Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 ............. Passed 43.19 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ............... Passed 103.69 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc .......... Passed 122.99 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ... Passed 344.80 sec
Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ......... Passed 416.29 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver ..... Passed 1289.32 sec
100% tests passed, 0 tests failed out of 6
Label Time Summary:
mpi = 2320.29 sec*proc (6 tests)
rdas-bundle = 2320.29 sec*proc (6 tests)
script = 2320.29 sec*proc (6 tests)
The updates include:
ioda c7b8760fc -> d49ed17e
ufo 92ccfb2a -> 94d50d64d
oops 35820130 -> d77217323
vader e3457cba4 -> 6d56a1eb5b
mpas 3ecd59e2c -> 41e9a3fb8
mpas-jedi a1c609973 -> b9d596d7c9
@Junjun-NOAA To clarify, the fv3-jedi submodule was updated to match the latest jedi-bundle instead of mpasjedi.v3.0.1.
It would be good if you could list all updated submodules in your issue #193, giving the old commit and new commit for each (if a repo name has changed, note that as well). Thanks!
To clarify: crtm/fix_REL-3.1.1.2 was added/sync'd under RDAS_DATA to facilitate offline mpasjedi ctests (i.e. testing mpasjedi with the CRTMv3 source code). We will not update the crtm submodule in the current RDASApp until everyone (especially @xyzemc and @HaidaoLin-NOAA) is ready for this upgrade.
For the mpasjedi tests, be sure to use ctest instead of ctest -j8, as the mpasjedi test dependencies may not be set up.
For the tests conducted by me (on Hera) and @Junjun-NOAA (on Jet), all failed tests failed because their outputs do not match the reference data. This was caused by the different CRTM versions used by RDASApp and MPASJEDI.
FAILED on hercules
started build_and_test on hercules at UTC time: Sun Oct 13 20:05:56 UTC 2024 finished at UTC time: Sun Oct 13 21:11:35 UTC 2024
Test project /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
Start 4: rrfs_mpasjedi_2024052700_getkf_observer
Start 1: rrfs_fv3jedi_hyb_2022052619
Start 2: rrfs_fv3jedi_letkf_2022052619
Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............***Failed 140.02 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............***Failed 158.67 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc .......... Passed 213.51 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ... Passed 744.54 sec
Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ......... Passed 1378.97 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver ..... Passed 1373.57 sec
67% tests passed, 2 tests failed out of 6
Label Time Summary:
mpi = 4009.27 sec*proc (6 tests)
rdas-bundle = 4009.27 sec*proc (6 tests)
script = 4009.27 sec*proc (6 tests)
Total Test time (real) = 2118.11 sec
The following tests FAILED:
1 - rrfs_fv3jedi_hyb_2022052619 (Failed)
2 - rrfs_fv3jedi_letkf_2022052619 (Failed)
Errors while running CTest
Output from these tests are in: /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
workdir: /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194
fv3jedi tests on Hera:
No errors; all failed tests are due to what(): Test reference mismatch, which may be related to the CRTM mismatch between RDASApp and the latest jedi-bundle.
94% tests passed, 7 tests failed out of 127
Label Time Summary:
fv3-jedi = 876.53 sec*proc (126 tests)
fv3jedi = 880.59 sec*proc (127 tests)
mpi = 869.16 sec*proc (115 tests)
script = 880.59 sec*proc (127 tests)
Total Test time (real) = 881.01 sec
The following tests FAILED:
70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
98 - fv3jedi_test_tier1_4denvar (Failed)
99 - fv3jedi_test_tier1_4denvar_seq (Failed)
111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/fv3-jedi/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
mpasjedi tests on Hera:
No errors; all failed tests are due to what(): Test reference mismatch, which may be related to the CRTM mismatch between RDASApp and the latest jedi-bundle.
88% tests passed, 7 tests failed out of 59
Label Time Summary:
executable = 138.51 sec*proc (13 tests)
mpasjedi = 690.62 sec*proc (59 tests)
mpi = 688.64 sec*proc (58 tests)
script = 552.11 sec*proc (46 tests)
Total Test time (real) = 690.85 sec
The following tests FAILED:
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
40 - test_mpasjedi_3dfgat (Failed)
43 - test_mpasjedi_4denvar_VarBC (Failed)
44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
47 - test_mpasjedi_4dfgat (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/mpas-jedi/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
FAILED on jet
started build_and_test on jet at UTC time: Sun Oct 13 22:11:13 UTC 2024 finished at UTC time: Sun Oct 13 23:15:32 UTC 2024
Test project /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
Start 4: rrfs_mpasjedi_2024052700_getkf_observer
Start 1: rrfs_fv3jedi_hyb_2022052619
Start 2: rrfs_fv3jedi_letkf_2022052619
Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............***Failed 71.72 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............***Failed 71.74 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc .......... Passed 232.28 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ... Passed 481.82 sec
Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ......... Passed 649.56 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver ..... Passed 1759.14 sec
67% tests passed, 2 tests failed out of 6
Label Time Summary:
mpi = 3266.27 sec*proc (6 tests)
rdas-bundle = 3266.27 sec*proc (6 tests)
script = 3266.27 sec*proc (6 tests)
Total Test time (real) = 2241.01 sec
The following tests FAILED:
1 - rrfs_fv3jedi_hyb_2022052619 (Failed)
2 - rrfs_fv3jedi_letkf_2022052619 (Failed)
Errors while running CTest
Output from these tests are in: /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
workdir: /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194
I checked the failed CI rrfs-tests on Jet and Hercules; they are all due to the following error:
FATAL from PE 0: NetCDF: Variable not found: get_variable_num_dimension: file:Data/bkg/fv3_dynvars.nc variable: DELP
When I ncdump Data/bkg/fv3_dynvars.nc, I can only find delp instead of DELP.
@TingLei-NOAA @SamuelDegelia-NOAA Do you have any ideas on this? Is there a recent fv3jedi update that changed delp to DELP?
The above two CI tests used the new fv3-jedi and fv3-jedi-lm submodules. Meanwhile, my rrfs-tests all passed on Hera because I did not update my fv3-jedi and fv3-jedi-lm submodules. I just reverted Junjun's updates to these two submodules, so I believe the latest code should pass the CI tests on Hercules and Jet.
Once this PR is reviewed and approved, we can do a final round of CI tests on Hera (and on Hercules and Jet again if needed).
It is good that the latest commit of this PR does not break rrfs-fv3-tests. We will need more fv3jedi experts to help with the fv3jedi submodule updates, potentially in another PR.
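As a rough illustration of the delp/DELP symptom described above, the sketch below flags requested variable names that are missing from a file only because of letter case. It is a minimal sketch, not how fv3-jedi resolves names: the function name case_mismatches is made up for illustration, and every variable name other than delp/DELP is a hypothetical placeholder. In practice the list of available names would come from the file itself (for example from ncdump output).

```python
def case_mismatches(requested, available):
    """Map each requested name that is absent only by letter case
    to the differently-cased name that is actually present."""
    by_lower = {name.lower(): name for name in available}
    return {
        name: by_lower[name.lower()]
        for name in requested
        if name not in available and name.lower() in by_lower
    }


# Hypothetical variable lists; only delp/DELP is taken from the error message.
in_file = ["delp", "ua", "va"]
wanted = ["DELP", "ua"]

for asked, found in case_mismatches(wanted, in_file).items():
    print(f"requested {asked!r} but the file only has {found!r}")
```

A check like this only points at the naming difference; whether the fix belongs in the background file or in the configuration is a separate question.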
@guoqing-noaa This comes from a change in the new fv3-jedi, where the name corresponding to delp in the background became DELP. See https://github.com/JCSDA-internal/fv3-jedi/pull/1251#issuecomment-2387219801; it should be controlled by the "field metadata override". Since your goal in this PR is to match mpasjedi-v3.0.1, I would suggest leaving the fv3 parts unchanged for now.
@TingLei-NOAA Thanks a lot for your quick reply and great information! I agree with you that this PR will focus on updating the mpasjedi components (while not breaking rrfs-fv3 tests in current RDASApp).
PASSED on jet
started build_and_test on jet at UTC time: Sun Oct 13 23:57:45 UTC 2024 finished at UTC time: Mon Oct 14 01:07:48 UTC 2024
Test project /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
Start 4: rrfs_mpasjedi_2024052700_getkf_observer
Start 1: rrfs_fv3jedi_hyb_2022052619
Start 2: rrfs_fv3jedi_letkf_2022052619
Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 ............. Passed 60.78 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ............... Passed 124.18 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc .......... Passed 188.37 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ... Passed 612.82 sec
Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ......... Passed 744.72 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver ..... Passed 1709.23 sec
100% tests passed, 0 tests failed out of 6
Label Time Summary:
mpi = 3440.11 sec*proc (6 tests)
rdas-bundle = 3440.11 sec*proc (6 tests)
script = 3440.11 sec*proc (6 tests)
Total Test time (real) = 2322.08 sec
workdir: /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194
PASSED on hercules
started build_and_test on hercules at UTC time: Sun Oct 13 23:56:13 UTC 2024 finished at UTC time: Mon Oct 14 01:08:58 UTC 2024
Test project /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
Start 4: rrfs_mpasjedi_2024052700_getkf_observer
Start 1: rrfs_fv3jedi_hyb_2022052619
Start 2: rrfs_fv3jedi_letkf_2022052619
Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 ............. Passed 687.05 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ............... Passed 755.45 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc .......... Passed 785.31 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ... Passed 1287.82 sec
Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ......... Passed 1838.96 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver ..... Passed 1261.44 sec
100% tests passed, 0 tests failed out of 6
Label Time Summary:
mpi = 6616.04 sec*proc (6 tests)
rdas-bundle = 6616.04 sec*proc (6 tests)
script = 6616.04 sec*proc (6 tests)
Total Test time (real) = 2549.27 sec
workdir: /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194
@ShunLiu-NOAA @hu5970 @TingLei-NOAA @SamuelDegelia-NOAA @delippi
This PR is now ready for review! Junjun and I tested the latest commit on Jet and Hera respectively. All rrfs-tests passed!
For the mpasjedi and fv3jedi tests, there are NO errors in running to the finish line. But a small set of mpasjedi/fv3jedi tests failed because their output differs from the reference files (the log files from Hera were posted in the above posts). This is due to the CRTM source code mismatch between RDASApp and the latest mpas-bundle/jedi-bundle. We are not ready to update the CRTM source code yet per previous communication.
Also, per @TingLei-NOAA, due to the recent variable name change (delp to DELP) in the fv3 background, it is preferred NOT to update the fv3jedi submodules in RDASApp.
PASSED on hera
started build_and_test on hera at UTC time: Mon Oct 14 01:14:52 UTC 2024 finished at UTC time: Mon Oct 14 02:12:44 UTC 2024
Test project /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194/build/rrfs-test
Start 4: rrfs_mpasjedi_2024052700_getkf_observer
Start 1: rrfs_fv3jedi_hyb_2022052619
Start 2: rrfs_fv3jedi_letkf_2022052619
Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 ............. Passed 40.64 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ............... Passed 102.40 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc .......... Passed 126.30 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ... Passed 366.29 sec
Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ......... Passed 425.77 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver ..... Passed 1225.62 sec
100% tests passed, 0 tests failed out of 6
Label Time Summary:
mpi = 2287.02 sec*proc (6 tests)
rdas-bundle = 2287.02 sec*proc (6 tests)
script = 2287.02 sec*proc (6 tests)
Total Test time (real) = 1592.39 sec
workdir: /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194
One item to add: the MPASJEDI 2024052700 case (Ens3Dvar, letkf and getkf) ran successfully. Tested on Jet.
Thanks @Junjun-NOAA for working to update these submodules. One thing that might be helpful is to keep a log (maybe via an issue) of all the tests that fail due to CRTM versions. That way when we update again in the future and run mpasjedi tests, we will know which ones we expect to continue failing.
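One possible way to keep such a log is to extract the failed test names from the ctest summary text and store them. The following is a minimal sketch, assuming the summary uses the "NN - name (Failed)" form seen in the logs in this thread; the helper name failed_tests is made up for illustration, and the sample below is only a subset of the failures reported here.

```python
import re

# Matches entries such as "37 - test_mpasjedi_3denvar_amsua_allsky (Failed)".
FAILED_RE = re.compile(r"(\d+)\s+-\s+(\S+)\s+\(Failed\)")


def failed_tests(ctest_output: str) -> dict[int, str]:
    """Return {test number: test name} for every '(Failed)' entry."""
    return {int(num): name for num, name in FAILED_RE.findall(ctest_output)}


sample = """The following tests FAILED:
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
"""

for num, name in sorted(failed_tests(sample).items()):
    print(num, name)
```

The same pattern also works on summaries that were pasted as a single line, since it does not depend on line breaks.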
It is a little confusing to update the CRTMv3 source code in RDASApp before we test whether it works correctly.
@xyzemc We did not update CRTMv3 source code in this PR. We only staged those coefficient files under the fix/ directory.
It does not affect anything.
@SamuelDegelia-NOAA The failed tests should be listed in the above posts. All those failed tests are due to the CRTM versions.
That works, I can just bookmark that page to remember which tests we expect to fail.
Also, do you happen to know why the test_mpasjedi_3dfgat and test_mpasjedi_4dfgat tests are failing now? Those were not failing when we tested in #158.
@xyzemc @SamuelDegelia-NOAA I just made a new commit, so everything related to CRTMv3 is now completely excluded from this PR.
They also fail because the output does not match the reference. I don't know the exact reason. It might be the CRTM versions or a change in the obs data.
@SamuelDegelia-NOAA Here is the mpas-jedi test log I ran on Hera:
/scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/mpas-jedi/Testing/Temporary/LastTest.log
Feel free to check it or you can build your own RDASApp and test it. Thanks!
The 3dfgat test fails due to a mismatch of the following line:
Test Line: 'CostJo : Nonlinear Jo(Radiosonde) = 9.2179903264942527e+02, nobs = 968, Jo/n = 9.5227172794362114e-01, err = 1.9868525944362956e+00'
Ref Line : 'CostJo : Nonlinear Jo(Radiosonde) = 9.2629445732366753e+02, nobs = 969, Jo/n = 9.5592823253216463e-01, err = 1.9863363417264606e+00'
That makes it sound like it is not necessarily a CRTM issue. I checked, and sondes_obs_2018041500_m.nc4 doesn't seem to have changed in this update. Is CRTM the only version difference now between RDASApp and the mpasjedi bundle?
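To see exactly which fields differ between the two CostJo lines quoted above, they can be parsed and compared field by field. This is a minimal sketch: the regular expression and the helper name parse_costjo are assumptions written for this one line format, not part of the JEDI test comparison tooling.

```python
import re

# Captures "key = number" pairs such as "nobs = 968" or "Jo/n = 9.52e-01".
FIELD_RE = re.compile(r"(Jo\(\w+\)|nobs|Jo/n|err)\s*=\s*([-+0-9.eE]+)")


def parse_costjo(line: str) -> dict[str, float]:
    """Return the numeric fields of one 'CostJo' log line."""
    return {key: float(value) for key, value in FIELD_RE.findall(line)}


test_line = ("CostJo : Nonlinear Jo(Radiosonde) = 9.2179903264942527e+02, "
             "nobs = 968, Jo/n = 9.5227172794362114e-01, err = 1.9868525944362956e+00")
ref_line = ("CostJo : Nonlinear Jo(Radiosonde) = 9.2629445732366753e+02, "
            "nobs = 969, Jo/n = 9.5592823253216463e-01, err = 1.9863363417264606e+00")

test, ref = parse_costjo(test_line), parse_costjo(ref_line)
for key in ref:
    diff = test[key] - ref[key]
    print(f"{key}: test={test[key]:.6e} ref={ref[key]:.6e} diff={diff:+.6e}")
```

The observation count differs by one between the two lines, so the difference in Jo reflects a change in which observations were used rather than only a change in computed values.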
@SamuelDegelia-NOAA Thanks for further checking this. Does this test assimilate radiance data as well?
@Junjun-NOAA Could you clone/build mpas-bundle v3.0.1 on Hera and check whether it can pass all of its own ctests? Thanks!
The 3dfgat test also assimilates GNSSRO refractivity obs but that uses a different obs operator (not CRTM).
@SamuelDegelia-NOAA @guoqing-noaa Thanks for the discussion about these failed mpasjedi tests. I made a separate copy of RDASApp on Jet and ran ctest yesterday. test_mpasjedi_4dfgat_append_obs passed, but the other five still failed:
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
43 - test_mpasjedi_4denvar_VarBC (Failed)
44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
I checked test_mpasjedi_lgetkf_height_vloc and I think its failure is related to the ref file. The other four are CRTM-related issues. I am planning to make a copy on Hera and see how it works.
Also, for the fv3-jedi tests, one more test failed in yesterday's ctest (106 in the list below). I will test this on Hera too.
91% tests passed, 11 tests failed out of 127
Label Time Summary:
fv3-jedi = 1049.22 sec*proc (126 tests)
fv3jedi = 1056.39 sec*proc (127 tests)
mpi = 1041.74 sec*proc (115 tests)
script = 1056.39 sec*proc (127 tests)
Total Test time (real) = 136.98 sec
The following tests FAILED:
70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
98 - fv3jedi_test_tier1_4denvar (Failed)
99 - fv3jedi_test_tier1_4denvar_seq (Failed)
106 - fv3jedi_test_tier1_hyb-3dvar_fsoi_forward (Failed)
108 - fv3jedi_test_tier1_hyb-3dvar_fsoi_backward (Failed)
109 - fv3jedi_test_tier1_diffstates_gfs (Failed)
111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
112 - fv3jedi_test_tier1_addincrement_gfs (Failed)
Thanks @Junjun-NOAA! So the 3dfgat and 4dfgat tests only fail on Hera? Also, the mpasjedi_lgetkf_height_vloc test does assimilate AMSUA data, so it could also be a CRTM issue. But I remember still having this test fail when I tried updating to CRTMv3 (the other 4 passed). I never got to the bottom of why exactly it failed.
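The question of which tests fail on only one machine can be checked with a plain set comparison of the failure lists posted earlier in this thread. This is a minimal sketch, assuming the six-test list from the PR description was produced on Jet (as the thread suggests) and the seven-test list on Hera.

```python
# Failure lists copied from the mpas-jedi ctest summaries in this thread.
jet_failed = {
    "test_mpasjedi_3denvar_amsua_allsky",
    "test_mpasjedi_3denvar_amsua_bc",
    "test_mpasjedi_4denvar_VarBC",
    "test_mpasjedi_4denvar_VarBC_nonpar",
    "test_mpasjedi_4dfgat_append_obs",
    "test_mpasjedi_lgetkf_height_vloc",
}
hera_failed = {
    "test_mpasjedi_3denvar_amsua_allsky",
    "test_mpasjedi_3denvar_amsua_bc",
    "test_mpasjedi_3dfgat",
    "test_mpasjedi_4denvar_VarBC",
    "test_mpasjedi_4denvar_VarBC_nonpar",
    "test_mpasjedi_4dfgat",
    "test_mpasjedi_lgetkf_height_vloc",
}

only_hera = sorted(hera_failed - jet_failed)
only_jet = sorted(jet_failed - hera_failed)
on_both = sorted(hera_failed & jet_failed)

print("only on Hera:", only_hera)
print("only on Jet: ", only_jet)
print("on both:     ", on_both)
```

With these two lists, the 3dfgat and 4dfgat tests appear only in the Hera set, and 4dfgat_append_obs appears only in the Jet set.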
@Junjun-NOAA Thanks for trying to do a clean test on Hera. My posted results might NOT be a clean test. Looking forward to your results.
Also, could you do another test: instead of using the RDASApp mpasjedi/ tests, clone mpas-bundle, build it, and run the ctests in its own build/ directory? I suspect some tests may also fail there.
I will do it and keep posting results here.
37 - test_mpasjedi_3denvar_amsua_allsky (Failed) 38 - test_mpasjedi_3denvar_amsua_bc (Failed) 43 - test_mpasjedi_4denvar_VarBC (Failed) 44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed) 48 - test_mpasjedi_4dfgat_append_obs (Failed) 54 - test_mpasjedi_lgetkf_height_vloc (Failed)
Hi Junjun,
How were the failed radiance DA cases resolved?
Hongli,
No, they are not resolved. You can refer to previous discussions, CRTMv3 things are totally excluded from this PR.
Thanks
@hongli-wang The failed ctests do not mean there is anything wrong with the radiance DA functionality. RDASApp and MPASJEDI.v3.0.1 simply use different CRTM source code, so the DA outputs from RDASApp differ from those of mpasjedi.v3.0.1. It will not affect anyone who wants to do radiance DA work based on this PR.
Here is the update for mpasjedi test on Hera:
88% tests passed, 7 tests failed out of 59
Label Time Summary:
executable = 116.54 sec*proc (13 tests)
mpasjedi = 747.94 sec*proc (59 tests)
mpi = 745.93 sec*proc (58 tests)
script = 631.40 sec*proc (46 tests)
Total Test time (real) = 748.06 sec
The following tests FAILED:
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
40 - test_mpasjedi_3dfgat (Failed)
43 - test_mpasjedi_4denvar_VarBC (Failed)
44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
47 - test_mpasjedi_4dfgat (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
workdir: /scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/mpas-jedi
The fv3jedi ctest on Hera:
91% tests passed, 11 tests failed out of 127
Label Time Summary:
fv3-jedi = 2024.96 sec*proc (126 tests)
fv3jedi = 2028.89 sec*proc (127 tests)
mpi = 2020.59 sec*proc (115 tests)
script = 2028.89 sec*proc (127 tests)
Total Test time (real) = 257.11 sec
The following tests FAILED:
70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
98 - fv3jedi_test_tier1_4denvar (Failed)
99 - fv3jedi_test_tier1_4denvar_seq (Failed)
106 - fv3jedi_test_tier1_hyb-3dvar_fsoi_forward (Failed)
108 - fv3jedi_test_tier1_hyb-3dvar_fsoi_backward (Failed)
109 - fv3jedi_test_tier1_diffstates_gfs (Failed)
111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
112 - fv3jedi_test_tier1_addincrement_gfs (Failed)
workdir: /scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/fv3-jedi
@Junjun-NOAA Do you have the ctest results from the mpas-bundle itself (NOT RDASApp/mpasjedi-test) on Hera?
Not yet. Hera is very slow today.
Thanks @Junjun-NOAA for running both the RDASApp mpasjedi tests and the mpas-bundle ctests. The 3dfgat test passed in mpas-bundle but failed in RDASApp.
The test results are at the following two locations respectively:
/scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/mpas-jedi/Testing/Temporary/3dfgat.log
and
/scratch1/BMC/wrfruc/jjhu/rrfsv2/mpas-bundle-v3.0.1/build/mpas-jedi/Testing/Temporary/3dfgat.log
We compared all submodules and data directories under RDASApp and mpas-bundle-v3.0.1:
ioda/ ioda-data/ MPAS/ mpas-jedi/ mpas-jedi-data/ oops/ saber/ ufo/ ufo-data/ vader/
All are exactly the same.
By comparing the log files, we found that RDASApp rejects one more Radiosonde wind obs than mpas-bundle. Please see the log below:
(white text is RDASApp, cyan text is mpas-bundle)
Thanks @Junjun-NOAA and @guoqing-noaa for the additional testing. I think your analysis shows that this one small difference is pretty minor and not worth worrying about at the moment. But at least we have a record if we ever want to go back and figure out what is going on.
PASSED on hera
started build_and_test on hera at UTC time: Wed Oct 16 17:54:54 UTC 2024 finished at UTC time: Thu Oct 17 04:30:41 UTC 2024
Test project /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194/build/rrfs-test
Start 4: rrfs_mpasjedi_2024052700_getkf_observer
Start 1: rrfs_fv3jedi_hyb_2022052619
Start 2: rrfs_fv3jedi_letkf_2022052619
Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 ............. Passed 34.65 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ............... Passed 216.56 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc .......... Passed 19790.00 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ... Passed 20032.23 sec
Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ......... Passed 35029.60 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver ..... Passed 15886.36 sec
100% tests passed, 0 tests failed out of 6
Label Time Summary:
mpi = 90989.39 sec*proc (6 tests)
rdas-bundle = 90989.39 sec*proc (6 tests)
script = 90989.39 sec*proc (6 tests)
Total Test time (real) = 35919.88 sec
workdir: /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194
The ctest run took 35919.88 sec. Is this an issue related to the HPC?
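To put that run time in context, the per-test times of this Hera CI run can be compared with the earlier Hera CI run posted in this thread (total 1592.39 sec). This is a minimal sketch: the numbers are copied from the two logs above, and the ratio only shows where the time went, not why.

```python
# Per-test wall times (sec) copied from the two Hera CI logs in this thread.
earlier_run = {
    "rrfs_fv3jedi_letkf_2022052619": 40.64,
    "rrfs_fv3jedi_hyb_2022052619": 102.40,
    "rrfs_mpasjedi_2024052700_bumploc": 126.30,
    "rrfs_mpasjedi_2024052700_getkf_observer": 366.29,
    "rrfs_mpasjedi_2024052700_Ens3Dvar": 425.77,
    "rrfs_mpasjedi_2024052700_getkf_solver": 1225.62,
}
latest_run = {
    "rrfs_fv3jedi_letkf_2022052619": 34.65,
    "rrfs_fv3jedi_hyb_2022052619": 216.56,
    "rrfs_mpasjedi_2024052700_bumploc": 19790.00,
    "rrfs_mpasjedi_2024052700_getkf_observer": 20032.23,
    "rrfs_mpasjedi_2024052700_Ens3Dvar": 35029.60,
    "rrfs_mpasjedi_2024052700_getkf_solver": 15886.36,
}

# Slowdown factor of the latest run relative to the earlier one, per test.
ratios = {name: latest_run[name] / earlier_run[name] for name in earlier_run}
total_ratio = 35919.88 / 1592.39  # "Total Test time (real)" of the two runs

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:45s} {ratio:8.1f}x")
print(f"{'total (real)':45s} {total_ratio:8.1f}x")
```

In these numbers the two fv3jedi tests ran at roughly their earlier speed, while the four mpasjedi tests account for nearly all of the extra time.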
List of submodule changes. issue #193