NOAA-EMC / RDASApp

Regional DAS
GNU Lesser General Public License v2.1

update RDASApp submodules to match mpasjedi-v3.0.1 #194

Closed Junjun-NOAA closed 1 month ago

Junjun-NOAA commented 1 month ago

List of submodule changes. issue #193

ioda             c7b8760f -> d49ed17e
ufo              92ccfb2a -> 94d50d64
oops             35820130 -> d77217323
vader            e3457cba -> 6d56a1eb5
mpas             3ecd59e2 -> 41e9a3fb8   #  repo URL also changed
mpas-jedi        a1c60997 -> b9d596d7c
#fv3-jedi         d3c800b8 -> c99519638
#fv3-jedi-lm      a6e97d76 -> 30ef7a390
Junjun-NOAA commented 1 month ago

mpas-jedi test:

90% tests passed, 6 tests failed out of 59

Label Time Summary:
executable = 92.81 sec*proc (13 tests)
mpasjedi   = 712.40 sec*proc (59 tests)
mpi        = 705.89 sec*proc (58 tests)
script     = 619.59 sec*proc (46 tests)

Total Test time (real) = 112.89 sec

The following tests FAILED:
         37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
         38 - test_mpasjedi_3denvar_amsua_bc (Failed)
         43 - test_mpasjedi_4denvar_VarBC (Failed)
         44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
         48 - test_mpasjedi_4dfgat_append_obs (Failed)
         54 - test_mpasjedi_lgetkf_height_vloc (Failed)

Junjun-NOAA commented 1 month ago

fv3-jedi test:

92% tests passed, 10 tests failed out of 127

Label Time Summary:
fv3-jedi    = 1025.78 sec*proc (126 tests)
fv3jedi     = 1032.39 sec*proc (127 tests)
mpi         = 1020.63 sec*proc (115 tests)
script      = 1032.39 sec*proc (127 tests)

Total Test time (real) = 155.42 sec

The following tests FAILED:
         70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
         88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
         91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
         96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
         98 - fv3jedi_test_tier1_4denvar (Failed)
         99 - fv3jedi_test_tier1_4denvar_seq (Failed)
        109 - fv3jedi_test_tier1_diffstates_gfs (Failed)
        111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
        112 - fv3jedi_test_tier1_addincrement_gfs (Failed)
        125 - fv3jedi_test_tier1_eda_3dvar_control_pert (Failed)

guoqing-noaa commented 1 month ago

NOTE: The following fix files were added as needed by this PR, and the corresponding links under fix/ were updated.

fv3-jedi-data_2085be5_20241008
ioda-data_20241011
mpas-jedi-data/testinput_tier_1/obs
VEGPARM.TBL.20241011
NoahmpTable.TBL

Fix file changes were sync'ed on Jet/Hera/Orion/Hercules and archived to HPSS

guoqing-noaa commented 1 month ago

@Junjun-NOAA For your failed mpas-jedi tests, I suspect the cause is the CRTM source code version. Could you update sorc/crtm to mpasjedi.v3.0.1 in a separate local copy and rerun the mpas-jedi tests? If they pass, please post your results and run directory here. Thanks!

guoqing-noaa commented 1 month ago

@Junjun-NOAA For the failed fv3-jedi cases, you can try the following steps:
1) update all the remaining submodules to match jedi-bundle
2) rerun the ctests using my latest commit, which updated fv3-jedi-data
3) if 2) does not resolve all failures, update the sorc/crtm module and see whether it helps.

Thanks!

guoqing-noaa commented 1 month ago

rrfs-test all passed on Hera

Test project /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/rrfs-test
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............   Passed   43.19 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............   Passed  103.69 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  122.99 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  344.80 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  416.29 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  1289.32 sec

100% tests passed, 0 tests failed out of 6

Label Time Summary:
mpi            = 2320.29 sec*proc (6 tests)
rdas-bundle    = 2320.29 sec*proc (6 tests)
script         = 2320.29 sec*proc (6 tests)
Junjun-NOAA commented 1 month ago

mpas-jedi test:

90% tests passed, 6 tests failed out of 59

Label Time Summary:
executable = 92.81 sec*proc (13 tests)
mpasjedi   = 712.40 sec*proc (59 tests)
mpi        = 705.89 sec*proc (58 tests)
script     = 619.59 sec*proc (46 tests)

Total Test time (real) = 112.89 sec

The following tests FAILED:
         37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
         38 - test_mpasjedi_3denvar_amsua_bc (Failed)
         43 - test_mpasjedi_4denvar_VarBC (Failed)
         44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
         48 - test_mpasjedi_4dfgat_append_obs (Failed)
         54 - test_mpasjedi_lgetkf_height_vloc (Failed)

The updates include:
ioda             c7b8760fc -> d49ed17e
ufo              92ccfb2a -> 94d50d64d
oops             35820130 -> d77217323
vader            e3457cba4 -> 6d56a1eb5b
mpas             3ecd59e2c -> 41e9a3fb8
mpas-jedi        a1c609973 -> b9d596d7c9
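
The hashes in this list are abbreviated to different lengths than in the list at the top of the thread (for example c7b8760f versus c7b8760fc). A minimal Python sketch, assuming that two abbreviations refer to the same commit when one is a prefix of the other, can confirm that the two lists agree. The helper names `same_commit` and `lists_agree` are illustrative and not part of RDASApp; a prefix match is only a plausibility check, not proof of identity.

```python
# Abbreviated (old, new) hashes copied from the two submodule lists in this thread.
first_list = {
    "ioda": ("c7b8760f", "d49ed17e"),
    "ufo": ("92ccfb2a", "94d50d64"),
    "oops": ("35820130", "d77217323"),
    "vader": ("e3457cba", "6d56a1eb5"),
    "mpas": ("3ecd59e2", "41e9a3fb8"),
    "mpas-jedi": ("a1c60997", "b9d596d7c"),
}
second_list = {
    "ioda": ("c7b8760fc", "d49ed17e"),
    "ufo": ("92ccfb2a", "94d50d64d"),
    "oops": ("35820130", "d77217323"),
    "vader": ("e3457cba4", "6d56a1eb5b"),
    "mpas": ("3ecd59e2c", "41e9a3fb8"),
    "mpas-jedi": ("a1c609973", "b9d596d7c9"),
}


def same_commit(a: str, b: str) -> bool:
    """Return True if one abbreviated hash is a prefix of the other."""
    a, b = a.lower(), b.lower()
    return a.startswith(b) or b.startswith(a)


def lists_agree(x: dict, y: dict) -> bool:
    """Return True if both lists name the same submodules with matching hashes."""
    if x.keys() != y.keys():
        return False
    return all(
        same_commit(x[name][0], y[name][0]) and same_commit(x[name][1], y[name][1])
        for name in x
    )


print(lists_agree(first_list, second_list))
```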

guoqing-noaa commented 1 month ago

@Junjun-NOAA To clarify, the fv3-jedi submodule was updated to match the latest jedi-bundle instead of mpasjedi.v3.0.1.

It would be good if you could list all updated submodules in your issue #193, giving the old and new commit for each (and noting where the repo name has changed). Thanks!

guoqing-noaa commented 1 month ago

To clarify: crtm/fix_REL-3.1.1.2 was added/sync'd under RDAS_DATA to facilitate offline mpasjedi ctests (i.e. testing mpasjedi with the CRTMv3 source code), but we will not update the crtm submodule in the current RDASApp until everyone (especially @xyzemc and @HaidaoLin-NOAA) is ready for this upgrade.

guoqing-noaa commented 1 month ago

For the mpasjedi tests, be sure to use ctest instead of ctest -j8, as the mpasjedi test dependencies may not be set up.

For the tests conducted by me (on Hera) and @Junjun-NOAA (on Jet), all failures occur because the outputs do not match the reference data. This was caused by the different CRTM versions used by RDASApp and MPASJEDI.

rrfsbot commented 1 month ago

FAILED on hercules

started build_and_test on hercules at UTC time: Sun Oct 13 20:05:56 UTC 2024 finished at UTC time: Sun Oct 13 21:11:35 UTC 2024

Test project /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............***Failed  140.02 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............***Failed  158.67 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  213.51 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  744.54 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  1378.97 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  1373.57 sec

67% tests passed, 2 tests failed out of 6

Label Time Summary:
mpi            = 4009.27 sec*proc (6 tests)
rdas-bundle    = 4009.27 sec*proc (6 tests)
script         = 4009.27 sec*proc (6 tests)

Total Test time (real) = 2118.11 sec

The following tests FAILED:
      1 - rrfs_fv3jedi_hyb_2022052619 (Failed)
      2 - rrfs_fv3jedi_letkf_2022052619 (Failed)
Errors while running CTest
Output from these tests are in: /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

workdir: /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194

guoqing-noaa commented 1 month ago

fv3jedi tests on Hera:

No runtime errors; all failed tests are due to "what(): Test reference mismatch", which may be related to the CRTM version mismatch between RDASApp and the latest jedi-bundle.

94% tests passed, 7 tests failed out of 127

Label Time Summary:
fv3-jedi    = 876.53 sec*proc (126 tests)
fv3jedi     = 880.59 sec*proc (127 tests)
mpi         = 869.16 sec*proc (115 tests)
script      = 880.59 sec*proc (127 tests)

Total Test time (real) = 881.01 sec

The following tests FAILED:
         70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
         88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
         91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
         96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
         98 - fv3jedi_test_tier1_4denvar (Failed)
         99 - fv3jedi_test_tier1_4denvar_seq (Failed)
        111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/fv3-jedi/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
guoqing-noaa commented 1 month ago

mpasjedi tests on Hera:

No runtime errors; all failed tests are due to "what(): Test reference mismatch", which may be related to the CRTM version mismatch between RDASApp and the latest jedi-bundle.

88% tests passed, 7 tests failed out of 59

Label Time Summary:
executable    = 138.51 sec*proc (13 tests)
mpasjedi      = 690.62 sec*proc (59 tests)
mpi           = 688.64 sec*proc (58 tests)
script        = 552.11 sec*proc (46 tests)

Total Test time (real) = 690.85 sec

The following tests FAILED:
         37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
         38 - test_mpasjedi_3denvar_amsua_bc (Failed)
         40 - test_mpasjedi_3dfgat (Failed)
         43 - test_mpasjedi_4denvar_VarBC (Failed)
         44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
         47 - test_mpasjedi_4dfgat (Failed)
         54 - test_mpasjedi_lgetkf_height_vloc (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/mpas-jedi/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
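
For the record, the Jet and Hera mpasjedi failure lists posted in this thread are not identical. The following Python sketch parses the two lists as posted and reports which tests fail on only one platform. The regular expression assumes the "N - name (Failed)" layout that ctest prints in the summaries above; the variable names are illustrative.

```python
import re

# Failure summaries as posted in this thread (Jet run by Junjun, Hera run by Guoqing).
JET_LOG = """
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
43 - test_mpasjedi_4denvar_VarBC (Failed)
44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
48 - test_mpasjedi_4dfgat_append_obs (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
"""
HERA_LOG = """
37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
38 - test_mpasjedi_3denvar_amsua_bc (Failed)
40 - test_mpasjedi_3dfgat (Failed)
43 - test_mpasjedi_4denvar_VarBC (Failed)
44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
47 - test_mpasjedi_4dfgat (Failed)
54 - test_mpasjedi_lgetkf_height_vloc (Failed)
"""

FAILED_RE = re.compile(r"(\d+)\s+-\s+(\S+)\s+\(Failed\)")


def failed_tests(log_text: str) -> set:
    """Return the set of test names marked (Failed) in a ctest summary."""
    return {name for _, name in FAILED_RE.findall(log_text)}


jet = failed_tests(JET_LOG)
hera = failed_tests(HERA_LOG)

print("fail on both  :", sorted(jet & hera))
print("fail on Jet only :", sorted(jet - hera))
print("fail on Hera only:", sorted(hera - jet))
```

With the lists above, five tests fail on both machines, test_mpasjedi_4dfgat_append_obs fails only on Jet, and test_mpasjedi_3dfgat and test_mpasjedi_4dfgat fail only on Hera.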
rrfsbot commented 1 month ago

FAILED on jet

started build_and_test on jet at UTC time: Sun Oct 13 22:11:13 UTC 2024 finished at UTC time: Sun Oct 13 23:15:32 UTC 2024

Test project /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............***Failed   71.72 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............***Failed   71.74 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  232.28 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  481.82 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  649.56 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  1759.14 sec

67% tests passed, 2 tests failed out of 6

Label Time Summary:
mpi            = 3266.27 sec*proc (6 tests)
rdas-bundle    = 3266.27 sec*proc (6 tests)
script         = 3266.27 sec*proc (6 tests)

Total Test time (real) = 2241.01 sec

The following tests FAILED:
      1 - rrfs_fv3jedi_hyb_2022052619 (Failed)
      2 - rrfs_fv3jedi_letkf_2022052619 (Failed)
Errors while running CTest
Output from these tests are in: /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

workdir: /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194

guoqing-noaa commented 1 month ago

I checked the failed CI rrfs-tests on Jet and Hercules; they are all due to the following error:

FATAL from PE 0: NetCDF: Variable not found: get_variable_num_dimension: file:Data/bkg/fv3_dynvars.nc variable: DELP

When I ncdump Data/bkg/fv3_dynvars.nc, I can only find delp, not DELP. @TingLei-NOAA @SamuelDegelia-NOAA Do you have any ideas on this? Was there a recent fv3jedi update that changed delp to DELP?

The above two CI tests used the new fv3-jedi and fv3-jedi-lm submodules. Meanwhile, my rrfs-tests all passed on Hera because I had not updated my fv3-jedi and fv3-jedi-lm submodules. I just reverted Junjun's updates to those two submodules, so I believe the latest code should pass the CI tests on Hercules and Jet.
Once this PR is reviewed and approved, we can do a final round of CI tests on Hera (and on Hercules and Jet again if needed).

It is good that the latest commit of this PR does not break rrfs-fv3-tests. We will need help from more fv3jedi experts with the fv3jedi submodule updates, potentially in another PR.
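
As a rough illustration of the delp/DELP problem, the check below flags requested variable names that are missing from a file but exist under a different letter case. It is a pure-Python sketch: the function name `find_case_mismatches` is hypothetical, and the list of available names is typed in by hand (in practice it could come from the header printed by ncdump). Only delp is confirmed by the ncdump check above.

```python
def find_case_mismatches(requested, available):
    """Map each missing requested name to available names that differ only by case."""
    by_folded = {}
    for name in available:
        by_folded.setdefault(name.casefold(), []).append(name)

    mismatches = {}
    for name in requested:
        if name in available:
            continue  # exact match, nothing to report
        candidates = by_folded.get(name.casefold(), [])
        if candidates:
            mismatches[name] = candidates
    return mismatches


# Name found in Data/bkg/fv3_dynvars.nc according to the ncdump check above.
available_vars = ["delp"]
# Name requested by the updated fv3-jedi according to the FATAL message.
requested_vars = ["DELP"]

print(find_case_mismatches(requested_vars, available_vars))
```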

TingLei-NOAA commented 1 month ago

@guoqing-noaa This comes from a change in the new fv3-jedi: the variable named delp in the background is now expected as DELP. See https://github.com/JCSDA-internal/fv3-jedi/pull/1251#issuecomment-2387219801; the name should be controllable through the "field metadata override". Since the goal of this PR is to match mpasjedi-v3.0.1, I would suggest leaving the fv3 parts unchanged for now.

guoqing-noaa commented 1 month ago

@TingLei-NOAA Thanks a lot for your quick reply and great information! I agree with you that this PR will focus on updating the mpasjedi components (while not breaking rrfs-fv3 tests in current RDASApp).

rrfsbot commented 1 month ago

PASSED on jet

started build_and_test on jet at UTC time: Sun Oct 13 23:57:45 UTC 2024 finished at UTC time: Mon Oct 14 01:07:48 UTC 2024

Test project /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............   Passed   60.78 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............   Passed  124.18 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  188.37 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  612.82 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  744.72 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  1709.23 sec

100% tests passed, 0 tests failed out of 6

Label Time Summary:
mpi            = 3440.11 sec*proc (6 tests)
rdas-bundle    = 3440.11 sec*proc (6 tests)
script         = 3440.11 sec*proc (6 tests)

Total Test time (real) = 2322.08 sec

workdir: /lfs5/BMC/wrfruc/rrfsbot/PRs_RDASApp/194

rrfsbot commented 1 month ago

PASSED on hercules

started build_and_test on hercules at UTC time: Sun Oct 13 23:56:13 UTC 2024 finished at UTC time: Mon Oct 14 01:08:58 UTC 2024

Test project /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194/build/rrfs-test
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............   Passed  687.05 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............   Passed  755.45 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  785.31 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  1287.82 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  1838.96 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  1261.44 sec

100% tests passed, 0 tests failed out of 6

Label Time Summary:
mpi            = 6616.04 sec*proc (6 tests)
rdas-bundle    = 6616.04 sec*proc (6 tests)
script         = 6616.04 sec*proc (6 tests)

Total Test time (real) = 2549.27 sec

workdir: /work/noaa/wrfruc/rrfsbot/PRs_RDASApp/194

guoqing-noaa commented 1 month ago

@ShunLiu-NOAA @hu5970 @TingLei-NOAA @SamuelDegelia-NOAA @delippi

This PR is now ready for review! Junjun and I tested the latest commit on Jet and Hera respectively. All rrfs-tests passed!

For the mpasjedi and fv3jedi tests, every test runs to completion without errors. However, a small set of mpasjedi/fv3jedi tests failed because their output differs from the reference files (the log files from Hera are posted above). This is due to the CRTM source code mismatch between RDASApp and the latest mpas-bundle/jedi-bundle. We are not ready to update the CRTM source code yet, per previous communication.

Also, per @TingLei-NOAA, due to the recent variable name change (delp to DELP) in the fv3 background, it is preferred NOT to update fv3jedi submodules in RDASApp.

rrfsbot commented 1 month ago

PASSED on hera

started build_and_test on hera at UTC time: Mon Oct 14 01:14:52 UTC 2024 finished at UTC time: Mon Oct 14 02:12:44 UTC 2024

Test project /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194/build/rrfs-test
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............   Passed   40.64 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............   Passed  102.40 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  126.30 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  366.29 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  425.77 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  1225.62 sec

100% tests passed, 0 tests failed out of 6

Label Time Summary:
mpi            = 2287.02 sec*proc (6 tests)
rdas-bundle    = 2287.02 sec*proc (6 tests)
script         = 2287.02 sec*proc (6 tests)

Total Test time (real) = 1592.39 sec

workdir: /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194

Junjun-NOAA commented 1 month ago

One item to add: the MPASJEDI 2024052700 cases (Ens3Dvar, letkf and getkf) ran successfully. Tested on Jet.

SamuelDegelia-NOAA commented 1 month ago

Thanks @Junjun-NOAA for working to update these submodules. One thing that might be helpful is to keep a log (maybe via an issue) of all the tests that fail due to CRTM versions. That way when we update again in the future and run mpasjedi tests, we will know which ones we expect to continue failing.

xyzemc commented 1 month ago

It is a little confusing to update the CRTMv3 source code in RDASApp before we have tested whether it works correctly.

guoqing-noaa commented 1 month ago

@xyzemc We did not update CRTMv3 source code in this PR. We only staged those coefficient files under the fix/ directory.

It does not affect anything.

guoqing-noaa commented 1 month ago

@SamuelDegelia-NOAA The failed tests should be listed in the above posts. All those failed tests are due to the CRTM versions.

SamuelDegelia-NOAA commented 1 month ago

That works, I can just bookmark that page to remember which tests we expect to fail.

Also, do you happen to know why the test_mpasjedi_3dfgat and test_mpasjedi_4dfgat tests are failing now? Those were not failing when we tested in #158.

guoqing-noaa commented 1 month ago

@xyzemc @SamuelDegelia-NOAA I just made a new commit and hence all CRTMv3 things are totally excluded from this PR.

guoqing-noaa commented 1 month ago

Also, do you happen to know why the test_mpasjedi_3dfgat and test_mpasjedi_4dfgat tests are failing now? Those were not failing when we tested in #158.

They also fail because the output does not match the reference. I don't know the exact reason. It might be the CRTM versions or a change in the obs data.

guoqing-noaa commented 1 month ago

@SamuelDegelia-NOAA Here is the mpas-jedi test log I ran on Hera: /scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_junjun-noaa_mpasjedi3.0.1/build/mpas-jedi/Testing/Temporary/LastTest.log

Feel free to check it or you can build your own RDASApp and test it. Thanks!

SamuelDegelia-NOAA commented 1 month ago

The 3dfgat test fails due to a mismatch of the following line:

Test Line: 'CostJo   : Nonlinear Jo(Radiosonde) = 9.2179903264942527e+02, nobs = 968, Jo/n = 9.5227172794362114e-01, err = 1.9868525944362956e+00'
Ref Line : 'CostJo   : Nonlinear Jo(Radiosonde) = 9.2629445732366753e+02, nobs = 969, Jo/n = 9.5592823253216463e-01, err = 1.9863363417264606e+00'

That makes it sound like this is not necessarily a CRTM issue. I checked, and sondes_obs_2018041500_m.nc4 doesn't seem to have changed in this update. Is CRTM the only version difference now between RDASApp and the mpasjedi bundle?
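
To make the mismatch easier to read, here is a small Python sketch that parses the two CostJo lines and compares them field by field. The regular expression is written for the exact line layout quoted above and is an assumption beyond that; `parse_costjo` is an illustrative helper, not part of the JEDI test comparison tool.

```python
import re

test_line = ("CostJo   : Nonlinear Jo(Radiosonde) = 9.2179903264942527e+02, "
             "nobs = 968, Jo/n = 9.5227172794362114e-01, err = 1.9868525944362956e+00")
ref_line = ("CostJo   : Nonlinear Jo(Radiosonde) = 9.2629445732366753e+02, "
            "nobs = 969, Jo/n = 9.5592823253216463e-01, err = 1.9863363417264606e+00")

COSTJO_RE = re.compile(
    r"Jo\((?P<obs>\w+)\)\s*=\s*(?P<jo>[-+0-9.eE]+),\s*nobs\s*=\s*(?P<nobs>\d+),"
    r"\s*Jo/n\s*=\s*(?P<jo_n>[-+0-9.eE]+),\s*err\s*=\s*(?P<err>[-+0-9.eE]+)"
)


def parse_costjo(line: str) -> dict:
    """Extract observation type, Jo, nobs, Jo/n and err from one CostJo line."""
    match = COSTJO_RE.search(line)
    if match is None:
        raise ValueError("not a CostJo line: " + line)
    return {
        "obs": match["obs"],
        "jo": float(match["jo"]),
        "nobs": int(match["nobs"]),
        "jo_n": float(match["jo_n"]),
        "err": float(match["err"]),
    }


test = parse_costjo(test_line)
ref = parse_costjo(ref_line)

print("nobs difference (ref - test):", ref["nobs"] - test["nobs"])
print("Jo difference   (ref - test):", ref["jo"] - test["jo"])
# Each line is internally consistent: Jo divided by nobs reproduces Jo/n.
for label, rec in (("test", test), ("ref", ref)):
    print(label, "Jo/nobs - Jo/n =", rec["jo"] / rec["nobs"] - rec["jo_n"])
```

The test run uses one fewer Radiosonde observation than the reference (968 versus 969), and Jo/n matches Jo divided by nobs in both lines, so the difference appears to come from the observation count rather than from a change in how the cost is computed.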

guoqing-noaa commented 1 month ago

@SamuelDegelia-NOAA Thanks for further checking this. Does this test assimilate radiance data as well?

@Junjun-NOAA Could you clone/build mpas-bundle v3.0.1 on Hera and check whether it can pass all of its own ctests? Thanks!

SamuelDegelia-NOAA commented 1 month ago

The 3dfgat test also assimilates GNSSRO refractivity obs but that uses a different obs operator (not CRTM).

Junjun-NOAA commented 1 month ago

@SamuelDegelia-NOAA @guoqing-noaa Thanks for the discussion about these failed mpas tests. I made a separate copy of RDASApp on Jet and ran ctest yesterday. test_mpasjedi_4dfgat_append_obs passed, but the other five still failed:
         37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
         38 - test_mpasjedi_3denvar_amsua_bc (Failed)
         43 - test_mpasjedi_4denvar_VarBC (Failed)
         44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
         54 - test_mpasjedi_lgetkf_height_vloc (Failed)

I checked test_mpasjedi_lgetkf_height_vloc and I think it is related to the ref file. The other four are CRTM related issues. I am planning to make a copy on Hera and see how it works.

Junjun-NOAA commented 1 month ago

Also for the fv3tests, I had one more task that failed in yesterday's ctest, which is 106 listed below:

91% tests passed, 11 tests failed out of 127

Label Time Summary:
fv3-jedi    = 1049.22 sec*proc (126 tests)
fv3jedi     = 1056.39 sec*proc (127 tests)
mpi         = 1041.74 sec*proc (115 tests)
script      = 1056.39 sec*proc (127 tests)

Total Test time (real) = 136.98 sec

The following tests FAILED:
         70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
         88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
         91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
         96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
         98 - fv3jedi_test_tier1_4denvar (Failed)
         99 - fv3jedi_test_tier1_4denvar_seq (Failed)
        106 - fv3jedi_test_tier1_hyb-3dvar_fsoi_forward (Failed)
        108 - fv3jedi_test_tier1_hyb-3dvar_fsoi_backward (Failed)
        109 - fv3jedi_test_tier1_diffstates_gfs (Failed)
        111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
        112 - fv3jedi_test_tier1_addincrement_gfs (Failed)

which I will test on Hera too.

SamuelDegelia-NOAA commented 1 month ago

Thanks @Junjun-NOAA! So the 3dfgat and 4dfgat tests only fail on Hera? Also, the mpasjedi_lgetkf_height_vloc test does assimilate AMSUA data, so it could also be a CRTM issue. But I remember still having this test fail when I tried updating to CRTMv3 (the other 4 passed). I never got to the bottom of why exactly it failed.

guoqing-noaa commented 1 month ago

@Junjun-NOAA Thanks for trying to do a clean test on Hera. My posted results might NOT be a clean test. Looking forward to your results.

Also, could you do another test: instead of using the RDASApp mpasjedi/ tests, clone mpas-bundle, build it, and run ctest in its own build/ directory? I suspect some tests may also fail there.

Junjun-NOAA commented 1 month ago

I will do it and keep posting results here.

hongli-wang commented 1 month ago

37 - test_mpasjedi_3denvar_amsua_allsky (Failed) 38 - test_mpasjedi_3denvar_amsua_bc (Failed) 43 - test_mpasjedi_4denvar_VarBC (Failed) 44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed) 48 - test_mpasjedi_4dfgat_append_obs (Failed) 54 - test_mpasjedi_lgetkf_height_vloc (Failed)

Hi Junjun,

How were the failed radiance DA cases resolved?

Junjun-NOAA commented 1 month ago

Hongli,

No, they are not resolved. Please refer to the previous discussion: all CRTMv3 changes are excluded from this PR.

Thanks

guoqing-noaa commented 1 month ago

@hongli-wang The failed ctests do not mean there is anything wrong with the radiance DA functionality. RDASApp and MPASJEDI.v3.0.1 simply use different CRTM source code, so the DA outputs from RDASApp differ from those of mpasjedi.v3.0.1. This will not affect anyone who wants to do radiance DA work based on this PR.

Junjun-NOAA commented 1 month ago

Here is the update for mpasjedi test on Hera:

88% tests passed, 7 tests failed out of 59

Label Time Summary:
executable = 116.54 sec*proc (13 tests)
mpasjedi   = 747.94 sec*proc (59 tests)
mpi        = 745.93 sec*proc (58 tests)
script     = 631.40 sec*proc (46 tests)

Total Test time (real) = 748.06 sec

The following tests FAILED:
         37 - test_mpasjedi_3denvar_amsua_allsky (Failed)
         38 - test_mpasjedi_3denvar_amsua_bc (Failed)
         40 - test_mpasjedi_3dfgat (Failed)
         43 - test_mpasjedi_4denvar_VarBC (Failed)
         44 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
         47 - test_mpasjedi_4dfgat (Failed)
         54 - test_mpasjedi_lgetkf_height_vloc (Failed)

workdir: /scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/mpas-jedi

Junjun-NOAA commented 1 month ago

The fv3jedi ctest on Hera:

91% tests passed, 11 tests failed out of 127

Label Time Summary:
fv3-jedi    = 2024.96 sec*proc (126 tests)
fv3jedi     = 2028.89 sec*proc (127 tests)
mpi         = 2020.59 sec*proc (115 tests)
script      = 2028.89 sec*proc (127 tests)

Total Test time (real) = 257.11 sec

The following tests FAILED:
         70 - fv3jedi_test_tier1_hofx_nomodel_abi_radii (Failed)
         88 - fv3jedi_test_tier1_hyb-3dvar (Failed)
         91 - fv3jedi_test_tier1_3dvar_lam_cmaq (Failed)
         96 - fv3jedi_test_tier1_hyb-fgat_fv3lm (Failed)
         98 - fv3jedi_test_tier1_4denvar (Failed)
         99 - fv3jedi_test_tier1_4denvar_seq (Failed)
        106 - fv3jedi_test_tier1_hyb-3dvar_fsoi_forward (Failed)
        108 - fv3jedi_test_tier1_hyb-3dvar_fsoi_backward (Failed)
        109 - fv3jedi_test_tier1_diffstates_gfs (Failed)
        111 - fv3jedi_test_tier1_diffstates_lam_cmaq (Failed)
        112 - fv3jedi_test_tier1_addincrement_gfs (Failed)

workdir: /scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/fv3-jedi

guoqing-noaa commented 1 month ago

@Junjun-NOAA Do you have the ctest results from the mpas-bundle itself (NOT RDASApp/mpasjedi-test) on Hera?

Junjun-NOAA commented 1 month ago

Not yet. Hera is very slow today.

guoqing-noaa commented 1 month ago

Thanks @Junjun-NOAA for running both the RDASApp mpasjedi tests and the mpas-bundle ctests. 3dfgat passed in mpas-bundle but failed in RDASApp.

The test results are at the following two locations, respectively:
/scratch1/BMC/wrfruc/jjhu/rrfsv2/RDASApp_PRs/PR194/RDASApp/build/mpas-jedi/Testing/Temporary/3dfgat.log
/scratch1/BMC/wrfruc/jjhu/rrfsv2/mpas-bundle-v3.0.1/build/mpas-jedi/Testing/Temporary/3dfgat.log

We compared all submodules and data directories under RDASApp and mpas-bundle-v3.0.1:
ioda/ ioda-data/ MPAS/ mpas-jedi/ mpas-jedi-data/ oops/ saber/ ufo/ ufo-data/ vader/
All are exactly the same.
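
The thread does not say how the directories were compared. One possible way to repeat such a check is sketched below in Python with the standard filecmp module. The function `trees_identical` is a hypothetical helper, and the demonstration runs on throwaway temporary directories rather than the real RDASApp and mpas-bundle trees. Note that filecmp.dircmp skips some names by default (including .git), so this compares working-tree contents only.

```python
import filecmp
import os
import tempfile


def trees_identical(left: str, right: str) -> bool:
    """Recursively check that two directory trees hold the same files with equal bytes."""
    cmp = filecmp.dircmp(left, right)
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # dircmp compares files by os.stat() signature; force a byte-wise comparison instead.
    _, mismatch, errors = filecmp.cmpfiles(left, right, cmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    return all(
        trees_identical(os.path.join(left, sub), os.path.join(right, sub))
        for sub in cmp.common_dirs
    )


def _write(root: str, rel: str, text: str) -> None:
    """Create a small file (and its parent directories) for the demonstration."""
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


# Demonstration on two temporary trees (placeholder content, not real submodules).
with tempfile.TemporaryDirectory() as tree_a, tempfile.TemporaryDirectory() as tree_b:
    for root in (tree_a, tree_b):
        _write(root, "ufo/CMakeLists.txt", "project(ufo)\n")
    demo_same = trees_identical(tree_a, tree_b)
    _write(tree_b, "ufo/CMakeLists.txt", "project(ufo-modified)\n")
    demo_after_edit = trees_identical(tree_a, tree_b)

print(demo_same, demo_after_edit)
```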

Junjun-NOAA commented 1 month ago

By comparing the log files, we found that RDASApp rejects one more Radiosonde wind obs than mpas-bundle. Please see the log below:

[Image: Screenshot 2024-10-15 at 10 49 20 PM]

White text is RDASApp; cyan text is mpas-bundle.

SamuelDegelia-NOAA commented 1 month ago

Thanks @Junjun-NOAA and @guoqing-noaa for the additional testing. I think your analysis shows that this one small difference is pretty minor and not worth worrying about at the moment. But at least we have a record if we ever want to go back and figure out what is going on.

rrfsbot commented 1 month ago

PASSED on hera

started build_and_test on hera at UTC time: Wed Oct 16 17:54:54 UTC 2024 finished at UTC time: Thu Oct 17 04:30:41 UTC 2024

Test project /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194/build/rrfs-test
    Start 4: rrfs_mpasjedi_2024052700_getkf_observer
    Start 1: rrfs_fv3jedi_hyb_2022052619
    Start 2: rrfs_fv3jedi_letkf_2022052619
    Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar
    Start 6: rrfs_mpasjedi_2024052700_bumploc
1/6 Test #2: rrfs_fv3jedi_letkf_2022052619 .............   Passed   34.65 sec
2/6 Test #1: rrfs_fv3jedi_hyb_2022052619 ...............   Passed  216.56 sec
3/6 Test #6: rrfs_mpasjedi_2024052700_bumploc ..........   Passed  19790.00 sec
4/6 Test #4: rrfs_mpasjedi_2024052700_getkf_observer ...   Passed  20032.23 sec
    Start 5: rrfs_mpasjedi_2024052700_getkf_solver
5/6 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar .........   Passed  35029.60 sec
6/6 Test #5: rrfs_mpasjedi_2024052700_getkf_solver .....   Passed  15886.36 sec

100% tests passed, 0 tests failed out of 6

Label Time Summary:
mpi            = 90989.39 sec*proc (6 tests)
rdas-bundle    = 90989.39 sec*proc (6 tests)
script         = 90989.39 sec*proc (6 tests)

Total Test time (real) = 35919.88 sec

workdir: /scratch1/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/194

ShunLiu-NOAA commented 1 month ago

ctest takes 35919.88 sec. Is it an issue related to HPC?
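
For context, the per-test wall times of this run can be set against the earlier Hera CI run posted in this thread (the one that finished on Mon Oct 14). The Python sketch below only divides the numbers copied from the two rrfsbot logs; it says nothing about the cause.

```python
# Wall-clock seconds per test, copied from the two rrfsbot "PASSED on hera" posts.
earlier_run = {
    "rrfs_fv3jedi_letkf_2022052619": 40.64,
    "rrfs_fv3jedi_hyb_2022052619": 102.40,
    "rrfs_mpasjedi_2024052700_bumploc": 126.30,
    "rrfs_mpasjedi_2024052700_getkf_observer": 366.29,
    "rrfs_mpasjedi_2024052700_Ens3Dvar": 425.77,
    "rrfs_mpasjedi_2024052700_getkf_solver": 1225.62,
}
later_run = {
    "rrfs_fv3jedi_letkf_2022052619": 34.65,
    "rrfs_fv3jedi_hyb_2022052619": 216.56,
    "rrfs_mpasjedi_2024052700_bumploc": 19790.00,
    "rrfs_mpasjedi_2024052700_getkf_observer": 20032.23,
    "rrfs_mpasjedi_2024052700_Ens3Dvar": 35029.60,
    "rrfs_mpasjedi_2024052700_getkf_solver": 15886.36,
}
earlier_total, later_total = 1592.39, 35919.88  # "Total Test time (real)"

# Ratio of later to earlier wall time for each test.
slowdown = {name: later_run[name] / earlier_run[name] for name in earlier_run}

for name, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{name:42s} x{factor:7.1f}")
print(f"{'total':42s} x{later_total / earlier_total:7.1f}")
```

The total is roughly 22.6 times longer, and the increase is concentrated in the rrfs_mpasjedi tests, while rrfs_fv3jedi_letkf was slightly faster than before.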