RussTreadon-NOAA closed this pull request 3 months ago.
@azadeh-gh and @emilyhcliu : I understand that you are testing the proposed changes to ensure minimal impact on the analysis. If you find that the changes in this PR are insufficient or need revision, we can either abandon this PR or I can add your changes to this PR.
WCOSS2 ctests
Install RussTreadon-NOAA/feature/thompson_reff at 408917ec on Cactus. Install develop at e82365d9. Run ctests with the following results:
Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/thompson/build
Start 1: global_4denvar
Start 2: rtma
Start 3: rrfs_3denvar_rdasens
Start 4: hafs_4denvar_glbens
Start 5: hafs_3denvar_hybens
Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens ............. Passed 849.08 sec
2/6 Test #6: global_enkf ...................... Passed 886.57 sec
3/6 Test #2: rtma ............................. Passed 993.37 sec
4/6 Test #4: hafs_4denvar_glbens .............. Passed 1351.57 sec
5/6 Test #5: hafs_3denvar_hybens .............. Passed 1352.26 sec
6/6 Test #1: global_4denvar ...................***Failed 1707.83 sec
83% tests passed, 1 tests failed out of 6
Total Test time (real) = 1707.91 sec
The following tests FAILED:
1 - global_4denvar (Failed)
The global_4denvar failure is expected.
The results (penalty) between the two runs are nonreproducible,
thus the regression test has Failed on cost for global_4denvar_loproc_updat and global_4denvar_loproc_contrl analyses.
The change to crtm_interface.f90 in feature/thompson_reff alters the effective radius calculation for cloud ice and rain. This change is not in the contrl (develop). Given the change in the effective radius, the updat and contrl gsi.x generate different analyses.
@RussTreadon-NOAA The safeguards you added are totally reasonable. The code only checked qx > 0 before the calculation, but for Thompson, checks on nr and ni should be added.
With the safeguards added, the global_4denvar failure due to non-reproducible results is expected. The overall impact of the safeguards should be small.
Thank you @emilyhcliu for the review and approval.
WCOSS2 debug ctests
Repeat the above WCOSS2 ctests on Cactus, but compile feature/thompson_reff and develop in debug mode. Run the global_4denvar ctest with the following results:
Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/thompson/build
Start 1: global_4denvar
1/1 Test #1: global_4denvar ...................***Failed 23576.70 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 23576.80 sec
The following tests FAILED:
1 - global_4denvar (Failed)
Errors while running CTest
The failure is due to the contrl (develop) debug gsi.x aborting with the following traceback:
Image PC Routine Line Source
gsi.x 0000000007F31F4B Unknown Unknown Unknown
libpthread-2.31.s 000014C75BA848C0 Unknown Unknown Unknown
libimf.so 000014C75BB8AAAF __libm_log_l9 Unknown Unknown
gsi.x 00000000008853DC crtm_interface_mp 2773 crtm_interface.f90
gsi.x 000000000078BEBD crtm_interface_mp 1881 crtm_interface.f90
gsi.x 0000000005612D45 rad_setup_mp_setu 919 setuprad.f90
gsi.x 000000000400CE99 gsi_radoper_mp_se 100 gsi_radOper.F90
gsi.x 0000000002673C76 setuprhsall_ 492 setuprhsall.f90
gsi.x 0000000003F6C9F2 glbsoi_ 323 glbsoi.f90
gsi.x 00000000010A56D0 gsisub_ 200 gsisub.F90
gsi.x 000000000042CBB5 gsimod_mp_gsimain 2431 gsimod.F90
gsi.x 0000000000413B3B MAIN__ 633 gsimain.f90
Line 2773 of crtm_interface.f90 is the lam_i line mentioned in issue #777:
if (qx > qmin) then
lam_i=exp(1.0_r_kind / 3.0_r_kind * log((am_i*ni(k) *gamma(mu_i + 3.0_r_kind + 1.0_r_kind))/(qx*gamma(mu_i+1.0_r_kind))))
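Written out as an equation, the lam_i statement above is a cube root computed through exp and log. The symbols below are a direct transliteration of the Fortran names (am_i as a_{m,i}, ni(k) as N_i, qx as q_x, mu_i as \mu_i), and the equality on the right holds only while the argument of ln is positive:

$$
\lambda_i = \exp\left[\frac{1}{3}\ln\left(\frac{a_{m,i}\,N_i\,\Gamma(\mu_i+4)}{q_x\,\Gamma(\mu_i+1)}\right)\right]
= \left(\frac{a_{m,i}\,N_i\,\Gamma(\mu_i+4)}{q_x\,\Gamma(\mu_i+1)}\right)^{1/3}
$$

The qx > qmin test keeps the denominator away from zero, but nothing bounds N_i, so a point with N_i = 0 passes a zero argument to ln.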
In contrast, the updat debug gsi.x ran to completion for both the loproc and hiproc configurations:
russ.treadon@clogin02:/lfs/h2/emc/ptmp/russ.treadon/thompson/tmpreg_global_4denvar> grep wall */stdout
global_4denvar_hiproc_updat/stdout:The total amount of wall time = 5336.354999
global_4denvar_loproc_updat/stdout:The total amount of wall time = 11028.922376
The feature/thompson_reff crtm_interface.f90 ensures the cloud ice and rain number concentrations, ni and nr respectively, are greater than zero before entering the lam_i and lam_r blocks.
@RussTreadon-NOAA @azadeh-gh would like to add some comments here.
Thank you, @emilyhcliu, for the heads up. @azadeh-gh please feel free to add comments here. I do not plan on merging this PR into develop until Monday, 8/12/2024.
@RussTreadon-NOAA Thank you, Russ. I found a minimum threshold of 1.0e-6_r_kind for ni and nr in subroutine calc_effectRad in ccpp-physics. I think it's better to change the threshold from 0 to 1.0e-6_r_kind to be consistent with the model physics.
@azadeh-gh, your suggestion has been committed to feature/thompson_reff. Done at 9a3a90d. If the modification is satisfactory, please approve this PR.
@RussTreadon-NOAA Thank you!
Thank you @azadeh-gh for the quick action. As a final check, I will rerun the global_4denvar ctest using the optimized and debug gsi.x on Cactus to ensure the previous ctest results remain valid. I still hope to merge this PR into develop on Monday, 8/12/2024.
WCOSS2 tests
Build RussTreadon-NOAA:feature/thompson_reff at 9a3a90d and develop at e82365d on Cactus.
The optimized build yields the following ctest results:
Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/thompson/build
Start 1: global_4denvar
Start 2: rtma
Start 3: rrfs_3denvar_rdasens
Start 4: hafs_4denvar_glbens
Start 5: hafs_3denvar_hybens
Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens ............. Passed 728.11 sec
2/6 Test #6: global_enkf ...................... Passed 850.39 sec
3/6 Test #2: rtma ............................. Passed 968.95 sec
4/6 Test #5: hafs_3denvar_hybens .............. Passed 1152.72 sec
5/6 Test #4: hafs_4denvar_glbens .............. Passed 1213.02 sec
6/6 Test #1: global_4denvar ...................***Failed 1683.10 sec
83% tests passed, 1 tests failed out of 6
Total Test time (real) = 1683.12 sec
The following tests FAILED:
1 - global_4denvar (Failed)
Errors while running CTest
The global_4denvar failure is due to non-reproducible results.
The results (penalty) between the two runs are nonreproducible,
thus the regression test has Failed on cost for global_4denvar_loproc_updat and global_4denvar_loproc_contrl analyses.
Different analysis results are expected. This PR adds safeguards to the effective radius calculation in crtm_interface.f90, which screen out points with cloud ice and rain number concentrations less than the ccpp-physics minimum of 1.0e-6. This change is not in develop.
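The guard described above can be sketched as a small Fortran module. This is a hypothetical illustration, not the code committed at 9a3a90d: the module name safe_lambda_sketch, the subroutine slope_param, the parameter n_min, and the use of selected_real_kind(15) for r_kind are assumptions. Only the lam_i expression from line 2773 and the 1.0e-6 ccpp-physics minimum come from this PR. The same check would apply to nr before the lam_r block.

```fortran
! Hypothetical sketch of a safeguarded slope-parameter calculation.
! Names (safe_lambda_sketch, slope_param, n_min) are illustrative only,
! not the actual crtm_interface.f90 code.
module safe_lambda_sketch
  implicit none
  private
  public :: r_kind, n_min, slope_param

  ! Assumed double precision, standing in for GSI's r_kind
  integer, parameter :: r_kind = selected_real_kind(15)

  ! Minimum number concentration, matching the 1.0e-6 threshold
  ! cited from ccpp-physics calc_effectRad
  real(r_kind), parameter :: n_min = 1.0e-6_r_kind

contains

  ! Computes lam = (am*n*gamma(mu+4) / (q*gamma(mu+1)))**(1/3) in the
  ! exp/log form used on line 2773. If q <= qmin or n <= n_min the point
  ! is screened out (ok = .false., lam = 0) and log() is never called
  ! with a zero or negative argument.
  subroutine slope_param(am, mu, n, q, qmin, lam, ok)
    real(r_kind), intent(in)  :: am    ! coefficient (am_i in the original)
    real(r_kind), intent(in)  :: mu    ! shape parameter (mu_i in the original)
    real(r_kind), intent(in)  :: n     ! number concentration (ni or nr)
    real(r_kind), intent(in)  :: q     ! mixing ratio (qx in the original)
    real(r_kind), intent(in)  :: qmin  ! mixing-ratio threshold
    real(r_kind), intent(out) :: lam   ! slope parameter
    logical,      intent(out) :: ok    ! .true. if lam was computed

    lam = 0.0_r_kind
    ok  = (q > qmin .and. n > n_min)
    if (ok) then
      lam = exp(1.0_r_kind / 3.0_r_kind * &
                log((am * n * gamma(mu + 3.0_r_kind + 1.0_r_kind)) / &
                    (q * gamma(mu + 1.0_r_kind))))
    end if
  end subroutine slope_param

end module safe_lambda_sketch
```

A driver would use the module and skip the effective radius computation at any level where ok comes back .false., so a zero cloud ice number concentration never reaches log.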
Rebuild gsi.x in debug mode and run the global_4denvar ctest. The feature/thompson_reff debug gsi.x ran to completion in the loproc and hiproc configurations:
russ.treadon@clogin07:/lfs/h2/emc/ptmp/russ.treadon/thompson_debug/tmpreg_global_4denvar> grep wall */stdout
global_4denvar_hiproc_updat/stdout:The total amount of wall time = 5414.495874
global_4denvar_loproc_updat/stdout:The total amount of wall time = 10779.418185
The develop debug gsi.x aborted on line 2773 of crtm_interface.f90:
Image PC Routine Line Source
gsi.x 0000000007F31F4B Unknown Unknown Unknown
libpthread-2.31.s 000014DE64D8B8C0 Unknown Unknown Unknown
libimf.so 000014DE64E91AAF __libm_log_l9 Unknown Unknown
gsi.x 00000000008853DC crtm_interface_mp 2773 crtm_interface.f90
gsi.x 000000000078BEBD crtm_interface_mp 1881 crtm_interface.f90
gsi.x 0000000005612D45 rad_setup_mp_setu 919 setuprad.f90
gsi.x 000000000400CE99 gsi_radoper_mp_se 100 gsi_radOper.F90
gsi.x 0000000002673C76 setuprhsall_ 492 setuprhsall.f90
gsi.x 0000000003F6C9F2 glbsoi_ 323 glbsoi.f90
gsi.x 00000000010A56D0 gsisub_ 200 gsisub.F90
gsi.x 000000000042CBB5 gsimod_mp_gsimain 2431 gsimod.F90
gsi.x 0000000000413B3B MAIN__ 633 gsimain.f90
gsi.x 0000000000413992 Unknown Unknown Unknown
libc-2.31.so 000014DE64A6324D __libc_start_main Unknown Unknown
gsi.x 00000000004138AA Unknown Unknown Unknown
nid001356.cactus.wcoss2.ncep.noaa.gov: rank 46 died from signal 6 and dumped core
The cloud ice number concentration can be 0.0. This results in log(0), which is undefined and aborts the develop debug gsi.x. This PR resolves this problem via the additional safeguards added to crtm_interface.f90.
Description
This PR adds safeguards to subroutine thompson_reff to ensure the ice and rain number concentrations, ni and nr, respectively, are greater than zero. With this additional check, the global_4denvar ctest runs to completion using the debug gsi.x. An additional change is to remove an extraneous debug print identified by @wx20jjung.
Resolves #777
Type of change
How Has This Been Tested?
Build debug gsi.x and run the global_4denvar ctest. The test runs to completion.
Checklist