Closed quantheory closed 11 months ago
Thanks for making us aware of this issue. We'll talk it over at our weekly meeting.
Yes, thank you @quantheory ! I know it takes extra effort to cross-check and communicate with both modeling groups, and I appreciate you making the effort to do so!
We had a similar issue in the same routine, but in the previous lines:
When we have a small amount of initial liquid water in soil, the evaporation become negative. We tried this fix it in this way:
if (qflx_evap_soi(p) >= 0._r8) then
! for evaporation partitioning between liquid evap and ice sublimation,
! use the ratio of liquid to (liquid+ice) in the top layer to determine split
! Since we're not limiting evap over lakes, but still can't remove more from top
! snow layer than there is there, create temp. limited evap_soi.
qflx_evap_soi_lim = min(qflx_evap_soi(p), (h2osoi_liq(c,j)+h2osoi_ice(c,j))/dtime)
qflx_ev_snow(p) = qflx_evap_soi_lim
if ((h2osoi_liq(c,j)+h2osoi_ice(c,j)) > 0._r8) then
! case with both liquid and ice with very small values
if(h2osoi_ice(c,j)<1.e-12_r8 .and. h2osoi_liq(c,j)<1.e-12_r8) then
qflx_liqevap_from_top_layer(p) = h2osoi_liq(c,j)/dtime
qflx_solidevap_from_top_layer(p) = h2osoi_ice(c,j)/dtime
! case with only ice with very small values
else if(h2osoi_ice(c,j)<1.e-12_r8) then
qflx_solidevap_from_top_layer(p) = h2osoi_ice(c,j)/dtime
qflx_liqevap_from_top_layer(p) = max(qflx_evap_soi_lim - qflx_solidevap_from_top_layer(p), 0._r8)
! case with only liquid with very small values
else if (h2osoi_liq(c,j)<1.e-12_r8) then
qflx_liqevap_from_top_layer(p) = h2osoi_liq(c,j)/dtime
qflx_solidevap_from_top_layer(p) = max(qflx_evap_soi_lim - qflx_liqevap_from_top_layer(p), 0._r8)
! original case
else
qflx_liqevap_from_top_layer(p) = max(qflx_evap_soi_lim*(h2osoi_liq(c,j)/ &
(h2osoi_liq(c,j)+h2osoi_ice(c,j))), 0._r8)
qflx_solidevap_from_top_layer(p) = max(qflx_evap_soi_lim - qflx_liqevap_from_top_layer(p), 0._r8)
end if
else
qflx_liqevap_from_top_layer(p) = 0._r8
qflx_solidevap_from_top_layer(p) = 0._r8
end if
resid_ice = h2osoi_ice(c,j)-qflx_solidevap_from_top_layer(p)*dtime
resid_liq = h2osoi_liq(c,j)-qflx_liqevap_from_top_layer(p)*dtime
if (resid_ice < 0._r8 .and. resid_ice > -1.e-12_r8)then
if(resid_liq > 1.e-12_r8) then
qflx_solidevap_from_top_layer(p) = qflx_solidevap_from_top_layer(p) - abs(resid_ice)
qflx_liqevap_from_top_layer(p) = qflx_liqevap_from_top_layer(p) + abs(resid_ice)
else
qflx_solidevap_from_top_layer(p) = qflx_solidevap_from_top_layer(p) - abs(resid_ice)
end if
end if
if (resid_liq < 0._r8 .and. resid_liq > -1.e-12_r8)then
if(resid_ice > 1.e-12_r8) then
qflx_liqevap_from_top_layer(p) = qflx_liqevap_from_top_layer(p) - abs(resid_liq)
qflx_solidevap_from_top_layer(p) = qflx_solidevap_from_top_layer(p) + abs(resid_liq)
else
qflx_liqevap_from_top_layer(p) = qflx_liqevap_from_top_layer(p) - abs(resid_liq)
end if
end if
else
! if (t_grnd(c) < tfrz) then
! Causes rare blowup when thin snow layer should completely melt and has a high temp after thermal physics,
! but then is not eliminated in SnowHydrology because of this added frost. Also see below removal of
! completely melted single snow layer.
if (t_grnd(c) < tfrz .and. t_soisno(c,j) < tfrz) then
qflx_soliddew_to_top_layer(p) = abs(qflx_evap_soi(p))
! If top layer is only snow layer, SnowHydrology won't eliminate it if dew is added.
else if (j < 0 .or. (t_grnd(c) == tfrz .and. t_soisno(c,j) == tfrz)) then
qflx_liqdew_to_top_layer(p) = abs(qflx_evap_soi(p))
end if
end if
Where the variables resid_ice and resid_liq are defined as real after line 127:
real(r8) :: resid_ice, resid_liq
@daniele-peano Thanks for the information. I will see if the ELM developers are amenable to implementing similar logic in the E3SM version.
@olyson hasn't seen this issue in any of our simulations. Are there certain cases when you've seen this issue crop up? At the SE meeting this morning we discussed making a note in the source code noting this potential bug (to be done by @samsrabin ). But given that we have not come across this issue we're somewhat inclined to not fix this issue. @billsacks do you think this is an appropriate decision?
That decision makes sense to me. (It's also possible that some of the snow-related rework that @swensosc and I have done over the last few years made this issue less likely to occur in CTSM, though I haven't thought carefully about that, so maybe that's just wishful thinking.)
In ELM, this bug is only confirmed to have caused an issue for users twice ever. The only reproducer I have access to where it crashed the model, is a case where it affected only a single grid cell once, 37 years into a run. (However, that one case, combined with issues on perlmutter, held up some release testing for several weeks, so it was quite annoying when it did happen.) As a result, I can't really say much statistically about what kind of cases it affects. It is possible that some other difference in the models (e.g. different fluxes due to the higher boundary layer resolution and gustiness in EAM) increases the chance of a crash in E3SM, and in CESM such a crash would be even more unlikely. That said, it's also possible that this issue silently causes non-fatal inaccuracies more often, and so we simply haven't noticed very infrequent but large overestimates of snow depth.
I think the main argument for putting in a fix here is that while the chance of this bug causing a crash is quite low, the chance of the proposed fix itself causing problems is also quite low. But I'm not that invested in any particular outcome here; I mostly just wanted to send a heads up in case this was an issue that could affect CESM.
That all makes a lot of sense, @quantheory . Regardless of the decision here, it's very helpful to have this issue to look back at in case it crops up as a possible issue.
I'm posting this bug report because of an issue in ELM that likely also affects CTSM, concerning these lines of code:
https://github.com/ESCOMP/CTSM/blob/abc3c27e0710cd5c0fc3548c3ce174889fc6dd70/src/biogeophys/LakeHydrologyMod.F90#L333-L358
In the ELM version of this code, there are two problems that led to infrequent crashes of the model. First, due to an incorrect ordering of the limiters, it is possible for very small negative values of
snow_depth
to be produced. I believe that this does not affect CTSM due to this commit by @billsacks: https://github.com/ESCOMP/CTSM/commit/3b01f70c6775b2f22b4cae6153b79c8bf0bed9d1However, the second issue is that if the
h2osno
value (renamedh2osno_no_layers
in CTSM) is very small at the beginning of this section of code, it can produce unrealistically large values ofsnow_depth
, mainly due to division by a near-zero value (which can greatly amplify the impact of small floating-point errors). This occurs very rarely in practice, but that makes it extremely annoying to debug because the error may only show up several decades into a production run. I have verified that this occurs in ELM, and it can probably also happen in the CTSM version.Xubin Zeng has suggested a fix for this, which is to replace this line:
https://github.com/ESCOMP/CTSM/blob/abc3c27e0710cd5c0fc3548c3ce174889fc6dd70/src/biogeophys/LakeHydrologyMod.F90#L354
with
The physical justification here is that
h2osno_temp/snow_depth(c)
is being treated as a snow density here, and 50 kg/m^2 is an appropriate lower limit for snow density. (In principle, a higher or lower limiting value could be used, or another method of avoiding the divide-by-near-zero. The important thing is not to use extremely small values ofh2osno_temp
when estimating snow density.)