E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM
Other
354 stars 368 forks source link

Floating invalid in DryDepVelocity.F90 and rapid ozone loss after coupling with UCI_chem #5535

Open wlin7 opened 1 year ago

wlin7 commented 1 year ago

In a testing branch with #4707 merged into master, debug mode exposes a floating invalid at line 469 of elm/src/biogeochem/DryDepVelocity.F90 rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )

And when it occur, it is due to rssun(pi) = 0.0.

The following logic is added to allow the debug mode to proceed.

if (rssun(pi)==0._r8) then
    rs = 0.0_r8
else
    rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )
endif

Moreover, it is found that without the above logic to contain floating invalid, the simulation would experience rapid ozone loss: global tropospheric column ozone (TCO) losing 3/4 of the normal value and stratospheric column ozone (SCO) losing nearly 2/3.

Is it a bug to for rssun(pi)to have zero value? If occurrence of zero value is acceptable, if the above if-else-endif block to control the assignment for rs acceptable?

Where is the root cause? Related elm module need corresponding treatment when more gas species added?

Urgent helps are needed in order for merging primary v3atm features to master.

wlin7 commented 1 year ago

Hi @bishtgautam , can you please take a look at this issue? @tangq @keziming did additional experiments and found the dry deposition fluxes calculated in land are unrealistically high. Shutting off that fluxes would prevent the rapid ozone loss. The occurrence of floating invalid in DryDepVelocity.F90 is just a hint, the root cause may be elsewhere.

FYI: The issue does not exist in v3atm repo . The land component in that branch has no update since Jul 22, 2021 (e3472d21b20e57eaaeb7078c82c3578e25dc6e88).

The problem can be investigated using branch wlin/atm/PM_4707_debug, with test

SMS_D_Ln5.ne30pg2_EC30to60E2r2.F20TR_chemUCI-Linozv3.chrysalis_intel.eam-v3atm_rtmoff

tangq commented 1 year ago

@bishtgautam and @wlin7, The test @keziming and I did limits the surface dry deposition in the bottom 10 layers in the atmosphere. The code changes are only a few lines and can be found at https://github.com/E3SM-Project/E3SM/compare/wlin/atm/PM_4707_debug...keziming/atm/PM_4707_10layer_drydep.

This 10-layer limit on dry deposition prevents the ozone rapid loss within the whole column and suggests the dry deposition flux is too large, which seems calculated by land with drydep_method == DD_XLND and specified by the user namelist setting drydep_method = 'xactive_lnd'.

bishtgautam commented 1 year ago

@tangq's code change https://github.com/E3SM-Project/E3SM/compare/wlin/atm/PM_4707_debug...keziming/atm/PM_4707_10layer_drydep#diff-3f1e2d55b720fe05c031b385c5c072579158f6b7c91ce20b59e5ceda7f1cd89eR476

                  ! rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )
                  !if (rssun(pi)==0._r8) then
                  !   rs = 0.0_r8
                  !else
                  !   rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )
                  !endif
                  ! set rs calculation back
                  rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )  

Is the code working correctly with @tangq's changes or are you still getting unacceptable values?

wlin7 commented 1 year ago

@bishtgautam , my understanding is that the unacceptable values are still passed to atm, even with @tangq 's changes in the atm module mo_gas_phase_chemdr.F90. Just not using those fluxes. And when commenting out the codes above, floating invalid (division by 0 value of rssun) would be reported when running in debug mode. Curious why division by 0 was not reported when running in non-debug mode.

bishtgautam commented 1 year ago

I don't have the research background to understand the physics in DryDepVelocity.F90, but here is one fix that I can suggest:

if (rssun(pi) > 0._r8 .and. rssha(pi) > 0._r8) then
   rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )  
else if (rssun(pi) > 0._r8 .and. rssha(pi) == 0._r8) then
   rs = 1.0_r8 / ( fsun(pi)/rssun(pi) )  
else if (rssun(pi) == 0._r8 .and. rssha(pi) > 0._r8) then
   rs = 1.0_r8 / ( (1.0_r8 - fsun(pi))/rssha(pi) )  
else
   rs = 0._r8
endif

[Updated code snippet]

wlin7 commented 1 year ago

Thanks, @bishtgautam . In the code snippets above, no doubt you must have meant the two else-if conditions to be swapped.

It would be ideal if we can find the root cause, including if physically possible and understand how rssun and rssha could become 0s. Nevertheless, a longer simulation can be tested if the results are reasonable after this fix (assuming the model can run. In my previous test, with the rough fix as described in the issue, the run would end shortly after 1 month without meaningful error message. Re-run remained so.)

tangq commented 1 year ago

@bishtgautam , my test is just for confirming the cause of the issue, but not the ideal fix. Do you know who might be familiar with that part of code?

bishtgautam commented 1 year ago

Thanks for catching my mistake and I have updated the code snippet.

Regarding the root cause of rssun and rssha becoming zero, I suspect the issue the pi-index that is being used in the do-loop DryDepVelocity.F90#L256-L259:

      pft_loop: do pi = bounds%begp,bounds%endp
         l = veg_pp%landunit(pi)

         active: if (veg_pp%active(pi)) then
         ...
         ...
         ...

In other parts of the ELM code, rs (which could be rssun or rssha) is only filled with valid values for a filterp (see PhotosynthesisMod.F90#L932-L933) and provided laican > 0._r8) (see PhotosynthesisMod.F90#L957). I suspect the pi in the DryDepVelocity.F90 is looping over indices for which rssun and rssha have zero values.

wlin7 commented 1 year ago

Sorry no good news to report back. With the probably better fix suggested by @bishtgautam , than the brute one by simply setting rs=0 as described at the beginning of this issue, the run would abort even sooner (on day 15 rather than day 33).

keziming commented 1 year ago

Qi and I run simulation to limited dry deposition's impact only to first 10 layers from the surface. The TCO and SCO bounced back compared to PM_4707. But the simulation only can last less than three months due to unreasonably large AOD. In summary, the unexpected large dry deposition caused the rapid ozone depletion, as well as other gases and aerosols. The simulation page is at https://acme-climate.atlassian.net/wiki/spaces/ATMOS/pages/3721887751/simulations+for+diagnosing+rapid+ozone+loss+issue

wlin7 commented 1 year ago

The calculation of rs was different in master and in v3atm.

In the master, it is computed as the following, which is susceptible to division by 0.

   ` rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )

` In the v3atm version (older land), it is

   `rs=(fsun(pi)*rssun(pi))+(rssha(pi)*(1._r8-fsun(pi)))`

This change was brought in by #4678, intended as a bug fix for O3 dry deposition.

@balwinder, can you please also revisit this fix? The rapid ozone loss is traced to this module, which does not happen in the v3atm version. The only difference appears to be in the calculation of rs above.

Thanks to silent heroes @huiwanpnnl and @oksanaguba . Just learned today how to use git's 'blame' function today to quickly locate the PR and hopefully getting the best helper, following a tip by Hui on issue #5295.

wlin7 commented 1 year ago

@bishtgautam just pointed me to the updated DryDepVelocity in CSTM codes https://github.com/ESCOMP/CTSM/commit/9e9900bf5c1725d9b1fb87a4d9c74260212034fb

It contains the same fix for rs calculation as in #4678, and some more. Does that mean we need to incorporate the additional changes as well? Gautam and Balwinder, please make the call how to proceed.

wlin7 commented 1 year ago

Update: Simply reverting rscalculation back to rs=(fsun(pi)*rssun(pi))+(rssha(pi)*(1._r8-fsun(pi))) wouldn't work either. The model would also experience unreasonably high aerosol optical depth and ended prematurely (on day 18).

tangq commented 1 year ago

Tagging @hwangacme and @mingxuanwupnnl as this dry deposition issue seems affecting aerosols too.

wlin7 commented 1 year ago

Good news, finally.

@bishtgautam , in the spirit of your suggested changes, which make perfect sense to me assuming both rssun and rssha could have 0 values, I made the following adjustments in the two else-if blocks

if (rssun(pi) > 0._r8 .and. rssha(pi) > 0._r8) then
   rs = 1.0_r8 / ( fsun(pi)/rssun(pi) + (1.0_r8 - fsun(pi))/rssha(pi) )  
else if (rssun(pi) > 0._r8 .and. rssha(pi) == 0._r8) then
   rs = fsun(pi)*rssun(pi) )  
else if (rssun(pi) == 0._r8 .and. rssha(pi) > 0._r8) then
   rs = (1.0_r8 - fsun(pi))*rssha(pi)  
else
   rs = 0._r8
endif

The updated else-if blocks are to simply account for sunlit and shaded resistance, respectively. They are part of the series formulation in original buggy calculations (see issue #4677).

With this, and removing the use of limiter in mo_exp_sol.F90 for base_sol and base_sol_rest, the model can run without problem, and no ozone loss.

wlin7 commented 1 year ago

Note that the limiter of 1.0E-30 in mo_exp_sol.F90 must have been too strong, if ever needed in order for debug mode to run. With the limiter in place, the above changes can only get the model to run 27 days. Note that with the original changes suggested by @bishtgautam , the model ran 15 days; with more brute change to set rs=0 whenever rssun=0, it ran 33 days.

tangq commented 1 year ago

@bishtgautam , out of curiosity when rs = 0, is the dry deposition large or small? If rs represents resistance, the dry dep flux would be large when rs=0. Right?

mingxuanwupnnl commented 1 year ago

@tangq thanks for including me in this discussion. I met this last year when I ran maint-2.0 with MOZART-MOSAIC. It is caused, as discussed above, by the changes in the bug fix (#4678 ). @wlin7 In CESM2, they fixed a few bugs in the dry deposition velocity calculation, not only the one we have in ELM. It would be good to incorporate those bug fixes in ELM, which should further helps a bit to increase surface ozone according to Louisa's CESM2 overview paper (https://agupubs.onlinelibrary.wiley.com/doi/epdf/10.1029/2019MS001882). Of course, due to the urgency of merging features to master, we can have a quick fix here and think of fixing other bugs in the near future.

wlin7 commented 1 year ago

Sounds good, @mingxuan. We can iterate on that. Are the other bug fixed all included in the one @bishtgautam pointed us to (https://github.com/ESCOMP/CTSM/commit/9e9900bf5c1725d9b1fb87a4d9c74260212034fb). That one does not including our current fix; and it seems to include more than just DryDepVelocity.F90. I think it is good idea to allow sometime to absorb those fixes.

bishtgautam commented 1 year ago

@bishtgautam , out of curiosity when rs = 0, is the dry deposition large or small? If rs represents resistance, the dry dep flux would be large when rs=0. Right?

@tangq Unfortunately, I don't have the relevant background related to dry deposition modeling to answer your question.

tangq commented 1 year ago

@mingxuanwupnnl , are there links to those CESM2 dry deposition velocity bug fixes besides the one @bishtgautam shared?

hwangacme commented 1 year ago

@bishtgautam , out of curiosity when rs = 0, is the dry deposition large or small? If rs represents resistance, the dry dep flux would be large when rs=0. Right?

@tangq Unfortunately, I don't have the relevant background related to dry deposition modeling to answer your question.

Qi, I think surface resistance, as one of the contributing terms, is proportional to dry deposition flux, so when rs=0, there is no contribution from surface resistance, but other terms can contribute to dry deposition.

mingxuanwupnnl commented 1 year ago

@wlin7 @tangq I think the one @bishtgautam shared is good. It also include the fix #4678 to some extent. Also, I think we need to set rs to 1e+36 for default for where the issue is, since it is the denominator. I need to take a deeper look into this.

tangq commented 1 year ago

@hwangacme , @mingxuanwupnnl , I reached out to Sam Silva, who confirmed that the rs variable represents the stomatal resistance and factors inversely into a dry deposition calculation. It is recommended to use rs = 1.e36_r8 where there are no sunlit or shaded leaves.

wlin7 commented 1 year ago

Setting rs = 1e36 when no sunlit/shaded leaves may be a more formal way. For record, just to note that previous zero rs value did not take effect except for index_pan and index_xpan (as seen in the codes), as immediately below that line, it sets rsmx=1e36 if rs=0.