Closed: quantheory closed this pull request 7 months ago
This PR passes the e3sm_developer tests. I am running the super-BFB tests just in case. Edit: They all passed except for one that is also failing for me on master (#5851). This tag should be BFB for compsets with no active atmosphere (e.g. I compsets); this seems to be the case, but I've only checked a couple of configurations.
With this PR, it is possible to isolate the effects of each of the four main changes individually by comparing different configurations of this code to different configurations of the original. Consider running the same compset with active atmosphere in each of the four code/configuration combinations below:
1. master with all default settings.
2. master with `xmlchange ATM_SUPPLIES_GUSTINESS=TRUE`.
3. This PR with `force_land_gustiness=.true.` and `U10WITHGUSTS` added to the EAM output.
4. This PR where only `U10WITHGUSTS` is added to the EAM output.

Then the fix for #5834 can be isolated by comparing 1 and 2, the fix for #5835 can be isolated by comparing 2 and 3, the fix for #5836 can be isolated by comparing the `U10` and `U10WITHGUSTS` outputs within either 3 or 4, and the fix for #5837 can be isolated by comparing 3 and 4.
That said, I don't know how much we would learn from examining configuration 2 in detail, except for debugging or sanity-check purposes. To figure out what this does to the climate, 1 and 4 are obviously the most significant, so at least those two need to be run out for some time with the coupled model, with configurations 2 and 3 maybe only needing to be run out for a few years as a sanity check. For all of these cases, the effect on U10 and the surface fluxes is probably large enough that it won't take much time to see at least qualitatively what the effect is.
Note that in the meeting last week we did not really prioritize a fix for the double-counting issue (#5837), but it was not too bad to add today.
@mabrunke Please double-check this, particularly the `clubb_intr.F90`, ELM `BareGroundFluxesMod.F90` (and similar files), and `shr_flux_mod` changes. Also, I made the U10 change in `cam_diagnostics`.
@huiwanpnnl I made a change to the `conditional_diag_main` module, so that the U10 calculated there would match the one in the `cam_diagnostics` module.
@bishtgautam The atmospheric gustiness should be handled correctly in `lnd_import_export.F90`, but the corresponding changes have not been made in `lnd_downscale_atm_forcing.F90` (or `lnd_disagg_forc.F90`, which seems to be unused code). Can you let me know whether this is supposed to be purely offline downscaling for use with `datm`, or whether it is supposed to work in an "online" manner when EAM is running? If the latter, we need to change that module as well.
@wlin7 @rljacob Tagging you in case you also want to take a look at this.
@quantheory, I have some code changes already for `lnd_import_export.F90`. That is used no matter whether ELM is run offline or coupled to the atmosphere.
@mabrunke The specific reason why I'm concerned about `lnd_import_export.F90` is these lines of code, which explicitly prohibit combining downscaling and `atm_gustiness`. I don't remember why that check is there. It may be that I added it because I just wasn't sure what to do with the gustiness field if downscaling is being used. Though perhaps it's as simple as just copying what's being done for `ubot` and `vbot`.
@quantheory wind speed is not being downscaled at this time; it is just copied to all of the topounits. I believe `windbot` is only used as a diagnostic in the ELM output, recording what ELM sees as the bottom-layer wind speed. Without downscaling, gustiness has been included in that output variable.
@mabrunke I guess what I'm trying to figure out is that we have these three places where that copying seems to be done, and I'm wondering whether I need to do the same thing for `ugust` as is being done for `ubot` and `vbot` in those lines. (As opposed to the current behavior, where, if `atm_gustiness` is on, the code simply aborts before calling that function!)
@quantheory, can you provide the line of code where that abort happens? I'm not seeing it. Is it in `lnd_import_export.F90`?
@mabrunke Yeah, right here, a few lines before where the call to the downscaling would otherwise happen:
@quantheory Ah, I see it now. I don't see why applying gustiness if there is downscaling would be a problem at this point, since wind speed is being downscaled right now.
@mabrunke OK, I can add another commit to allow applying gustiness with the downscaling. Do you think this PR looks good other than that?
@quantheory A few things: I like your idea of adjusting U10 in EAM when it's being output there, but U10 is also output in ELM. It should be adjusted for that output as well. Also, you changed the surface flux calculation in CICE, but isn't MPAS-SeaIce now being used in v2/3?
@mabrunke Those are both very good points. Regarding the U10 output, if I need to calculate the change in multiple components anyway, I'm now considering going back to calculating it at the surface rather than in EAM. (U10 is supposed to be measured relative to the surface, right? I.e., a measure of u_atm - u_ocn over the ocean, rather than just the atmospheric speed in an Eulerian frame? That seems to be how it works in `shr_flux_mod.F90`.)
It shouldn't be difficult to make the changes to MPAS; I really just forgot because the original ATM_SUPPLIES_GUSTINESS changes were done back in v1, and so my list of files to change had CICE but not MPAS ice on it. Let me see what I can work out today.
@quantheory, the fluxes really should be calculated using u_atm - u_sfc, as well as the temperature and humidity differences, but over land u_sfc = 0. That is not the case over the ocean, where there is a surface current.
I was handling the adjustment in the surface models. That was why I suggested making ATM_SUPPLIES_GUSTINESS = TRUE. That would complicate things if you want to save the two different versions of U10 as you do now. I'm not sure yet whether that's a good idea, since the old way of calculating U_gust was wrong anyway.
There is also a sea ice formulation in `shr_flux_mod.F90`. I'm guessing that is a relic of an older version of the model and is no longer used.
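The u_atm - u_sfc convention under discussion can be illustrated with a small sketch. This is not a quote of `shr_flux_mod.F90`: the quadrature form of the gustiness term and the calm-wind floor `u_min` are common bulk-flux conventions assumed here purely for illustration.

```python
import math

def flux_wind_speed(u_atm, v_atm, u_sfc, v_sfc, u_gust=0.0, u_min=0.5):
    """Wind speed used in a bulk surface-flux formula: the magnitude of the
    atmosphere-minus-surface relative wind, with a gustiness enhancement
    added in quadrature and a small floor so fluxes stay nonzero in calm
    conditions. (Quadrature term and floor value are illustrative
    assumptions, not taken from shr_flux_mod.F90.)"""
    du = u_atm - u_sfc
    dv = v_atm - v_sfc
    return max(u_min, math.sqrt(du * du + dv * dv + u_gust * u_gust))

# Over land the surface is stationary, so u_sfc = v_sfc = 0:
print(flux_wind_speed(5.0, 0.0, 0.0, 0.0))             # 5.0
# Over the ocean a surface current reduces the relative wind:
print(flux_wind_speed(5.0, 0.0, 1.0, 0.0))             # 4.0
# Gustiness increases the effective speed seen by the flux scheme:
print(flux_wind_speed(3.0, 0.0, 0.0, 0.0, u_gust=4.0)) # 5.0
```

This also shows why double-counting gustiness matters: adding `u_gust` both here and inside an atmosphere-side scheme would inflate the effective wind twice.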
Why is CICE being discussed in this thread? E3SM does not use CICE. But any changes to U10 will also affect ocean and sea ice coupling in B-cases if it is being changed in the coupling, rather than just output, and if such changes are being proposed, people from the ice and ocean groups need to be assigned as reviewers. If the changes only affect F-cases, then MPAS-SeaIce is still involved, and the CICE tag should be removed, and in its place sea ice or MPAS-SeaIce should be tagged.
@proteanplanet
> Why is CICE being discussed in this thread? E3SM does not use CICE.
Because it's building on changes that were originally developed in E3SMv1, and because this is the first PR I've made affecting sea ice in several years (partly due to not working directly on E3SM at all for a while). So as I mentioned earlier, I forgot that I needed to port the changes over to MPAS.
> people from the ice and ocean groups need to be assigned as reviewers
I agree, but being rather atmosphere-focused, I don't actually know who to ask to review it. Also, given the problems mentioned in the above discussion, I need to make some changes before getting anyone to review this again anyway.
I was planning to ask @wlin7, but if there's anyone you want to suggest as a reviewer (or if you feel you should do it yourself), feel free. Though again, I need to port the changes to MPAS before it's worth bothering.
> sea ice or MPAS-SeaIce should be tagged
There is no generic sea ice tag. The tag list seems a bit inconsistent in this regard, honestly. I had no idea whether to tag mpas-ocean or not, because I'm not touching the MPAS-ocean code at all, but I am making changes to coupler code that calculates atmosphere-ocean interactions. But there's no generic "Ocean" tag, only "mpas-ocean". My hope was that tagging Coupled Model would be enough.
@quantheory I suggest either making me or @eclare108213 a reviewer. Thanks for adding the mpas-seaice tag.
@quantheory, the surface fluxes are calculated in `column/ice_atmo.F90` in MPAS-SeaIce. It doesn't use `ugust`, so it must expect to receive the gustiness in `vmag`.
@mabrunke Yeah, I noticed earlier; it looks like no one has touched anything related to the gustiness export code since I first added it, so I have to port it from CICE today. Luckily this code has shared ancestry with the CICE version, so it should be easy to copy over once I've read through the MPAS version.
@quantheory, if you intend this work to make it onto E3SM master, please add a sea ice modeler to the review as per the above message - choose any of @eclare108213, @darincomeau, @njeffery, @erinethomas or me. If this is for your own project and you are using it solely for that, that's fine. If you intend this work to enter E3SM master, the changes must first be made to MPAS-SeaIce, not CICE. Moreover, the changes should be made to Icepack not to the MPAS-SeaIce column package, because Icepack from the CICE Consortium is replacing that for E3SM V3. Therefore your changes would need to pass review in the CICE Consortium Icepack code at https://github.com/CICE-Consortium/Icepack. I am happy to help in testing the code in E3SM, and welcome improvements, but we will need to make sure it is thoroughly tested in D-, G- and B-cases first.
@proteanplanet As I mentioned before, please wait for additional commits before reviewing this (I will tag you when this is done). I did not add you as a reviewer before, because I didn't think it was worth your time to look at the code when several significant changes were about to be made anyway. I have been out of the office part of this week, so I haven't been very fast about this.
I think there is some misunderstanding about why I made this PR. My expectation is that once this PR is reviewed for code correctness, it will be referred to the coupled model group and relevant component leads to determine how to evaluate this using coupled model runs, and determine what to do from there. No one involved has any intention of trying to get this merged unless and until that process is complete. I tagged this "Do Not Merge" for a reason.
Furthermore, I am not a part of the E3SM project, and it is not my responsibility (or in fact, within my ability) to do all of the necessary climate evaluation, or the integration with v3 features. I put together these bug fixes because I believe we all share an interest in ensuring that the model code is a correct implementation of the physical equations it represents, and as a courtesy to other E3SM developers to not have to start from scratch when implementing these fixes for v3. I have neither the knowledge nor the time to shepherd this PR through merging to master and/or integration with v3 by myself. That will require the assistance of E3SM developers in the coupled model and other groups and the help of the integrators, so rest assured that you will hear much more about this, and there will be several more rounds of review, before anyone tries to get this merged.
@quantheory Thanks. I'll help with the testing when you are ready.
> My expectation is that once this PR is reviewed for code correctness, it will be referred to the coupled model group and relevant component leads to determine how to evaluate this using coupled model runs, and determine what to do from there. No one involved has any intention of trying to get this merged unless and until that process is complete. I tagged this "Do Not Merge" for a reason.
@quantheory Thanks for reaching out. I'm glad you're planning to work with the coupled group and the relevant component leads, since the implementation needs to be consistent across all of them and sometimes that can be pretty tricky. Please change this to a Draft PR to make it clear that it's only for review purposes, not staged for merging into E3SM master. To get this all the way through the review process into E3SM, you'll need to follow the steps outlined here. Do you have a design document for these changes?
@eclare108213
> Please change this to a Draft PR to make it clear that it's only for review purposes, not staged for merging into E3SM master.
Ah, interesting! I don't think I've used that feature before. Done.
> Do you have a design document for these changes?
Yes and no. The original gustiness feature, developed over a few years and completed in 2019, seems to have a design document here. This PR is a collection of bug fixes for that older feature, not a new feature in itself.
Part of the problem is that the original implementation tried to handle gustiness entirely within the atmosphere, without modifying the land or sea ice components or sending any "new" information through the coupler. (I assume this was in part to reduce the complexity of the required code modifications.) This approach neglects some changes that really need to be made to the other components. Hence, fixing this feature requires reworking the design to add/change a few lines of code in the land and sea ice models, even though the original feature did not touch the ELM or CICE (or MPAS-Sea Ice) code directly.
@proteanplanet @eclare108213 The changes to MPAS-Seaice are now implemented. (I also kept the CICE changes because CICE is still used for some tests, e.g. of EAM's single-column model.) This is the first time I've done anything with the MPAS-Seaice code, so please help me out if I've made any basic mistakes here. I have not added any outputs related to gustiness to the default output, but you may wish to do so. When EAM is not being used, I've tried to use a gustiness value of 0 m/s, which should result in the same answers as on the current master.
There is some kind of stability issue with this branch in at least one `WCYCL1850NS` test. I've filed issue #5884 about that, because it seems to be a consequence of an issue on `master` unrelated to this tag. But that appears to be primarily an atmosphere-land interaction issue, so whatever we do to resolve it is unlikely to have a significant effect on the MPAS-Seaice changes.
@bishtgautam @mabrunke @XubinZeng I have removed both atmosphere- and land-calculated gustiness (`wc`) from U10 in this revision. The removal of `wc` is the one change that I expect to affect answers without EAM running, but I believe it should only affect the U10 output of ELM.
Status: the WCYCL1850NS test is failing; threading non-BFB issues.
Status: unchanged from above. A B-case test will also be needed.
This PR has a nontrivial impact on the aerosol ERF (see table below, in W/m2), though it is not yet clear to me what the exact pathway is. Nonetheless, the impact isn't large enough to warrant a fuller investigation.
| Description | TOT | ARI | ACI |
|---|---|---|---|
| baseline | -0.736 | +0.096 | -0.926 |
| PR 5850 | -0.707 | +0.101 | -0.905 |
One additional note on this PR from my testing: it is likely exacerbating (doubling) the machine-dependency (likely optimization-dependency) issue we've been tracking. I still don't recommend any action/investigation. Just a note for completeness.
@proteanplanet @eclare108213 Sea ice changes are now removed from this branch (except for 3 lines needed by the atmosphere for diagnostic purposes).
I have some results to share from the test simulations which were run after the removal of gustiness impacts on sea ice.

Here are the model-vs-model diagnostics from the F20TR simulation: https://portal.nersc.gov/cfs/e3sm/terai/E3SMv3_dev/e3sm_diags_Gustinessfix_vs_baseline_F20TR/viewer/

I have also run 30 years with WCYCL1850 in the trigrid configuration, to be compared with the v3alpha04 simulation: https://portal.nersc.gov/cfs/e3sm/terai/E3SMv3_dev/e3sm_diags_WCYCL1850_gustinessfix_vs_v3alpha04_231017/viewer/

MPAS diagnostics from the same 30-year simulation are here: https://portal.nersc.gov/cfs/e3sm/terai/E3SMv3_dev/20231011.piControl.Gustinessfix.v3alpha04_trigrid.pm-cpu_intel/mpas_analysis/ts_0001-0030_climo_0001-0030/ocean/index.html
What's concerning with the coupled simulation is a crash that the coupled model encountered 38 years into the simulation with the following error:
```
/pscratch/sd/t/terai/E3SMv3_dev/20231011.piControl.Gustinessfix.v3alpha04_trigrid.pm-cpu_intel/run/e3sm.log.17098756.231016-200022

256: PIO: FATAL ERROR: Aborting... An error occured, Writing variables (number of variables = 181) to file (./20231011.piControl.Gustinessfix.v3alpha04_trigrid.pm-cpu_intel.elm.h0.0038-02.nc, ncid=469) using PIO_IOTYPE_PNETCDF iotype failed. Non blocking write for variable (SNOWDP, varid=157) failed (Number of subarray requests/regions=1, Size of data local to this process = 64800). NetCDF: Numeric conversion not representable (err=-60). Aborting since the error handler was set to PIO_INTERNAL_ERROR... (/pscratch/sd/t/terai/E3SM_code/quantheoryE3SM/externals/scorpio/src/clib/pio_darray_int.c: 395)
```
Before this, I also see the following error messages. I haven't seen them before, and don't know if these are to be expected:
```
1871: sl_advection: reconstruct_and_limit_dp (rank,ie) returned 1871 1 -3.1438E-01
2584: sl_advection: reconstruct_and_limit_dp (rank,ie) returned 2584 1 -1.6182E-01
1988: sl_advection: reconstruct_and_limit_dp (rank,ie) returned 1988 1 -1.2616E+00
1871: sl_advection: reconstruct_and_limit_dp (rank,ie) returned 1871 1 -4.9239E-01
2985: sl_advection: reconstruct_and_limit_dp (rank,ie) returned 2985 1 -1.7805E-01
2986: sl_advection: reconstruct_and_limit_dp (rank,ie) returned 2986 1 -6.9487E-01
```
@xuezhengllnl, @ambrad, @wlin7 do you have any thoughts on what might be going wrong and whether I should be concerned by the `sl_advection` warning lines?
Run script is here:
https://github.com/E3SM-Project/SimulationScripts/blob/master/archive/NGDAtmPhysIntegration/run.20231011.WCYCL1850.Gustinessfix.v3alpha04_wVBSfix.trigrid.pm-cpu.sh
> concerned with the sl_advection warning lines
Those lines are in general fine, although if you see a lot of them, that indicates instability in the model. Note that these lines are printed only in debug builds, specifically if NDEBUG is not defined in the build. Am I correct that you're using a debug build?
Hi Chris (Terai), I don't know what might be wrong. A couple of other comments:

a) From your results, I see a slight decrease of aerosol emission from the oceans (e.g., sea salt), which is consistent with what we expect. This might have a small effect on aerosol radiative forcing.

b) If possible, could you compare the current simulation (without gustiness impact over sea ice) against the 40-year coupled simulation you did before (including gustiness impact over sea ice), and then share the plots with us? From this comparison we may gain some insights. Thanks, Xubin
> Those lines are in general fine, although if you see a lot of them, that indicates instability in the model.
There are a good number of these messages, and they appear decades earlier in the simulation as well. So far the mean-state diagnostics don't indicate any instabilities, but things might be occurring on shorter timescales.
> Am I correct that you're using a debug build?
I didn't intend to use a debug build, and I don't see that I did a debug_compile, so I don't know why they are printing if they are only intended in debug builds.
Adding @beharrop as a reviewer, since he'd been involved in the original implementation of the gustiness parameterization.
@crterai what code version are you using? I'm wondering if recent cmake changes have affected flags like -DNDEBUG. Thanks.
By the way, the debug output isn't an error but just a message about a limiter being active. Generally I expect the limiter to have to act about once over the sphere every few time steps, almost always in the top-most layer. But I do intend this message to appear only in debug builds; you shouldn't be seeing this message at all.
Edit: Just spoke with the CMake guru, who is making a bunch of changes to the build system, and indeed -DNDEBUG was accidentally removed from release builds in recent master versions. He's working on a fix now.
For the water cycle runs, I just used the PR branch, which was built on top of 732e699695c67023880cb78937e21b5e66072879 of master, but it sounds like you've identified why it was printing those messages.

And thanks for the info about the `sl_advection` messages not being error messages. I'm running a restart close to the crash and hope to learn a bit more from extra output/checking flags.
@XubinZeng - Thanks for your input. I think we have a slight increase in the emissions of sea-salt aerosols (please see table below). We do see a decrease in the dust emissions though, which is consistent with not double counting the gustiness over land. https://portal.nersc.gov/cfs/e3sm/terai/E3SMv3_dev/e3sm_diags_Gustinessfix_vs_baseline_F20TR/viewer/table/20231011.v3alpha04.F20TR.GustinessFix.v3alpha04_wSOAGfix.pm-cpu_intel-ANN-budget-table.html
As for the comparison with the earlier WCYCL1850 simulation, the base simulations the two runs branch from are different (v3alpha02 vs v3alpha04), so I'm wary of comparing them and attributing any differences to whether the sea-ice gustiness was included or not.
@crterai - thanks for the reply. agree on both points.
@proteanplanet - would you review the code changes that should now have removed the impact of gustiness on the sea ice?
@quantheory - I think we are close to finalizing this PR for merging. Would you remove the 'Draft' status from this PR?
Thanks for the reviews, @beharrop and @proteanplanet.
An update on the coupled simulation crash at the end of `0038-02` due to `Non blocking write for variable (SNOWDP, varid=157) failed`:

- The crash reproduced across resubmissions (jobs 17188180 and 17213876), so it doesn't appear to be a hardware issue.
- I set `state_debug_checks = .true.` to catch any odd values in the atmosphere, and the checker didn't catch anything.
- I restarted from `0038-02-28`, writing high-frequency output and expecting it to crash, but it completed the full day, getting past the crash point.

My suspicion is that the bad SNOWDP data was written earlier in the month: the run didn't crash when a bad SNOWDP value was produced, and the simulation crashed only while writing the `elm.h0` file. And I suspect that the branch simulation just produced an `h0` file from the average values from `0038-02-28` on, so it didn't crash. The debugging runs are in /pscratch/sd/t/terai/E3SMv3_dev/20231022.piControl.Gustinessfix.v3alpha04_trigrid.pm-cpu_intel.debuggingCrash/run

- I'm now rerunning from `0038-01-01` with daily `SNOWDP` output to catch where the faulty `SNOWDP` is actually written.

@crterai do you have more than 30 years of simulation? I thought climate-changing simulations required 100 years? Through 30 years the ocean seems mostly unchanged; however, I noted that OHC anomalies seem to be stronger in the gustiness run, so it would be useful to see a longer run.
@vanroekel - we have up to 37 years of the water cycle simulation before it ran into a model crash while writing a diagnostic output variable (`SNOWDP`). I'm guessing 7 more years will not be enough for an assessment?
I am working with the developers to try to get to the bottom of why it is crashing and will update here once we have a fix and longer (50+ year) coupled simulation. Do you notice any red flags from a coding perspective?
Thanks @crterai. I did look through the code, and the only modification to the ocean seems to be changes to the wind stress we receive in the ocean, so it doesn't directly affect the component the way it did the sea ice. As for the length, 37 years is not much better than 30. I do think we should push for 100, as that is the code review policy, but will let @golaz chime in there. If Chris G is okay with a shorter testing simulation and I'm out on travel, feel free to remove me as a reviewer and proceed.
update: runtime fails have gone away. Should be ready for alpha05.
This is designed to fix the following issues:

- #5834
- #5835
- #5836
- #5837
The first of these was fixed by forcing ATM_SUPPLIES_GUSTINESS=TRUE for all EAM cases, and the second was fixed purely by rewriting the Fortran formulas. These two changes are climate-changing and will generally tend to reduce the impact of gustiness on the mean winds, i.e., mean wind speed will increase.
The third fix is to the U10 diagnostic only, and will decrease the output U10. A value that has not been "fixed" in this way can be obtained by outputting the new history variable 'U10WITHGUSTS' from EAM.
The final fix is to remove some double-counting between gustiness parameterizations in EAM and ELM. The fix can be "undone" by setting `force_land_gustiness = .true.` in the ELM namelist. Note that these fixes are only applied over land and ocean. At the request of sea ice developers, this tag also disables gustiness over sea ice completely.
Fixes #5834 Fixes #5835 Fixes #5836 Fixes #5837
[CC] [NML]