ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/
Other
302 stars 307 forks source link

Implementing dynamic urban land cover in CESM #1445

Closed fang-bowen closed 1 year ago

fang-bowen commented 3 years ago

Scientific relevance

CESM has the urban representation that is sufficiently realistic to capture the urban climate dynamics. However, one limitation of the current model is that the urban land cover is static, which prevents CESM from capturing the urbanization effects on local urban climate.

In this project we propose to develop an urban surface dataset for future urbanization scenarios and to implement such dynamic urban land cover in CESM, which can serve as an option for future CLM versions. This enhancement to CESM will also enable researchers to supply their projection of land use change data and study the transient climate impact of LULCC.

Plan of implementation

  1. Preparation of urban land datasets

Based on a set of data-driven projections of global urban land expansion under the framework of the shared socioeconomic pathways (SSPs) by Gao and O'Neill (2020) , originally at 10-year intervals for the 21st century, we plan to develop the annual surface dataset for CESM.

This current dataset represents the urban expansion based on SSP5-8.5 scenario, and assumes vegetation and cropland are replaced by urban when urbanization happens. The dataset spans between 2015 and 2100 and is at 0.9 × 1.25° resolution. This dataset is used in our development, and we will create surface datasets based on other SSP scenarios or for historical periods (1850 - present) when we have the data available.

We plan to enhance the mksurfdata tool so that it can handle a series of urban land use grid datasets and generate the surface datasets for CLM automatically. This will be useful for the community to test any given urbanization or land use change scenario.

  1. Update to model code

We will modify code in CLM to implement physical processes to address water and energy balance when dynamic urban land use change is adopted, and deal with biophysical process changes in regions that are converted from vegetated to urban.

Currently CESM runs the entire simulation based on the initial land cover data. We plan to modify the code so that the model can update land cover each year with a new surface dataset.

We plan to test the code modification with active land components only, and then implement the change to fully-coupled CESM simulation.

Collaborators

Keer Zhang(@keerzhang1). Advisor: Dr Lei Zhao. We are working closely with Dr Keith Oleson (@olyson).

billsacks commented 3 years ago

Thanks a lot for opening this issue and describing your plans.

You may be happy to know that the framework is already in place for task 3 (as described in https://escomp.github.io/ctsm-docs/versions/master/html/tech_note/Transient_Landcover/CLM50_Tech_Note_Transient_Landcover.html#mass-and-energy-conservation), so this will probably only involve relatively minor changes, if any. The relevant module is

https://github.com/escomp/CTSM/blob/master/src/biogeophys/TotalWaterAndHeatMod.F90

You'll just need to confirm that all of the relevant urban water and energy states are captured there.

For task 2, you will want to add the pct urban fields to the existing transient landuse file, which already provides time-varying fields for other variables like pct crop. Then task 4 will involve writing a module similar to https://github.com/escomp/CTSM/blob/master/src/dyn_subgrid/dyncropFileMod.F90 or https://github.com/escomp/CTSM/blob/master/src/dyn_subgrid/dynlakeFileMod.F90 which reads the relevant urban fields off of the file each year. One minor note is that you should probably also call the CheckUrban routine on each year's input:

https://github.com/ESCOMP/CTSM/blob/84e970fe5125ab97b93e6937afaed4539ef2ac33/src/biogeophys/UrbanParamsType.F90#L705

One other question you'll want to think about is how to initialize water and energy state variables when an urban column newly comes into existence. You basically have two options:

I'd be happy to help consult on some of the specific code changes that need to be made for this project.

olyson commented 3 years ago

Thanks Bill, this was very useful information. We are meeting bi-weekly on this project and will contact you when we need help, thanks!

fang-bowen commented 3 years ago

Thanks Bill, that's very helpful! We'll update this issue with any progress.

olyson commented 3 years ago

@billsacks , I'm a bit confused by the PCT_CROP_MAX that you mentioned and similar variables that are intended to provide a reasonable mask field. We were going to do something similar for urban. I see those on the transient landcover dataset (created by mksurfdat) but I don't see where they are used in the clm code (unless it is a segmented field that doesn't allow for a search using grep). For lakes, it looks like the variable HASLAKE is read off the dataset if transient lakes are active, but I don't see a variable comparable to crops like, e.g., PCT_LAKE_MAX.

billsacks commented 3 years ago

@olyson sorry for the confusion. This is something that got half-way implemented and then dropped. It's still intended, but fell off the priority list (first mine, then @slevisconsulting 's): see https://github.com/ESCOMP/CTSM/issues/229.

billsacks commented 3 years ago

@olyson and I discussed BGC conservation by email and then in person today.

First, here's a copy of our email conversation so it's all in one place:

From Keith:

I'm implementing dynamic urban and am getting a methane conservation error the first time PCT_URBAN changes. It looks like the dynamic adjustments for conc_ch4_sat_col and conc_ch4_unsat_col are putting some non-zero values in for roof and walls. Then when the total column ch4 is summed over the soil layers (or in this case, urban layers), it is trying to sum over nlevsoi, not nlevurb, and thus dz is 1e36 for layers > nlevurb, which creates some very large total column ch4, and hence creates an imbalance. So I'm trying to find a way to restrict the dynamic adjustments to just nlevurb for roof and walls.

My reply:

I spent a little time looking at the relevant code tonight and I can see why there are problems here. It looks like I didn't add the special handling of urban that's needed in the dynamic landunit BGC conservation code. First, there is no accounting for the fact that there are different numbers of layers in some urban columns. But perhaps more fundamentally, it looks like I didn't account for the special urban column scaling that's needed in dyn_subgrid/dynColumnStateUpdaterMod.F90, or the fact that it doesn't make any sense to store carbon in urban walls.

The way things are done right now, when a special landunit grows, any carbon and nitrogen that was in the soil in vegetated landunits essentially gets "trapped" in the corresponding column-level layer structure of the special landunit that is taking its place. This only really matters if the special landunit shrinks in the future: by trapping the C & N under the special landunit, this C & N is eventually restored to the vegetated landunit if the special landunit shrinks in the future, assuring conservation of C & N as well as a reasonable starting state for the new vegetated landunit.

I think the right thing to do is a combination of two things: (1) Making sure we use the correct scalings of urban columns in the update_column_state routine in dynColumnStateUpdaterMod: This routine looks at the areas gained and lost by columns within each grid cell, and I have a feeling it isn't working right for urban columns. (2) Something like adding a new wrapper routine in this module that forces states to 0 for some of the urban columns. This shouldn't be too hard: there is already a (currently-unused) wrapper routine that facilitates forcing states to 0 for certain landunits, and we just need a similar routine that does something similar for certain urban columns, and then we'd need to change many / all of the current calls to use that wrapper instead of the "no_special_handling" version. (3) Possibly also changing ch4_totcolch4 to only count down to nlevurb in urban columns. This may be unnecessary with the other changes I'm proposing, but might be a good idea for robustness.

I'm not sure exactly what this would need to look like, but would be happy to put my head together with yours to work it out together. That said, it's possible that we could take some shortcuts, especially if it's pretty safe to assume that we wouldn't have urban areas growing then later shrinking within a given grid cell in a single simulation.

This differs somewhat from your suggestion of restricting the adjustments to just go down to nlevurb. That might be an additional right thing to do, but I'm thinking that we might want to prevent carbon from being "buried" in these special urban columns at all, in which case I think the special level handling would be unimportant. One thing I'm not sure about, though, is: If it is important to conserve C & N across urban transitions, then I think we'd want to store C & N beneath both roads and roofs. Roads should be fine (because they use nlevsoi/nlevgrnd, right?), but I don't know how we can store all of the vegetation's C & N beneath roofs. Conceptually, I guess that would be C & N trapped in the soil under a building; storing that seems to require some structure that mirrors the nlevsoi structure on the roof column. So maybe we just need to live with non-conservation of C & N in urban transitions – and that seems scientifically justifiable if you assume that, under a building, the soil is excavated down to bedrock and so the C & N is lost from that column. (For the (rare?) case where urban areas shrink, we could actually easily specify some fixed non-zero C/N values that are restored in these roof-occupied columns if that's important scientifically.)

Finally, this makes me realize that someone (you?) should also give a critical eye to the biogeophysical dynamic landunit conservation code to make sure that's doing the right thing for urban columns. The relevant code is in TotalWaterAndHeatMod and dynConsBiogeophysMod (especially subroutines dyn_water_content and dyn_heat_content): It's worth making sure that the summation of columns and the averaging from column to gridcell level is being done right for urban columns. I'd be happy to talk more with you about that, too. That's actually probably more important to get right than the CN conservation.

And finally, a summary of our discussion today:

fang-bowen commented 2 years ago

Progress update:

During the past weeks we have made some progress on the following tasks:

Task 1 & 2: Earlier we have generated a surface dataset for code development and testing. The dataset is processed into landuse.timeseries file to be read by the new dynamic urban module.

We are looking at the mksurfdata_map tool and plan to automate this process for dataset generation.

Task 4: We have created a new module (dyn_subgrid/dynurbanFileMod.F90) and made other necessary modification so that the model is able to read PCT_URBAN in landuse.timeseries file and update the urban fractions annually. This is verified by comparing the output with the input data.

As for the initialization of new urban columns: we have created PCT_URBAN_MAX variable in our landuse.timeseries file, but since the functionality (of only initialize urban where it might come into existence) is not fully implemented, we are currently setting (user_nl_clm) run_zero_weight_urban to .true. and running urban everywhere. We plan to evaluate the performance of this setting.

Besides these, as Keith suggested I have made some flowcharts (calling tree/sequence) and found them helpful in understanding the code and the model structure. I will refine and share some relevant ones here soon.

Our priority for now:

Task 3: We are trying to understand how water and energy balance is handled for dynamic urban in the current code. Through some calculation we believe the scaling method (c2l_scale_type='urbanf') makes sense, and so far we have not identify any states / processes that seem incompatible with dynamic urban.

However Keith has a time-step level output (in /glade/scratch/oleson/archive/ctsm51_ctsm51d049_1deg_GSWP3V1_SSP585_urbanonly_bowen/lnd/hist, 2019-12 to 2020-01) and we found the difference between HEAT_CONTENT1 and HEAT_CONTENT2 after the urban fraction changes does not match the aggregated EFLX_DYNBAL throughout the later year.

So, @billsacks, do you have any insight what could be the reason here? I can make some plots to show this if it helps.

By the way, I see some routines to "set start-of-run baseline values" (e.g., dyn_hwcontent_set_baselines). Will this baseline subtracting method affect our dynbal fluxes here?

dynamic_urban_update_2021_09_27.pdf

billsacks commented 2 years ago

Good questions. The logic controlling this is pretty complex.

Regarding the mismatch between (HEAT_CONTENT2 - HEAT_CONTENT1) and EFLX_DYNBAL: sorry, I forgot about this point when we talked before: The delta heat is adjusted with this code before computing EFLX_DYNBAL:

https://github.com/ESCOMP/CTSM/blob/3dcbc7499a57904750a994672fc36b4221b9def5/src/dyn_subgrid/dynConsBiogeophysMod.F90#L557-L562

Briefly, the problem this addresses is that the runoff generated by the delta liquid term is already adding or subtracting some heat to/from the grid cell, so we need to correct the delta heat adjustment to account for that. I would be happy to try to explain this in more depth if you're interested – and also happy to hear any arguments that this correction shouldn't be done at all if anyone feels that way, because I'm still not 100% sure this is the right thing to do (but Inne Vanderkelen and I discussed this in depth a few years ago in the context of dynamic lakes, and it at least stood up to that inspection).

Regarding the baselines: No, I don't think this should have any impact on dynbal fluxes for urban, unless the urban area is growing into glacier or lake areas: These baseline values are used to account for "virtual" states in glacier and lake landunits, and I believe aren't used in other landunits (though I could be remembering wrong). This is another subtle thing that I'd be happy to discuss further if you'd like to understand it more, though I think you won't need to deal with this for urban.

olyson commented 2 years ago

Performance notes for 5 year I2000 cases: "allurban": 152 yrs/day, 289 pe-hr/simulated_year normal: 172 yrs/day, 254 pe-hr/simulated_year

So, about 12% slower when running 3 urban landunits in every gridcell.

Face2sea commented 2 years ago

@billsacks Hi Bill, I think we have verified that the heat, liquid water, and ice conservation are all working properly. Thanks for your help! We are now planning to move forward with creating the multi-scenario (SSP-RCP) dynamic urban landuse.timeseries data based on Gao and O'Neill (2020) dataset. Before we are doing so, do you have any suggestions that how we should prepare these datasets and codes to make them better comply with the overall data framework in CESM/CTSM.

olyson commented 2 years ago

I spoke to Bill about this. Basically, we should proceed with implementing the raw dynamic urban data into mksurfdata. As long as we use a recent version of ctsm master, any ongoing work on the mksurfdata process should not affect us much, if at all. We can discuss further at our next meeting.

olyson commented 2 years ago

@billsacks, we are wondering if it makes sense to issue a PR on our work thus far; task 4 (updating model code to read in and execute a landuse timeseries file with dynamic urban, and task 3 (ensuring water and energy conservation). The proposed PR would have transient urban off by default and thus would be non-answer-changing, but would allow us and others to begin running experiments with transient urban using some input datasets that have already been created.
The second PR would be issued once our ongoing work with implementing transient urban into the mksurfdata routines was complete.

billsacks commented 2 years ago

@olyson yes that sounds like a great plan.

olyson commented 2 years ago

Regarding implementing transient urban into the mksurfdata routines...the current method replaces bare soil with urban. If there is not enough bare soil, then both PCT_NATVEG and PCT_CROP are reduced proportionally to accommodate any excess urban area. Given the increasing importance of crops to food security in the future, we are contemplating changing this behavior so that urban replaces PCT_NATVEG first and then PCT_CROP only if necessary. @billsacks @ekluzek @lawrencepj1 @wwieder, any comments on this approach?

lawrencepj1 commented 2 years ago

Hi Keith @olyson. That makes total sense given that the PCT_NATVEG is currently the residual after all other land units are allocated. Is the intention to have the total PCT_URBAN (with urban classes) on the transient landuse.timeseries file? If so the decision about PCT_CROP could be made before run time.

billsacks commented 2 years ago

At first I thought this made sense. But as I read through the code & comments in normalizencheck_landuse in mksurfdat.F90, I'm starting to think that it's most correct to stick with the current approach, at least if you're talking about the surface dataset – the answer may be different for the landuse.timeseries file, which I think is what @lawrencepj1 was thinking about with his comment.

I'm looking at this comment:

https://github.com/ESCOMP/CTSM/blob/7a850247fd363abacec378159e400b9b4facabaf/tools/mksurfdata_map/src/mksurfdat.F90#L1410-L1416

Because the different PFTs / CFTs are expressed on the raw dataset in a way that adds to 100%, I think it makes sense to reduce everything proportionally. For example, if there is a grid cell that starts out as 50% c3 grass, 50% soybean and 50% urban: I think the meaning of the 50% c3 grass and 50% soybean is that 50% of the vegetated area is c3 grass and 50% of the vegetated area is soybean; this is in contrast to special landunits, where their percentage is specified as % of the gridcell area. That makes me think that the right thing to do is to reduce all PFTs / CFTs proportionally.

Also, I think we should be consistent between urban & other special landunits in this respect.

That said, I haven't looked closely at this. This could warrant more discussion.

olyson commented 2 years ago

Thanks @lawrencepj1 , yes we will have PCT_URBAN on the transient landuse.timeseries file. I see the following code in dynLandunitAreaMod.F90 that controls the priority of which landunits get decreased if they add up to more than 100%:

! This parameter specifies the order in which landunit areas are decreased when the
! specified areas add to greater than 100%. Landunits not listed here can never be
! decreased unless the forcings say they should be decreased. In particular, note
! that istice doesn't appear here, so that the istice area always will match
! the areas specified by GLC. In general, the code will NOT be robust if more than
! one landunit is excluded from this list. Meaning: since istice is excluded from
! this list, all other landunits should appear in this list!
integer, parameter :: decrease_order(7) = &
     (/istsoil, istcrop, isturb_md, isturb_hd, isturb_tbd, istwet, istdlak/)

So I guess this is essentially doing what we were proposing since first istsoil is reduced, then istcrop...

keerzhang1 commented 2 years ago

Thanks for the helpful discussions on mksurfdat tool!

I am modifying the mksurfdat tool based on your opinions. Just to confirm our current decisions: (1) As @billsacks said, we should not change the current normalizedcheck_landuse because the PFTs/CFTs are expressed in the raw data in a way that adds to 100% (but the normalizedcheck_landuse is able to first reduce bare soil, which is the index 0 of the natural pft array. I am still puzzled why we cannot similarly reduce other natural pfts first, and then reduce the cfts?).

(2) Then, as @lawrencepj1 said, the dynLandunitAreaMode.F90 will reduce the istsoil first, and then istcrop when the sum of all land units is larger than 100. But if we do (1), then essentially we have already reduced the crop and natveg proportionally when the urban expands, because the normalizedcheck_landuse ensures the sum of all land units to be 100.

I made a slide to show the current workflow and my thoughts:1.pdf If my understandings are correct - doing (1) and (2) still reduce the crop and natveg proportionally when urban expands, right?

keerzhang1 commented 2 years ago

I wonder if we can do the following to make the expanding urban replace PCT_NATVEG first and then PCT_CROP:

(1) In mksurfdat.F90, normally the normalizedcheck_landuse will ensure the lake[1]+wet[1]+glacier[1]+urban[y]+natveg[y]+crop[y] to be 100. We modify the normalizedcheck_landuse so that it adjusts the natveg and crop assuming the lake,wetland, glacier, and urban all remain constant as the first year. Hence, it ensures lake[1]+wet[1]+glacier[1]+urban[1]+natveg[y]+crop[y] to be 100. Also, the normalizedcheck_landuse still replace bare soil with urban first, and then the natveg and crop proportionally.

https://github.com/ESCOMP/CTSM/blob/7a850247fd363abacec378159e400b9b4facabaf/tools/mksurfdata_map/src/mksurfdat.F90#L1191

(2) In the landuse.timseries data, we still generate the PCT_URBN which is the transient urban fractions (urban[y]).

(3) Then in the dynLandunitAreaMode.F90, the Landunit_sum might be >100 when urban expands, and this expanded urban will first take the istsoil, and then istcrop.

I also made a slide to show this workflow 2.pdf. Does this make sense? Thank you!

olyson commented 2 years ago

Lei, can you propose a meeting for us to discuss this further? I'd suggest inviting @lawrencepj1 and maybe also @billsacks as well. I don't know enough about the history behind some of the choices made in mksurfdata to judge whether these proposed changes are reasonable.

billsacks commented 2 years ago

@keerzhang1 thank you very much for all of your thoughts on this. I really appreciate that you are thinking through this carefully. I agree with @olyson that a meeting makes sense at this point (partly to discuss your recent proposals, which I don't fully understand, and partly to discuss this issue more generally), though I'll share some more thoughts in writing first:

I have spent some more time looking through the relevant code and thinking about this issue. It does look like PCT_NATPFT + PCT_CROP = 100% is generally true prior to normalizencheck_landuse. I am left with more questions than answers at the moment, though I am starting to realize that my last comment (about reducing PCT_NATPFT and PCT_CROP proportionally) may not have made sense.

I think the important point here is what's done to create the raw datasets. I'm hoping that @lawrencepj1 can provide some insight here. At first I thought that urban should be treated the same as other special landunits. But now I'm realizing that what's really important is keeping the mksurfdat code consistent with the process for creating the raw datasets, and specifically the process for rescaling PCT_NATPFT and PCT_CROP to sum to 100% on the raw datasets. The current mksurfdat code, where urban is treated differently from other special landunits in that it preferentially replaces bare ground, makes sense if urban is treated differently from other special landunits in the creation of the raw datasets – and specifically, if urban areas are represented as bare ground PFTs on the raw datasets.

It would help me to know how various situations are represented on the raw datasets; for example, these three grid cells:

Grid cell 1: in reality:

Grid cell 2: in reality:

Grid cell 3: in reality:

I'd particularly like to know what PCT_NATPFT, PCT_CROP and PCT_PFT look like on the raw datasets in each of these cases. It seems like the important point is making sure that the mksurfdat code is set up in a way to reproduce reality in each of these situations (and others) given the process for creating the raw datasets.

I do see the point that, for transient, it's a problem if we reduce PCT_CROP on the landuse timeseries file when urban expands. It seems like this could be an issue for other landunits, too – like the upcoming feature to enable transient lakes. This might argue for preferentially taking all special landunit area from PCT_NATVEG, but that might in turn require a change in how the raw datasets are constructed to keep everything consistent.

This may actually require some even bigger rethinking, like:

One other comment is that, whatever we decide here, if there are behavior changes to the standard operation of mksurfdat, those should be done in their own branch/PR, not folded into the big transient urban PRs: we want to keep answer changes separate from answer-preserving feature additions. If it is easiest for you to fold all of these things together initially, then we should make sure we have a plan for separating them later (whether manually or with the help of git).

Face2sea commented 2 years ago

@keerzhang1 @billsacks @lawrencepj1 @olyson Thank you all very much for these helpful information! @olyson yes, it is definitely a great idea to have a meeting to discuss these issues! I will create a doodle for everyone to find out a time shortly.

lawrencepj1 commented 2 years ago

Hi Everyone @olyson @keerzhang1 @billsacks

Yes I agree a meeting would be good. In regard to Bill's comments on the raw datasets we currently have a PCT_URBAN specified but this is from MODIS and does not have the required breakdown into the urban classes. It could easily be included in the next version of the CLM raw data so that the nat veg, crop and other fractions are accounted for before mksurfdata as an alternative strategy if that was helpful.

cheers Peter

olyson commented 2 years ago

As Peter implies, the original raw multi-density urban dataset was independently derived from considerations (mainly population density) other than, for example, remotely-sensed data such as from MODIS. At the time of its implementation in mksurfdata, it was assumed that urban should replace bare ground preferentially because of their spectral similarity in remote sensing derived datasets of that period.

Face2sea commented 2 years ago

@olyson @billsacks @lawrencepj1 @keerzhang1 @fang-bowen Here is the when2meet link for our meeting to discuss this further: https://www.when2meet.com/?13594604-RvqkA. Could you please indicate your availabilities in the link? I will find out a common time for everybody to set up the meeting. Thanks!

billsacks commented 2 years ago

As we discussed on Monday: For now @keerzhang1 will do something simple in mksurfdata_map, getting dynamic urban to work without trying to address the issues identified above. However, before too long, we should overhaul how the PCT areas are determined in mksurfdata_map. I have opened a new issue for this: #1555 .

@keerzhang1 @Face2sea and @fang-bowen : thank you very much for bringing these issues to our attention and starting these important conversations around what is the right thing to do in mksurfdata_map!

billsacks commented 2 years ago

See also #1572

olyson commented 2 years ago

Tasks 2, 3, and 4 have essentially been completed. Task 5 (testing) has also been completed but will need to be repeated once the items below have been completed. Task 1 has been expanded below to include a new building temperature stream file and code to determine nlevurb from the surface datasets as originally described in PR #591 which was completed but never implemented into master.

1a. Incorporate new raw urban surface datasets for present day (2000), historical, and five SSPs (SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP4-3.4, SSP5-8.5) developed primarily by @keerzhang1 from Gao and O'Neill (2021) and Gao and Pesaresi (2022). These datasets are available but have PCT_URBAN calculated with respect to the land area of the gridcell and will likely need to be replaced with PCT_URBAN calculated with respect to the total area of the gridcell based on recent discussions. The urban properties (morphological, radiative, and thermal) in these datasets are described in Oleson and Feddema (2019).

1b. A new building temperature stream file. It is available in inputdata at /glade/p/cesmdata/cseg/inputdata/lnd/clm2/urbandata/CLM50_tbuildmax_Oleson_2016_0.9x1.25_simyr1849-2106_c181017.nc, but needs to be converted to cdf5.

1c. Code changes in CTSM and mksurfdata (presumably mksurfdata_esmf) to read in nlevurb from the surface dataset instead of being hardcoded in clm_varpar.F90. This will ensure backward compatibility with older urban datasets (which have nlevurb=5 as opposed to the new datasets which have nlevurb=10).

The plan is to implement 1a-c on a branch off of ctsm5.2.mksurdata.

slevis-lmwg commented 1 year ago

@olyson #1853 is now merged (in the ctsm5.2.mksurfdata branch). Feel free to close this issue if appropriate.

olyson commented 1 year ago

Thanks @slevisconsulting ! I'm going to close this because the tasks have been completed and I assume that the final surface and landuse timeseries datasets will be generated and incorporated into the namelists using the ctsm5.2.mksurfdata branch and then merged into main. Any objections?