ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/
Other
301 stars 306 forks source link

Remake PCT_PFT raw datasets to NOT force landunit area to 100%, and some other fixes needed to PCT_PFT raw datasets #1556

Closed billsacks closed 4 months ago

billsacks commented 2 years ago

This is connected with #1555 (and I realized this issue while investigating the mksurfdata_map code related to #1555 ) but this is a somewhat separate issue.

Currently, my understanding is that the PCT_PFT raw datasets are constructed so that the total landunit area of natural veg plus crop adds up to 100%: in source grid cells where the original data specify less than 100% cover of vegetation, PCT_PFT values are increased so that the raw data coming in to mksurfdata_map will specify 100% cover of PCT_NATVEG + PCT_CROP.

But as far as I can tell from an initial read through of mksurfdata_map, there is no reason why this needs to be the case, and I realized that there is a problem with this: In this call:

https://github.com/ESCOMP/CTSM/blob/7a850247fd363abacec378159e400b9b4facabaf/tools/mksurfdata_map/src/mkpftMod.F90#L561-L564

I'm pretty sure that the source (and dest) weighting imply that PCT_NATVEG (i.e., the landunit area) is used as a weighting factor in remapping PCT_NAT_PFT from source to dest. That seems like the right thing to do, but the correction of the raw datasets to force vegetated areas to 100% makes this weighting factor not work the way it should.

Consider a destination grid cell that has two source grid cells, with equal coverage:

The correct result would be a grid cell with 50.5% natural veg, 49.5% lake, where the natural veg is almost entirely c3 grass (with just a tiny bit of bare ground). However, because the PCT_PFT raw dataset is adjusted to give 100% area in Source 2, the current result in the code would be 50% c3 grass and 50% bare ground.

To get correct PCT_NAT_PFT mappings, I believe the raw datasets should be constructed without forcing the total veg area to add to 100%.

See the bold note at the bottom of https://github.com/ESCOMP/CTSM/issues/1555#issue-1056687092 for something to check in the course of doing this. I believe it's safe to make the changes in this issue before or after #1555, though if I had to choose an order, I would do #1555 before updating the raw datasets, because there are less likely to be issues with assuming that PCT_NATVEG + PCT_CROP = 100% once we do #1555.


Edit: here is the bold note that I was referring to, just so it's all in one place:

When making these changes, we should double-check that there isn't any code that assumes that PCT_NATVEG + PCT_CROP adds up to 100% (or forces that to be the case). I didn't see anything like this in my initial, quick read-through, but we should double-check that. One specific thing to look for is: will code work correctly if the input PCT_NATVEG = PCT_CROP = 0%? (Or would such grid cells incorrectly be considered ocean, for example?)

lawrencepj1 commented 2 years ago

Hi @billsacks

Yes this may be the case. The forcing to 100% Crop + PCT_NATVEG to fill in lakes, glaciers, wetland and urban on the raw data was historical going back through all versions used with mksurf until at least CLM4. I am traveling back from AGU today but can have a more detailed look on Monday.

best Peter

billsacks commented 2 years ago

As discussed in #1716 we will want the landunit areas to be specified as percent of the total grid cell area (NOT percent of the land area in that grid cell).

wwieder commented 2 years ago

@lawrencepj1 can we address this now that there is a CTSM 5.2 branch and decide if / how to address this issue?

lawrencepj1 commented 2 years ago

@wwieder and @billsacks Yes this is simple to do in the CTSM 5.2 raw data. We can simply scale all the land unit component percentages by landfrac. This will require the mksurfdata tool be changed so the requirement for the land units to add up to 100% will need to be removed. The other option is for the remaining difference in land units to be allocated to percent wetland in the raw data. Either way this would be a simple fix.

billsacks commented 2 years ago

@lawrencepj1 landfrac may come into this somewhat, but I think the main issue is the scaling based on special landunit (non-vegetated) areas in each source grid cell. We should probably talk to make sure we're on the same page about this before you make any changes.

wwieder commented 2 years ago

maybe a quick meeting would be helpful to discuss? I'll set up a 30 min slot immediately after next weeks SE meeting. Currently invited @dlawrenncar @billsacks and @lawrencepj1 to join. Should others be involved?

lawrencepj1 commented 2 years ago

Thanks Will

Sounds good.

Peter -- Dr Peter Lawrence Terrestrial Science Section National Center for Atmospheric Research 1850 Table Mesa Drive Boulder Colorado 80305

Work: 1-303-497-1727 Cell: 1-303-956-6932

On Thu, Aug 25, 2022 at 2:19 PM will wieder @.***> wrote:

maybe a quick meeting would be helpful to discuss? I'll set up a 30 min slot immediately after next weeks SE meeting. Currently invited @dlawrenncar https://github.com/dlawrenncar @billsacks https://github.com/billsacks and @lawrencepj1 https://github.com/lawrencepj1 to join. Should others be involved?

— Reply to this email directly, view it on GitHub https://github.com/ESCOMP/CTSM/issues/1556#issuecomment-1227719737, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC3OJOMUK2BGN4FZFUVYV3LV27IO3ANCNFSM5IIFMPTQ . You are receiving this because you were mentioned.Message ID: @.***>

ekluzek commented 2 years ago

Invite @slevisconsulting as well.

wwieder commented 2 years ago

@slevisconsulting are you available for a meeting Sept 1 @ 11:00? Otherwise we can push one week.

ekluzek commented 2 years ago

@wwieder it looks like he is out until the 2nd. So most likely you should move it out a week.

slevis-lmwg commented 1 year ago

From an email from Peter: The new historical data for 1850 - 2015 with the new absolute values is in the directory: /glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918

Peter tested with an old version of mksurfdata_map. I will test with my copy of mksurfdata_esmf.

slevis-lmwg commented 1 year ago

@wwieder and @billsacks Yes this is simple to do in the CTSM 5.2 raw data. We can simply scale all the land unit component percentages by landfrac. This will require the mksurfdata tool be changed so the requirement for the land units to add up to 100% will need to be removed. The other option is for the remaining difference in land units to be allocated to percent wetland in the raw data. Either way this would be a simple fix.

I have compared fsurdat files generated with the new and old raw data. I see diffs; however, this may be because I did not modify the code as you suggested @lawrencepj1 . I will look into that.

slevis-lmwg commented 1 year ago

From meeting with @lawrencepj1 this morning: Peter approved the diffs between fsurdat files that I generated with mksurfdata_esmf using the new and old raw data that he provided. Peter clarified that this specific change to the raw data was not the one that would require a code change to mksurfdata.

slevis-lmwg commented 1 year ago

The new historical data for 1850 - 2015 with the new absolute values is in the directory: /glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918

@lawrencepj1 two things... 1) For ctsm5.2 we will need updated files (like the 1850-2015 that you provided) for all the "transient" time periods, right? Unless we're waiting for other changes to the files associated with this or other github issues. Otherwise, we need files for 850-1849 and all the SSP/RCPs. 2) To get all the new files rimported to the repository, we will need to change permissions on all these files. (But let's worry about that when all the files are ready.)

billsacks commented 1 year ago

@lawrencepj1 - if I'm interpreting the data correctly, I think we still need another change to the raw datasets so that areas are specified as percent of the total gridcell area rather than percent of the land area (for consistency with our other landunit datasets). I looked at the PCT fields in mksrf_landuse_ctsm52_nrhistLUH2_1850.c220918.nc and noticed that the sum of (pct_natveg + pct_crop + pct_glacier + pct_lake + pct_urban + pct_wetland) is essentially 100% in all grid cells inside the landmask. What I would expect is that this sum is instead equal to landfrac everywhere.

This is connected to the comment you made a few months ago (https://github.com/ESCOMP/CTSM/issues/1556#issuecomment-1227491073 ), and I apologize that I never really addressed that comment directly. Indeed, I think that fixing this means either adding a new scaling of all landunit areas by landfrac, or else removing some existing scaling by landfrac that forces the total areas to 100% in all grid cells.

billsacks commented 1 year ago

@lawrencepj1 (also @slevisconsulting and @ekluzek ) - In looking into some differences arising from changing mksurfdata to use LANDFRAC rather than the mask from the vegtype dataset, I noticed what seems to be another issue with this dataset that I feel should be fixed (unless you can explain why this is correct): From looking at the file /glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918/mksrf_landuse_ctsm52_nrhistLUH2_1850.c220918.nc, there are 2483 grid cells with LANDFRAC exactly 0 but LANDMASK = 1. Some of these grid cells with LANDFRAC=0 have PCT_NATVEG and/or PCT_CROP > 0. This seems inconsistent and possibly problematic for the correctness of the resulting surface datasets: it seems to me that any point with LANDMASK = 1 or with PCT_NATVEG or PCT_CROP > 0 should have LANDFRAC > 0 – and, conversely, any point with LANDFRAC = 0 should have LANDMASK = PCT_NATVEG = PCT_CROP = 0 as well.

I have been assuming that the mesh file used with this dataset has a mask that is consistent with the LANDMASK on the PFT raw data files (there may be some other problems of inconsistency if that's not the case). If so, then any adjustment to LANDMASK on the raw datasets would also require a new mesh file.

lawrencepj1 commented 1 year ago

@billsacks Yes you are right this is an error. Looking at what you have here and then the code this is a truncation of very low LANDFRAC values in generating the raw data. I have added a check that if the LANDFRAC is truncated then the remaining PFT and CFT values are all reset to 0. I ran the new code for 1850 in the file:

/glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918/mksrf_landuse_ctsm52_nrhistLUH2_1850.newcheck.c220918.nc

with the changes between the original and newcheck files in the directory as: change1850.nc

If you would like I can recreate all of the files in the directory with the new check included.

Thanks for catching this Peter

lawrencepj1 commented 1 year ago

@ekluzek and @slevisconsulting sorry I should have included you on the above comment as well

billsacks commented 1 year ago

Thanks a lot @lawrencepj1 !. I checked some things in this new file and I agree that this solves the issue. Thank you!

Before remaking all new files: In addition to this change, can you also make the change in https://github.com/ESCOMP/CTSM/issues/1556#issuecomment-1304320939 - or let me know if I'm misunderstanding something with that?

Then, yes, I think it would be good to remake all the files with this change and we should point to the new files by default.

In addition, I think we'll need to recreate the relevant mesh file so that its mask agrees with the new mask.

lawrencepj1 commented 1 year ago

Thanks @billsacks I did update the code to do the scaling of the land units on the raw data to be the percentage of the grid cell with the new file in the directory as:

/glade/p/cesm/sdwg_dev/thesis/data/cesm_tools/surfacedata/ctsm52finalrawdata/globalctsm52nrhistLUH2deg025_220918/mksrf_landuse_ctsm52_nrhistLUH2_1850.newfrac.c220918.nc

Differences between newcheck and newfrac are in the directory in the file change1850frac.nc

There are some 10e-6 changes where landfrac == 1.0 which is the result of doing all the truncations as double precision.

You are also right that we will need a new mesh file so that the mask agrees. I think we will also need to have the LAI and soilcolor files on the new mask so they will need to be recreated.

The final issues I was going to raise is that the LUH2 data has time varying urban extent. Previously we had static urban in CLM5 so this was kept as constant as current day on the raw data regardless of the year being generated. Now that we have transient urban in CTSM52 it would make sense to explicitly include the LUH2 urban values in the crop and pft calculations even if the LUH2 urban values aren't being used in the CTSM52 surface data. Using the transient LUH2 means that as we remove urban going back in time we can explicitly provide the crop and pft values that would have been there prior to the urban expansion. It also means that we can prescribe which crop and pft values would be replaced by potential future urban under an SSP rather than letting mksurfdata average them.

I have code that uses the transient LUH2 urban extent already working and the raw data produced works with mksurfdata, having minimal impact on the historical period. I would suggest that this new element would be good to include in the CTSM52 raw data going forward. I would be interested to hear what you thought about this idea. I passed it by @dlawrenncar a while ago and he seemed okay with the idea however we didn't explore it in too much detail.

Cheers Peter

billsacks commented 1 year ago

@lawrencepj1 awesome thanks a lot for making this change! I did a quick check myself and it looks like I would expect.

Regarding accounting for transient urban: Yes, I think this makes sense to me, too. Thanks a lot for thinking of it!