Limit where dynamic crops and transient PFTs are allowed

billsacks commented 6 years ago

(From NCAR Teamwork site, 2015-11-11)

I found one area with a big potential for performance gains: Limiting where we allow dynamic crops. I did some performance runs with (a) CLM5BGC, (b) CLM5BGCCROP, and (c) CLM5BGCCROP, but only allocating memory for non-zero-weight crop columns (according to either year-1850 or year-2000 crop distributions... the difference between those two, in terms of performance, was much less than I expected).

When we include crops, the impact of adding 0-weight columns is much greater than the ~ 10% I found when I was previously investigating the impact of dynamic landunits on performance: When running with crops, the impact of these 0-weight columns is a 1.59x increase in cost. I think that part of the reason why I didn't consider this before was that these 0-weight columns had always been added, even before I started my dynamic landunits work.

Currently, the impact of including crops is a 2.31x cost increase. This could be reduced to as low as ~ 1.5x if we only allocated memory where needed, according to the crop distribution in year-2000. In practice, we would want to do something more than that, to allow for other changes with future scenarios. And things get harder if we want to allow coupling to an integrated assessment model, which could change flanduse_timeseries in unpredictable ways. But we could still achieve a significant cost savings if we restrict where dynamic crops are possible.

Note that this big cost savings comes from restricting the types of crops that can grow in each grid cell, in addition to the presence of any crops at all. If, instead, we say: "anywhere crops can grow, allocate memory for all possible crop types", then the savings are less (at best we reduce the 2.31x to 2.0x).

So how could we do this?

(1) The easiest thing from a software perspective would be to have a field on the surface dataset (or elsewhere) saying which crops are allowed to exist in each grid cell, ever. If someone was able to come up with such a dataset, the code mods needed to make use of this new field would be easy – I've pretty much already done it for the sake of these timing runs. The main downside of this approach is needing to come up with this field, particularly if we want to allow coupling to an integrated assessment model that may change flanduse_timeseries in unpredictable ways.

(2) An alternative would be to give more consideration to the idea of resizing arrays in memory as needed. It's possible that we could achieve this through something like init_interp; however, it feels like that would still take significant code rework, to allow reinitialization of the model at runtime. So I would not consider this a viable solution on the CESM2 timeframe.

billsacks commented 6 years ago

(From NCAR Teamwork site 2015-11-11)

One other thought: We should keep an eye out for code – particularly crop code – that operates over all points, rather than just active points. Loops that operates over filters are safe (because filters just operate over active points), but anything like: do c = bounds%begc, bounds%endc

is suspect

as is anything like:

foo(bounds%begc:bounds%endc) = ...

Update (2021-02-03): Let's not worry about this piece for this issue.

billsacks commented 6 years ago

A while ago, I implemented the limit on where crops can be grown for non-transient runs. But this hasn't been implemented for transient runs. See this code in subgridMod.F90:

       if (get_do_transient_crops()) then
          ! To support dynamic landunits, we have all possible crop columns in every grid
          ! cell, because they might need to come into existence even if their weight is 0 at
          ! the start of the run.
          if (pftcon%is_pft_known_to_model(cft)) then
             exists = .true.
          else
             exists = .false.
          end if

       else
          ! For a run without transient crops, only allocate memory for crops that are
          ! actually present in this run. (This will require running init_interp when
          ! changing between a transient crop run and a non-transient run.)
          if (wt_lunit(gi, istcrop) > 0.0_r8 .and. wt_cft(gi, cft) > 0.0_r8) then
             exists = .true.
          else
             exists = .false.
          end if
       end if

Also, I have never addressed https://github.com/ESCOMP/ctsm/issues/229#issuecomment-361305618

billsacks commented 6 years ago

I'm planning to put in place code for natural PFTs to only have memory allocated where they actually exist, in non-transient runs (#298 ). For transient runs, we'll need a feature like the one discussed here for crops; thus I have changed the title of this issue.

@dlawrenncar pointed out that it won't work to have the "potential area" on the surface dataset, because this differs for different historical simulations. But it could work to have this as a time-constant field on the landuse.timeseries file.

@lawrencepj1 bringing you into the loop here, too.

billsacks commented 3 years ago

A few updated thoughts on this:

Something similar to this is now done for lakes; we can follow this example, but using PCT_NAT_PFT_MAX and PCT_CROP_MAX and PCT_CFT_MAX on the landuse.timeseries file:

https://github.com/ESCOMP/CTSM/blob/0f6985da747a9e0538df63bf6e3f0ff6b4341229/src/main/surfrdMod.F90#L955-L1007

https://github.com/ESCOMP/CTSM/blob/0f6985da747a9e0538df63bf6e3f0ff6b4341229/src/main/subgridMod.F90#L581-L589

There may be similar changes needed in initGridcellsMod?
I'm not sure if the necessary fields are on all out-of-the-box landuse.timeseries files, or if we're still pointing to some older files that don't have those fields
This change will lead to incompatibilities with existing initial conditions files: they will need to be run through init_interp. This may or may not be an issue for out-of-the-box initial conditions files (we may already do init_interp when starting a transient run with any of them), but should at least be noted as a caveat for users.
To confirm this was done right, we can check the number of columns and patches on a restart file on master vs. after making this change. If this was done right, then there should be many fewer columns and patches on a restart file (or a vector history file) in a transient run, with or without crops.
This should be bit-for-bit in gridcell averages, but dimension sizes will change for vector history files, so baseline comparisons may fail for h1 files in tests

slevis-lmwg commented 3 years ago

@billsacks I started looking at this today. This far, one question has come up:

As you suggested, I'm creating subroutine surfrd_max_wt_pft_cft in the template of subroutine surfrd_lakemask. The goal is to read PCT_NAT_PFT and PCT_CFT and determine their max values in the timeseries.

My question is about reading these as 4D variables. The closest existing CTSM approaches that I've found are dyncrop_interp and dynpft_interp that obtain cft and pft weights at the "current" time. Alternatively, I see ncd_io_{DIMS}d_{TYPE}_glob but I'm pretty sure this handles up to 3 dimensions. My preference would be to read the two variables as 4D, so would you recommend that I update ncd_io_{DIMS}d_{TYPE}_glob to handle 4D variables?

All suggestions are welcome. Thanks!

billsacks commented 3 years ago

There should already be variables with MAX in their names on the landuse.timeseries files – at least on some of the landuse.timeseries files. If you find some without that field, we'll need to regenerate them with the recent mksurfdata_map code – but this may change answers so we may need to do that as an initial step.

slevis-lmwg commented 3 years ago

Thank you for clarifying. Makes sense and sorry for missing that point in your earlier notes.

slevis-lmwg commented 3 years ago

Problems encountered this far, that I need to address: 1) "Hist" tests fail because PCT_NAT_PFT_MAX and PCT_CFT_MAX are not present in the landuse.timeseries files, as @billsacks cautioned in his last post. 2) This landuse.timeseries file contains all zeros for PCT_NAT_PFT_MAX and PCT_CFT_MAX: /glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/landuse.timeseries_10x15_SSP2-4.5_78pfts_CMIP6_simyr1850-2100_c190228.nc ...and I get a couple of the following errors: check_weights ERROR: at g = 3 total PFT weight is 0.99686936341033627 active_only = F in test SMS_Ld5.f10_f10_musgs.ISSP245Clm50BgcCrop.cheyenne_gnu.clm-ciso_dec2050Start 3) This landuse.timeseries file does NOT contain all zeros for PCT_NAT_PFT_MAX and PCT_CFT_MAX: /glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/landuse.timeseries_10x15_SSP3-7_78pfts_CMIP6_simyr1850-2100_c190228.nc ...and I still get the same type of error as in (2). The test is SMS_Ld5.f10_f10_musgs.ISSP370Clm50BgcCrop.cheyenne_gnu.clm-ciso_dec2050Start

I will try to resolve (3) first.

billsacks commented 3 years ago

Regarding the missing or seemingly incorrect _MAX fields: I believe @lawrencepj1 and/or @ekluzek created those so they may be able to comment on this . @ekluzek could also possibly comment on whether it would be reasonable to regenerate the landuse.timeseries files that do not have this field on them or if that would be problematic for some reason.

Regarding the check_weights error: The first thing that comes to mind is: I wonder if, in natveg_patch_exists, you need to check wt_nat_patch(gi, pft) > 0.0_r8 (which is the condition used in a non-transient run) in addition to the relevant _MAX field for a transient run. I'm not sure if that's the explanation, though. We could find a time to talk about this if you can' see the problem.

lawrencepj1 commented 3 years ago

Hi Bill, Sam and Erik

@billsacks @slevisconsulting @ekluzek

Yes these fields were put in place in mksurfdata_map to be automatically calculated when generating the landuse timeseries files. This was done as explained above to minimize the memory and processing required for transient landuse. These fields are calculated from a new subroutine update_max_array in mkpctPftTypeMod.F90 that maintains the maximum values found for an array element for the entire timeseries.

Peter

subroutine update_max_array(pct_pft_max_arr,pct_pft_arr) ! ! !DESCRIPTION: ! Given an array of pct_pft_type variables, update all the max_p2l variables. ! ! Assumes that all elements of pct_pft_max_arr and pct_pft_arr have the same ! size and lower bound for their pct_p2l array. ! ! !ARGUMENTS: ! workaround for gfortran bug (58043): declare this 'type' rather than 'class': type(pct_pft_type), intent(inout) :: pct_pft_max_arr(:) type(pct_pft_type), intent(in) :: pct_pft_arr(:) ! ! !LOCAL VARIABLES: integer :: pft_lbound integer :: pft_ubound integer :: arr_index integer :: pft_index

character(len=*), parameter :: subname = 'update_max_array' !-----------------------------------------------------------------------

pft_lbound = lbound(pct_pft_arr(1)%pct_p2l, 1) pft_ubound = ubound(pct_pft_arr(1)%pct_p2l, 1)

do arr_index = 1, size(pct_pft_arr) if (lbound(pct_pft_arr(arr_index)%pct_p2l, 1) /= pft_lbound .or. & ubound(pct_pft_arr(arr_index)%pct_p2l, 1) /= pft_ubound) then write(6,) subname//' ERROR: all elements of pct_pft_arr must have' write(6,) 'the same size and lower bound for their pct_p2l array' call abort() end if

if (pct_pft_arr(arr_index)%pct_l2g > pct_pft_max_arr(arr_index)%pct_l2g) then pct_pft_max_arr(arr_index)%pct_l2g = pct_pft_arr(arr_index)%pct_l2g end if

do pft_index = pft_lbound, pft_ubound if (pct_pft_arr(arr_index)%pct_p2l(pft_index) > pct_pft_max_arr(arr_index)%pct_p2l(pft_index)) then pct_pft_max_arr(arr_index)%pct_p2l(pft_index) = pct_pft_arr(arr_index)%pct_p2l(pft_index) end if end do end do end subroutine update_max_array

slevis-lmwg commented 3 years ago

For now I suspect no issues with the subroutine posted above by @lawrencepj1

In debugging issue (3) in my list above, I found that the test (SMS_Ld5.f10_f10_musgs.ISSP370Clm50BgcCrop.cheyenne_gnu.clm-ciso_dec2050Start) passes if I work only with PCT_NAT_PFT_MAX and not with PCT_CFT_MAX. I will pursue my first suspicion, which is that I end up dropping the weights of cfts "unknown" to the model when I use PCT_CFT_MAX.

billsacks commented 3 years ago

Ah, interesting. Maybe this is what you were already thinking, but your last comment made me realize that you probably need to call the same "collapse" code on PCT_CFT_MAX as is called for pct_cft itself.

slevis-lmwg commented 3 years ago

Status of this issue before l completely lose track of it... My code mods: /glade/work/slevis/git/lim_trans_pfts_cfts git branch (no commits, no PR, yet): lim_trans_pfts_cfts Most recently updated file in this branch: 2021/2/11 My notes outside of github are here.

billsacks commented 2 years ago

See https://github.com/ESCOMP/CTSM/pull/1661 for something similar now done for urban.

ekluzek commented 2 years ago

Since we are making new surface datasets with ctsm5.2 we should evaluate if doing this should be part of that.

billsacks commented 2 years ago

Since we are making new surface datasets with ctsm5.2 we should evaluate if doing this should be part of that.

My recollection is that the necessary fields are already on the landuse timeseries files, and that the remaining needs for this issue are in the runtime CTSM code.

billsacks commented 9 months ago

Adding next label to discuss how to prioritize this and who should do it. (I am unassigning myself.)

samsrabin commented 5 months ago

My thought is that this should be back-burnered until after the CESM3 release. It sounds like a subsequent point release might be focused on high resolution, which might be a good time to focus on performance-related issues.

samsrabin commented 4 months ago

This would necessarily be limiting of the kinds of restarts that could be done from a given run, so maybe this would be something that is off by default. Then people who really know what they're doing and understand the tradeoffs would turn this on to save costs. We'd certainly do it for our tests, I'd think.

ESCOMP / CTSM

Limit where dynamic crops and transient PFTs are allowed #229