glemieux opened 1 year ago
Review of dynHarvestMod.F90, which is used as a template for the dynLandUseMod.F90 development.
Notable differences between elm and clm:
- dynHarvest_interp and dynHarvest_interp_resolve_harvesttype are the CN and FATES interpolation procedures, respectively. dynHarvest_interp_resolve_harvesttype is called directly in clmfates_interfaceMod.F90. The data is subsequently passed to fates later down the dynamics_driver call via a typical bc_in assignment.
- dynHarvest_interp_harvest_type is the only interpolate procedure in elm. It is called in dynSubgrid_driver and is called every time step (regardless of FATES use or not). bc_in variables are assigned in dynamics_driv in the same manner as clm.
- harvest_rates dimensionality is flipped between the two models. In elm it's harvest_rates(varnum, bounds%begg:bounds%endg), while in clm it's the opposite (see the sketch below).
- The year_position flag is flipped between the two models: dynHarvest_file = dyn_file_type(harvest_filename, YEAR_POSITION_START_OF_TIMESTEP) for elm, while clm uses END_OF_TIMESTEP.
- get_do_harvest seems really complicated. Why is it set up this way? I can ask Bill Sacks for a breakdown, as I think he developed this. That said, I think we don't need to mimic this for use_fates_luh, but it might be good to double check.

Development branch: https://github.com/glemieux/E3SM/tree/lnd/fates-luh2
Associated fates branches:
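For reference, a minimal standalone sketch of the flipped harvest_rates index order noted in the list above. The bounds and variable count are placeholders, not the models' actual bounds objects:

```fortran
program harvest_rates_dims
  ! Illustrative only: the flipped index order between elm and clm.
  ! begg/endg and num_harvest_vars are stand-in values.
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: begg = 1, endg = 10, num_harvest_vars = 2
  real(r8), allocatable :: rates_elm(:,:), rates_clm(:,:)

  allocate(rates_elm(num_harvest_vars, begg:endg))   ! elm: (varnum, gridcell)
  allocate(rates_clm(begg:endg, num_harvest_vars))   ! clm: (gridcell, varnum)
  print *, 'elm shape:', shape(rates_elm), 'clm shape:', shape(rates_clm)
end program harvest_rates_dims
```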
@ckoven should use of luh2 data be incompatible with using transient data?
> @ckoven should use of luh2 data be incompatible with using transient data?
yes, anything FATES-related should be incompatible with the big-leaf transient land use drivers
ELM build development notes:
- use_cases for the different scenarios
- setup_logic_surface_dataset to make sure a compatible dataset is called to be used with the luh data.
- setup_logic_initial_conditions to check the initial date?

Updated:
ELMBuildNameList notes:
- setup_logic_do_transient_pfts and setup_logic_do_transient_crops
@ckoven did I understand correctly that hlm_use_lu_harvest and hlm_use_luh are independent, and thus commit https://github.com/NGEET/fates/commit/40993458eaf65bff2b9bf437b69da9ecc4b7e0a1 makes sense?
As a followup, iirc in the meeting today, we discussed that using harvest and luh2 at the same time is a future goal, correct? I started to add logic to deny this combo on the elm-side during namelist build check, but I'm wondering if I should let both be set?
@glemieux no these should both be allowed already. The future goal will be to remove the current reading of the logging rates from the surface dataset entirely and replace them with the same information that is now on the luh2 file.
elmfates_interfaceMod.F90 notes:
Aside: should the module actually be located in the dyn_subgrid directory? It seems conceptually consistent, but I'm not sure if software structure-wise it makes sense.
> Aside: should the module actually be located in the dyn_subgrid directory? It seems conceptually consistent, but I'm not sure if software structure-wise it makes sense.
I guess another possibility is to put the subroutine not in its own module but in the (ec)lmfates_interfaceMod.F90 module?
LUH2 file IO notes:
To do:
> I guess another possibility is to put the subroutine not in its own module but in the (ec)lmfates_interfaceMod.F90 module?
Definitely a possibility. Right now I'm looking to see if I can implement something simple for using controlMod.F90 for the file IO, instead of the dynSubGrid module setup. I've also got a short, tentative walk through of dynSubGrid with Bill (and Erik) since he's the one who did most of the development for it, I believe.
Build testing is in progress. I've messed up something with the namelist system:
ERROR: Command: '/home/glemieux/Repos/E3SM-Project/e3sm/components/elm/bld/build-namelist -infile /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf/namelist -csmdata /data/cesmdataroot/inputdata -inputdata /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elm.input_data_list -ignore_ic_year -namelist " &elm_inparm start_ymd=00010101 /" -use_case 2000_control -res bci_0.1x0.1_v4.0i -clm_usr_name bci_0.1x0.1_v4.0i -clm_start_type default -envxml_dir /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458 -l_ncpl 48 -r_ncpl 48 -lnd_frac /data/sitedata/bci_0.1x0.1_v4.0i/domain_bci_clm5.0.dev009_c180523.nc -glc_nec 0 -co2_ppmv 367.0 -co2_type constant -ncpl_base_period day -config /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf/config_cache.xml -bgc fates -no-megan -no-drydep'
failed with error 'Can't call method "ww" without a package or object reference at /home/glemieux/Repos/E3SM-Project/e3sm/components/elm/bld/ELMBuildNamelist.pm line 2655.' from dir '/home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf'
This is failing in setup_logic_do_harvest in ELMBuildNamelist.pm, which is interesting since I didn't set this in the namelist or require it during the init.
A sanity check build against the most recent elm-fates api update builds fine.
> This is failing in setup_logic_do_harvest in ELMBuildNamelist.pm, which is interesting since I didn't set this in the namelist or require it during the init.
This ended up being a simple issue of having some random characters (the ww referenced above) in this subroutine that just made no sense. It's getting past the ELM namelist build successfully now and crashing due to fates-side needs. Working through those now. I should note this is building without actually setting the use_fates_luh flag to true, so it's breaking the default run mode.
Just to say that I pushed commits on both the ELM and FATES branches that include some new history dimensions and one new history variable
@ckoven can you comment on this change that came in with commit https://github.com/NGEET/fates/commit/8c942e8d3b2d4b9631ac28a66f6fcaafdeeedca8? It looks like you are adding a new site level variable, but I wasn't sure if that was actually the intent here.
Thanks @glemieux for catching this. Yeah, sorry, I didn't follow that thread to completion. The intent is to get rid of the three variables currentSite%disturbance_rates_primary_to_primary(i_disturbance_type), currentSite%disturbance_rates_secondary_to_secondary(i_disturbance_type), and currentSite%disturbance_rates_primary_to_secondary(i_disturbance_type), and replace them with a new site-level variable that tracks disturbance rates with three dimensions: N_DIST_TYPES x n_landuse_cats x n_landuse_cats. The first n_landuse_cats would be for the donor patch label, and the second would be for the receiver patch label.
Then, in the history output, this part of the code https://github.com/NGEET/fates/blob/main/main/FatesHistoryInterfaceMod.F90#L2615-L2635 would need to be reworked to replace all the p2p, s2s, and p2s stuff with a single variable on the new lulu dimension that I added in 893615c.
Does that make sense?
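A minimal standalone sketch of the array shape being proposed here, assuming placeholder values for N_DIST_TYPES and n_landuse_cats (the real constants and the site structure live in FATES):

```fortran
program disturbance_rates_3d
  ! Sketch of the proposed replacement for the three p2p/s2s/p2s site variables:
  ! a single array dimensioned (disturbance type, donor label, receiver label).
  ! The sizes below are placeholders for N_DIST_TYPES and n_landuse_cats.
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: n_dist_types = 4, n_landuse_cats = 5
  real(r8) :: disturbance_rates(n_dist_types, n_landuse_cats, n_landuse_cats)
  integer  :: i_dist, i_donor, i_receiver

  disturbance_rates(:,:,:) = 0.0_r8
  ! donor label is the second index, receiver label the third, per the convention above
  i_dist = 1; i_donor = 1; i_receiver = 2
  disturbance_rates(i_dist, i_donor, i_receiver) = &
       disturbance_rates(i_dist, i_donor, i_receiver) + 0.01_r8
  ! the history side would then write through the new 'lulu' dimension
  ! instead of separate p2p/s2s/p2s variables
  print *, disturbance_rates(i_dist, i_donor, i_receiver)
end program disturbance_rates_3d
```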
Yep, that makes sense.
Reviewing that loop, I got to thinking that we could potentially add some logic to it to avoid accumulating the disturbance rate for primaryland if the donor type is anything other than primaryland. E.g. we will never have a secondaryland patch donating to primaryland, correct? Theoretically the way it is should be fine, since said disturbance rate during those loop iterations would be zero anyway.
Right, there could be logic but it shouldn't be needed. It might make sense to put in a failure if the donor loop index is not primary, the receiver loop index is primary, and the area disturbed in that part of the loop sequence is nonzero.
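A hedged, standalone sketch of the failure check suggested above; the label constants, tolerance, array, and error handling are placeholders rather than the actual FATES names:

```fortran
program donor_receiver_check
  ! Toy version of the suggested check: flag nonzero disturbed area whose donor
  ! label is not primary but whose receiver label is primary. Label values,
  ! tolerance, and array contents are placeholders, not FATES constants.
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: primaryland = 1, secondaryland = 2, n_landuse_cats = 5
  real(r8), parameter :: nearzero = 1.0e-30_r8
  real(r8) :: area_disturbed(n_landuse_cats, n_landuse_cats)
  integer  :: i_donor, i_receiver

  area_disturbed(:,:) = 0.0_r8
  area_disturbed(secondaryland, primaryland) = 0.5_r8   ! deliberately bad input to trip the check

  do i_donor = 1, n_landuse_cats
     do i_receiver = 1, n_landuse_cats
        if (i_donor /= primaryland .and. i_receiver == primaryland .and. &
            area_disturbed(i_donor, i_receiver) > nearzero) then
           error stop 'nonzero disturbance from a non-primary donor into primary land'
        end if
     end do
  end do
  print *, 'check passed'
end program donor_receiver_check
```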
Hi @glemieux just to say that I was looking at the code and saw a bug that needed fixing, so just pushed as 713aa876.
We've got a set of code now with e25e4a8319cd344b17d5b1a6d427208a0dc881c7 and https://github.com/NGEET/fates/commit/fb7b4eb965cbfbf6d277eeb31bb37228db30d553 that will build in fates default mode.
OK, I think a couple things need to be done still before turning it on. First is that the current code will I think apply an annual change flux every day. So we need to either divide by 365 or do it only once a year (or put in a switch to allow the user to decide which of those to do). Another thing is I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics, I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try I think?
> OK, I think a couple things need to be done still before turning it on. First is that the current code will I think apply an annual change flux every day. So we need to either divide by 365 or do it only once a year (or put in a switch to allow the user to decide which of those to do).
This came up as an aside in the ctsm meeting today when we were talking about the clm-side BGC code updates Ryan was working on. Erik was explaining the ctsm "dribbler" concept for avoiding step-function-like changes in the rates.
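As a toy illustration of the divide-by-365 option mentioned above (setting aside the once-a-year, switch, and dribbler alternatives), with placeholder values:

```fortran
program annual_to_daily_flux
  ! Scale an annual land-use transition fraction to a per-day increment instead
  ! of applying the full annual value every day. Values are placeholders.
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: days_per_year = 365.0_r8
  real(r8) :: annual_transition_frac, daily_transition_frac

  annual_transition_frac = 0.02_r8                          ! e.g. 2% of the gridcell per year
  daily_transition_frac  = annual_transition_frac / days_per_year
  print *, 'applied per day:', daily_transition_frac
end program annual_to_daily_flux
```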
> Another thing is I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics. I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try I think?
I can give this a crack tonight or tomorrow morning, unless you get to it first.
> > Another thing is I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics. I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try I think?
> I can give this a crack tonight or tomorrow morning, unless you get to it first.
I just added that change and pushed it to https://github.com/ckoven/fates/commit/6c8f560b88199b152502c876a04d283d54c5cdb3
OK, I made a couple other changes just now to (hopefully) help the readability of the looping logic a bit. I'll stop messing with the code until you can start testing it.
@ckoven per our conversation, I tried generating some write statements to get the indices of the loop in spawn_patches prior to where the call to TransLitterNewPatch is failing. It looks like the runtime is segfaulting before it can even write the statements. The land log doesn't even have the expected FATES dynamics start output (i.e. the last entry is Beginning timestep : 2000-01-01_00:30:00). This happens regardless of being in hlm debug mode or not.
hmm, that's not good! are you running at a single site or larger scale?
Single site (BCI). Do you think running a global case might help?
no that should be the simplest configuration
I wonder if it is because the arrays aren't being allocated in dynFatesLandUseInit when use_luh isn't true?
Good thought. I'll look into this.
> @ckoven per our conversation, I tried generating some write statements to get the indices of the loop in spawn_patches prior to where the call to TransLitterNewPatch is failing. It looks like the runtime is segfaulting before it can even write the statements. The land log doesn't even have the expected FATES dynamics start output (i.e. the last entry is Beginning timestep : 2000-01-01_00:30:00). This happens regardless of being in hlm debug mode or not.
The issue I was seeing was a little misleading. I forgot that I still had in place the temp code that forces disturbance to zero (added for a different issue: a NaN due to divide by zero) and needed to remove it. Removing that, I ran into a similar issue, though one that was easier to identify: part of the spawn_patches routine was missed when renaming the old new_patch_primiary: https://github.com/NGEET/fates/commit/6a935ad816774e5bb765a0af0d82358db33c93fc
I'm hitting an FPE issue now in the first fates history write, but I suspect that should be a straightforward fix.
@ckoven Ok, the default mode is working now with commit https://github.com/NGEET/fates/commit/b19cedf7896f0fe738380b46c33bb31ea74b167f. An important note: commenting out the write blocks in the patchloop_areadis loop that print out the indices, labels, and disturbance rate causes an FPE at the line that does the comparison against reasonable math precision: disturbance_rate > (1.0_r8 + rsnbl_math_prec)
I think I have an idea for why this is and I'm testing it now
ok great, thanks. On the site where you are testing, does the initial state vector sum to one? I suspect that once you turn on the land use change code, it should crash if that is not the case.
> ok great, thanks. On the site where you are testing, does the initial state vector sum to one? I suspect that once you turn on the land use change code, it should crash if that is not the case.
Thanks for the reminder, I haven't checked that.
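A minimal sketch of the kind of sum-to-one check being discussed, with a placeholder state vector and tolerance (FATES uses its own precision constant for comparisons like this):

```fortran
program state_vector_check
  ! Check whether the LUH2 state fractions at a site sum to one within a tolerance.
  ! The state values and tolerance here are placeholders.
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: n_states = 12
  real(r8), parameter :: tol = 1.0e-6_r8
  real(r8) :: state_vector(n_states)

  state_vector(:) = 1.0_r8 / real(n_states, r8)   ! stand-in for the read-in states
  if (abs(sum(state_vector) - 1.0_r8) > tol) then
     print *, 'WARNING: luh2 state vector does not sum to one; sum =', sum(state_vector)
  else
     print *, 'state vector sums to one'
  end if
end program state_vector_check
```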
I'm running into some issues with the elm pio that I need to look into to get a better understanding of:
ERROR: The size of the user buffer ( 0 elements) is smaller than the size of the variable ( 1166 elements)
pio_support::pio_die:: myrank= -1 : ERROR: pionfget_mod.F90: 509 : Insufficient user buffer size
As a minor note, I didn't realize that the pio routines are fussy about variable case.
UPDATE: This is likely due to me not running luh mode for the appropriate resolution.
Running on summit is getting farther along, but hitting an issue with the dimensionality of the data:
1: 1: Opened existing file /ccs/home/glemieux/parameter_files/LUH2_historical_1850_2015_4x5_220308_year.nc 47
1: 1: Assertion failed...
1: 1: PIO: FATAL ERROR: Aborting... Extra outermost dimensions must have lengths of 1 (/autofs/nccs-svm1_home1/glemieux/e3sm/externals/scorpio/src/clib/pio_darray_int.c: 1358)
1: 1: Obtained 10 stack frames.
Note that I need to change the time variable to YEAR due to dynSubGrid file import requirements.
Hi @glemieux, just curious if there is any update or if it would be helpful to discuss anything today, thanks.
@ckoven I think this has more to do with elm's expectation of file "structure" at a low level, for lack of a better term. I.e. I think we need to make the luh2 file more closely resemble whatever requirements are being set. I haven't had much time to dig into it this morning, but will be able to give it time this afternoon.
ok thanks. do you know if the issue is the metadata (variable names and dimensions, etc) or something lower level (netcdf version, variable type, etc)?
From looking at the code, I think this is metadata (dimensions specifically).
> From looking at the code, I think this is metadata (dimensions specifically).
Looking at the differences between PIO and SCORPIO, this assertion check is specific to SCORPIO. Looking at the PIO code in ctsm, I don't think we'd have an issue with the file metadata as laid out currently: compare the SCORPIO pioassertion in pio_read_darray_nc_serial versus the most similar pioassertion in the same procedure in PIO. The issue is cropping up upon trying to read in the first record variable (primf_to_secdn).
The relevant commit where this check was added in SCORPIO is https://github.com/E3SM-Project/scorpio/commit/d0163d2fea76eeb7e30f57cbd60499a19f07f432, which was part of https://github.com/E3SM-Project/scorpio/pull/113/. I'm going to reach out to the author to get some guidance.
The above issue has been resolved. It was due to me renaming time to YEAR. Reviewing other elm landuse timeseries data, I found that what I should have done was simply add YEAR as a copy of time. Erik noted that the YEAR variable requirement is pretty old and likely outdated, but at least it's consistent across ELM and CLM.
I'm now running into the state_vector sum-to-one check warnings and hitting an area issue in EDInitMod during init_patches, so I think we're past the read-in issues and moving on to passing-data issues.
> I'm now running into the state_vector sum-to-one check warnings and hitting an area issue in EDInitMod during init_patches, so I think we're past the read-in issues and moving on to passing-data issues.
@ckoven the issue here appears to be due in part to missing calls to the interpolation prior to the patch initialization on the fates side, so it was getting junk data for the states and rates (the names were fine though).
The other issue has to do with the loop in the landuse mod in fates here: https://github.com/ckoven/fates/blob/44696201f9dd9a3add7889f805d39ef82b7205ff/biogeochem/FatesLandUseChangeMod.F90#L241-L249.
The hlm_num_luh2_states value is 12, so this will cause an issue when using it to define the boundaries of the index into luh2_fates_luype_map, which is a 5x5 matrix. I have an idea of how to refactor this code with a different mapping scheme that I'm going to try tomorrow, unless you get to it first.
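One possible refactor direction, purely as a sketch (the actual mapping scheme chosen may differ): a flat lookup vector from the 12 LUH2 state indices to the 5 FATES land-use categories, so the state index is never used directly against a 5x5 matrix. The category assignments below are placeholders, not the real FATES/LUH2 correspondence:

```fortran
program luh2_state_mapping
  ! Aggregate 12 LUH2 state fractions onto 5 land-use categories via a lookup vector.
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: num_luh2_states = 12, n_landuse_cats = 5
  integer  :: luh2_to_fates_cat(num_luh2_states)
  real(r8) :: luh2_states(num_luh2_states), fates_state_frac(n_landuse_cats)
  integer  :: i

  luh2_to_fates_cat = (/ 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5 /)  ! placeholder assignments
  luh2_states(:) = 1.0_r8 / real(num_luh2_states, r8)           ! stand-in state fractions

  fates_state_frac(:) = 0.0_r8
  do i = 1, num_luh2_states
     fates_state_frac(luh2_to_fates_cat(i)) = &
          fates_state_frac(luh2_to_fates_cat(i)) + luh2_states(i)
  end do
  print *, 'aggregated category fractions:', fates_state_frac
end program luh2_state_mapping
```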
ok, great. yep that was just a mistake in my logic, sorry. feel free to do the mapping a different way, thanks.
@ckoven, so I've fixed the initialization order of operations on the elm side to make sure that init_patches has the state data it needs. That said, writing out the luh2 states by gridcell, I'm seeing no states summing to unity (although many are close), as well as a number of gridcells where every state type in that gridcell is NaN. I did a spot check on the netcdf file that it's inputting to make sure the values match.
My assumption here is that our regridding may be messing up the fractions such that they don't sum to one within the new grid cell?
I think the first thing to do is divide by the regridded land area, and see if that fixes the summing-to-one problem. The NaNs are a tougher problem; I wonder what the lats and lons are of those gridcells? Presumably it is because they are for land that ELM is running over but that has no LUH2 data because of ice or similar.
re the NaN issue: e.g. I am noticing that luh2 (unsurprisingly) has no data for Antarctica, whereas ELM also runs at the perimeter of the continent. So I think the strategy should probably be to search for NaNs, and when found, set the primf (or maybe primn) state to be one, and all other states and all transitions to be zero.
> re the NaN issue: e.g. I am noticing that luh2 (unsurprisingly) has no data for Antarctica, whereas ELM also runs at the perimeter of the continent. So I think the strategy should probably be to search for NaNs, and when found, set the primf (or maybe primn) state to be one, and all other states and all transitions to be zero.
Roger that. Will do
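A standalone sketch of the NaN fallback described above, written model-side for illustration (it could equally live in the file preprocessing). The array sizes and the primf index are placeholders:

```fortran
program luh2_nan_fallback
  ! For gridcells with no LUH2 data: if the incoming states contain NaN, set the
  ! primary forest state to one and zero out all other states and transitions.
  use, intrinsic :: ieee_arithmetic, only : ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: n_states = 12, n_transitions = 3, i_primf = 1  ! placeholder sizes/index
  real(r8) :: states(n_states), transitions(n_transitions)

  states(:) = ieee_value(1.0_r8, ieee_quiet_nan)        ! stand-in for a no-data gridcell
  transitions(:) = ieee_value(1.0_r8, ieee_quiet_nan)

  if (any(ieee_is_nan(states))) then
     states(:) = 0.0_r8
     states(i_primf) = 1.0_r8
     transitions(:) = 0.0_r8
  end if
  print *, 'primf fraction:', states(i_primf), 'state sum:', sum(states)
end program luh2_nan_fallback
```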
> I think the first thing to do is divide by the regridded land area, and see if that fixes the summing-to-one problem.
So basically, rerun the regridder but use conservative_normed?
Per @ckoven, use dynHarvestMod.F90 as a guideline for this module.
Tasks