glemieux / E3SM

Energy Exascale Earth System Model source code.
https://e3sm.org/

Create ELM module to handle LUH2 data import #21

Open glemieux opened 1 year ago

glemieux commented 1 year ago

Per @ckoven, use dynHarvestMod.F90 as a guideline for this module.

Tasks

glemieux commented 1 year ago

Review of dynHarvestMod.F90, which is used as a template for the dynLandUseMod.F90 development.

Notable differences between elm and clm:

glemieux commented 1 year ago

Development branch: https://github.com/glemieux/E3SM/tree/lnd/fates-luh2

Associated fates branches:

glemieux commented 1 year ago

@ckoven should use of luh2 data be incompatible with using transient data?

ckoven commented 1 year ago

@ckoven should use of luh2 data be incompatible with using transient data?

yes, anything FATES-related should be incompatible with the big-leaf transient land use drivers

glemieux commented 1 year ago

ELM build development notes:

Updated:

ELMBuildNameList notes:

glemieux commented 1 year ago

@ckoven did I understand correctly that hlm_use_lu_harvest and hlm_use_luh are independent and thus commit https://github.com/NGEET/fates/commit/40993458eaf65bff2b9bf437b69da9ecc4b7e0a1 makes sense?

As a follow-up, IIRC in the meeting today we discussed that using harvest and LUH2 at the same time is a future goal, correct? I started adding logic on the ELM side to reject this combination during the namelist build check, but I'm wondering if I should allow both to be set?

ckoven commented 1 year ago

@glemieux no, these should both be allowed already. The future goal will be to remove the current reading of the logging rates from the surface dataset entirely and replace it with the same information that is now on the luh2 file.

glemieux commented 1 year ago

elmfates_interfacemod.F90 notes:

glemieux commented 1 year ago

Aside: should the module actually be located in the dyn_subgrid directory? It seems conceptually consistent, but I'm not sure if software structure-wise it makes sense.

ckoven commented 1 year ago

Aside: should the module actually be located in the dyn_subgrid directory? It seems conceptually consistent, but I'm not sure if software structure-wise it makes sense.

I guess another possibility is to put the subroutine not in its own module but in the (ec)lmfates_interfaceMod.F90 module?

glemieux commented 1 year ago

LUH2 file IO notes:

To do:

glemieux commented 1 year ago

I guess another possibility is to put the subroutine not in its own module but in the (ec)lmfates_interfaceMod.F90 module?

Definitely a possibility. Right now I'm looking to see if I can implement something simple using controlMod.F90 for the file IO, instead of the dynSubGrid module setup. I've also scheduled a short, tentative walkthrough of dynSubGrid with Bill (and Erik), since I believe he did most of the development for it.

glemieux commented 1 year ago

Build testing is in progress. I've messed up something with the namelist system:

ERROR: Command: '/home/glemieux/Repos/E3SM-Project/e3sm/components/elm/bld/build-namelist -infile /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf/namelist  -csmdata /data/cesmdataroot/inputdata -inputdata /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elm.input_data_list -ignore_ic_year -namelist " &elm_inparm  start_ymd=00010101  /" -use_case 2000_control  -res bci_0.1x0.1_v4.0i  -clm_usr_name bci_0.1x0.1_v4.0i -clm_start_type default -envxml_dir /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458 -l_ncpl 48 -r_ncpl 48 -lnd_frac /data/sitedata/bci_0.1x0.1_v4.0i/domain_bci_clm5.0.dev009_c180523.nc -glc_nec 0 -co2_ppmv 367.0 -co2_type constant  -ncpl_base_period day  -config /home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf/config_cache.xml -bgc fates -no-megan -no-drydep' 
failed with error 'Can't call method "ww" without a package or object reference at /home/glemieux/Repos/E3SM-Project/e3sm/components/elm/bld/ELMBuildNamelist.pm line 2655.' from dir '/home/glemieux/scratch/e3sm-cases/luh2-buildcheck.fates.lobata.E6c1717c87c-F40993458/Buildconf/elmconf'

This is failing in setup_logic_do_harvest in ELMBuildNamelist.pm, which is interesting since I didn't set this in the namelist or require it during the init.

A sanity check build against the most recent elm-fates api update builds fine.

glemieux commented 1 year ago

This is failing in setup_logic_do_harvest in ELMBuildNamelist.pm, which is interesting since I didn't set this in the namelist or require it during the init.

This ended up being a simple issue: some stray characters (the ww referenced above) in this subroutine that made no sense. It's getting past the ELM namelist build successfully now and crashing due to fates-side needs; working through those now. I should note that this build didn't actually set the use_fates_luh flag to true, so it's breaking the default run mode.

ckoven commented 1 year ago

Just to say that I pushed commits on both the ELM and FATES branches that include some new history dimensions and one new history variable

glemieux commented 1 year ago

@ckoven can you comment on this change that came in with commit https://github.com/NGEET/fates/commit/8c942e8d3b2d4b9631ac28a66f6fcaafdeeedca8? It looks like you are adding a new site level variable, but I wasn't sure if that was actually the intent here.

ckoven commented 1 year ago

Thanks @glemieux for catching that. Yeah, sorry, I didn't follow that thread to completion. The intent is to get rid of the three variables currentSite%disturbance_rates_primary_to_primary(i_disturbance_type), currentSite%disturbance_rates_secondary_to_secondary(i_disturbance_type), and currentSite%disturbance_rates_primary_to_secondary(i_disturbance_type), and replace them with a single new site-level variable that tracks disturbance rates with three dimensions: N_DIST_TYPES x n_landuse_cats x n_landuse_cats. The first n_landuse_cats would be for the donor patch label, and the second would be for the receiver patch label.

Then, in the history output, this part of the code https://github.com/NGEET/fates/blob/main/main/FatesHistoryInterfaceMod.F90#L2615-L2635 would need to be reworked to replace all the p2p, s2s, and p2s stuff by a single variable with the new lulu dimension that I added in 893615c.

Does that make sense?
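
The proposed restructuring can be sketched as follows. This is an illustrative Python rendering, not the actual Fortran: the sizes and category indices are invented assumptions, chosen only to show the [disturbance_type][donor][receiver] indexing that replaces the three separate p2p/s2s/p2s variables.

```python
# Sketch of collapsing the three per-transition disturbance-rate variables
# into one 3-D array indexed as [dist_type][donor_cat][receiver_cat].
# Sizes and category labels here are illustrative assumptions.

N_DIST_TYPES = 3      # hypothetical number of disturbance types
N_LANDUSE_CATS = 2    # hypothetical: primaryland, secondaryland only
PRIMARY, SECONDARY = 0, 1

def new_disturbance_rates():
    """Allocate rates[dist_type][donor][receiver], all zero."""
    return [[[0.0 for _ in range(N_LANDUSE_CATS)]
             for _ in range(N_LANDUSE_CATS)]
            for _ in range(N_DIST_TYPES)]

rates = new_disturbance_rates()
# The old scalar variables map onto fixed (donor, receiver) slots:
rates[0][PRIMARY][PRIMARY] = 0.01      # was ..._primary_to_primary
rates[0][PRIMARY][SECONDARY] = 0.02    # was ..._primary_to_secondary
rates[0][SECONDARY][SECONDARY] = 0.03  # was ..._secondary_to_secondary
```

History output can then loop over the donor and receiver dimensions generically instead of special-casing each transition name.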

glemieux commented 1 year ago

Yep, that makes sense.

Reviewing that loop, I got to thinking that we could potentially add some logic to avoid accumulating the disturbance rate for primaryland when the donor type is anything other than primaryland. E.g. we will never have secondaryland donating to primaryland, correct? Theoretically it should be fine as is, since the disturbance rate during those loop iterations would be zero anyway.

ckoven commented 1 year ago

Right, there could be logic, but it shouldn't be needed. It might make sense to put in a failure if the donor loop index is not primary, the receiver loop index is primary, and the area disturbed in that part of the loop sequence is nonzero.
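
The suggested failure check can be sketched like this. This is a hedged Python illustration, not FATES code; the function name, category index, and tolerance are assumptions.

```python
# Sketch of the suggested sanity check: abort if a nonzero area is disturbed
# along a transition that should never occur (a non-primary donor into a
# primary receiver). Names and tolerance are illustrative.

PRIMARY = 0  # hypothetical land-use category index for primaryland

def check_forbidden_transition(donor, receiver, area_disturbed, tol=1e-15):
    """Raise if land is being 'promoted' back to primary from another category."""
    if donor != PRIMARY and receiver == PRIMARY and abs(area_disturbed) > tol:
        raise RuntimeError(
            f"nonzero disturbed area {area_disturbed} from category "
            f"{donor} into primaryland")

check_forbidden_transition(donor=1, receiver=PRIMARY, area_disturbed=0.0)  # fine
```

With this in place, the zero-rate loop iterations stay harmless, and any future bookkeeping bug that produces a nonzero secondary-to-primary flow fails loudly instead of silently corrupting areas.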

ckoven commented 1 year ago

Hi @glemieux just to say that I was looking at the code and saw a bug that needed fixing, so just pushed as 713aa876.

glemieux commented 1 year ago

We've got a set of code now with e25e4a8319cd344b17d5b1a6d427208a0dc881c7 and https://github.com/NGEET/fates/commit/fb7b4eb965cbfbf6d277eeb31bb37228db30d553 that will build in fates default mode.

ckoven commented 1 year ago

OK, I think a couple of things still need to be done before turning it on. First, I think the current code will apply an annual change flux every day, so we need to either divide by 365 or apply it only once a year (or add a switch to let the user decide). Another thing: I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics; I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try, I think?

glemieux commented 1 year ago

OK, I think a couple of things still need to be done before turning it on. First, I think the current code will apply an annual change flux every day, so we need to either divide by 365 or apply it only once a year (or add a switch to let the user decide).

This came up as an aside in the CTSM meeting today when we were talking about the clm-side BGC code updates Ryan was working on. Erik was explaining the CTSM "dribbler" concept for avoiding step-function-like changes in the rates.
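
The divide-by-365 option amounts to spreading the annual transition flux evenly across days, which is the same idea as the dribbler. A minimal sketch (invented numbers, plain Python, not the ELM/FATES implementation):

```python
# Spread an annual land-use transition flux into equal daily increments
# (the "dribbler" idea) so there is no once-a-year step change.
# The flux value below is made up for illustration.

DAYS_PER_YEAR = 365

def daily_increment(annual_flux):
    """Daily share of an annual flux."""
    return annual_flux / DAYS_PER_YEAR

annual_flux = 0.0365   # fraction of gridcell transitioned per year (invented)
total = sum(daily_increment(annual_flux) for _ in range(DAYS_PER_YEAR))
# summing the daily increments over a year recovers the annual flux
```

The alternative (applying the full flux once per year) conserves the same total but produces a step change in state on that day, which is what the dribbler avoids.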

Another thing: I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics; I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try, I think?

I can give this a crack tonight or tomorrow morning, unless you get to it first.

ckoven commented 1 year ago

Another thing: I think we've lost some information in the site%disturbance rate variable that was feeding the history diagnostics; I can rework it slightly to give the right info unless you want to do that. Otherwise, it should be good to try, I think?

I can give this a crack tonight or tomorrow morning, unless you get to it first.

I just added that change and pushed it to https://github.com/ckoven/fates/commit/6c8f560b88199b152502c876a04d283d54c5cdb3

ckoven commented 1 year ago

OK, I made a couple of other changes just now to (hopefully) help the readability of the looping logic a bit. I'll stop messing with the code until you can start testing it.

glemieux commented 1 year ago

@ckoven per our conversation, I tried generating some write statements to get the indices of the loop in spawn_patches prior to where the call to TransLitterNewPatch is failing. It looks like the runtime is segfaulting before it can even write the statements. The land log doesn't even have the expected FATES dynamics start output (i.e. the last entry is Beginning timestep : 2000-01-01_00:30:00). This happens regardless of being in hlm debug mode or not.

ckoven commented 1 year ago

Hmm, that's not good! Are you running at a single site or a larger scale?

glemieux commented 1 year ago

Single site (BCI). Do you think running a global case might help?

ckoven commented 1 year ago

No, that should be the simplest configuration.

ckoven commented 1 year ago

I wonder if it is because the arrays aren't being allocated in dynFatesLandUseInit when use_luh isn't true?

glemieux commented 1 year ago

Good thought. I'll look into this.

glemieux commented 1 year ago

@ckoven per our conversation, I tried generating some write statements to get the indices of the loop in spawn_patches prior to where the call to TransLitterNewPatch is failing. It looks like the runtime is segfaulting before it can even write the statements. The land log doesn't even have the expected FATES dynamics start output (i.e. the last entry is Beginning timestep : 2000-01-01_00:30:00). This happens regardless of being in hlm debug mode or not.

The issue I was seeing was a little misleading. I forgot to remove the temp code that forces disturbance to zero, which I'd added for a different issue (NaNs due to divide by zero). After removing that I ran into a similar issue, though one easier to identify: part of the spawn_patches routine was missed in renaming the old new_patch_primiary: https://github.com/NGEET/fates/commit/6a935ad816774e5bb765a0af0d82358db33c93fc

I'm hitting an FPE issue now in the first fates history write, but I suspect that should be a straightforward fix.

glemieux commented 1 year ago

@ckoven OK, the default mode is working now with commit https://github.com/NGEET/fates/commit/b19cedf7896f0fe738380b46c33bb31ea74b167f. An important note: commenting out the write blocks in the patchloop_areadis loop that print the indices, labels, and disturbance rate causes an FPE at the line that does the comparison against reasonable math precision:

disturbance_rate > (1.0_r8 + rsnbl_math_prec)

I think I have an idea of why this happens and I'm testing it now.
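
For reference, the quoted Fortran comparison tolerates tiny roundoff above one rather than flagging it. A Python rendering of the same check (the tolerance value below is a stand-in, not the actual rsnbl_math_prec):

```python
# A disturbance rate is flagged only if it exceeds unity by more than a
# small precision tolerance, so roundoff like 1.0 + 1e-16 doesn't trip it.
# The tolerance value is an assumption, not FATES's actual rsnbl_math_prec.

RSNBL_MATH_PREC = 1.0e-12  # stand-in value

def rate_exceeds_unity(disturbance_rate):
    """True only when the rate is above 1 by more than math precision."""
    return disturbance_rate > (1.0 + RSNBL_MATH_PREC)
```

The FPE being sensitive to nearby write statements suggests the operand itself (not the comparison) holds a signaling value in some iterations, which is consistent with uninitialized or junk data upstream.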

ckoven commented 1 year ago

OK great, thanks. On the site where you are testing, does the initial state vector sum to one? I suspect that once you turn on the land use change code, it should crash if that is not the case.

glemieux commented 1 year ago

OK great, thanks. On the site where you are testing, does the initial state vector sum to one? I suspect that once you turn on the land use change code, it should crash if that is not the case.

Thanks for the reminder, I haven't checked that.

glemieux commented 1 year ago

I'm running into some issues with the ELM PIO that I need to look into to understand better:

 ERROR: The size of the user buffer (           0  elements) is smaller than the size of the variable (                 1166  elements)
 pio_support::pio_die:: myrank=          -1 : ERROR: pionfget_mod.F90:         509 : Insufficient user buffer size

As a minor note, I didn't realize that the pio routines are fussy about variable case.

UPDATE: This is likely due to me not running luh mode for the appropriate resolution.

glemieux commented 1 year ago

Running on Summit gets farther along but hits an issue with the dimensionality of the data:

1: 1:  Opened existing file /ccs/home/glemieux/parameter_files/LUH2_historical_1850_2015_4x5_220308_year.nc 47
1: 1: Assertion failed...
1: 1: PIO: FATAL ERROR: Aborting... Extra outermost dimensions must have lengths of 1 (/autofs/nccs-svm1_home1/glemieux/e3sm/externals/scorpio/src/clib/pio_darray_int.c: 1358)
1: 1: Obtained 10 stack frames.

Note that I need to change the time variable to YEAR due to dynSubGrid file import requirements.

ckoven commented 1 year ago

Hi @glemieux, just curious if there is any update or if it would be helpful to discuss anything today, thanks.

glemieux commented 1 year ago

@ckoven I think this has more to do with ELM's expectation of file "structure" at a low level, for lack of a better term. I.e., I think we need to make the luh2 file more closely resemble whatever requirements are being set. I haven't had much time to dig into it this morning, but I'll be able to give it time this afternoon.

ckoven commented 1 year ago

OK, thanks. Do you know if the issue is the metadata (variable names and dimensions, etc.) or something lower level (netCDF version, variable type, etc.)?

glemieux commented 1 year ago

From looking at the code, I think this is metadata (dimensions specifically).

glemieux commented 1 year ago

From looking at the code, I think this is metadata (dimensions specifically).

Looking at the differences between PIO and SCORPIO, this assertion check is specific to SCORPIO. Looking at the PIO code in CTSM, I don't think we'd have an issue with the file metadata as currently laid out: compare the SCORPIO pio assertion in pio_read_darray_nc_serial with the most similar pio assertion in the same procedure in PIO.

The issue is cropping up upon trying to read in the first record variable (primf_to_secdn).

The relevant commit where this check was added in SCORPIO is https://github.com/E3SM-Project/scorpio/commit/d0163d2fea76eeb7e30f57cbd60499a19f07f432, which was part of https://github.com/E3SM-Project/scorpio/pull/113/. I'm going to reach out to the author for guidance.

glemieux commented 1 year ago

The above issue has been resolved. It was due to my renaming time to YEAR; reviewing other ELM land-use timeseries data, I found that what I should have done was simply add YEAR as a copy of time. Erik noted that the YEAR variable requirement is pretty old and likely outdated, but at least it's consistent across ELM and CLM.

I'm now running into the state_vector sum-to-one check warnings and hitting an area issue in EDInitMod during init_patches, so I think we're past the read-in issues and on to data-passing issues.

glemieux commented 1 year ago

I'm now running into the state_vector sum-to-one check warnings and hitting an area issue in EDInitMod during init_patches, so I think we're past the read-in issues and on to data-passing issues.

@ckoven the issue here appears to be due in part to missing calls to the interpolation prior to patch initialization on the fates side, so it was getting junk data for the states and rates (the names were fine, though).

The other issue has to do with the loop in the landuse mod in fates here: https://github.com/ckoven/fates/blob/44696201f9dd9a3add7889f805d39ef82b7205ff/biogeochem/FatesLandUseChangeMod.F90#L241-L249.

The hlm_num_luh2_states value is 12, so it will cause an issue when used to define the boundaries of the index into luh2_fates_luype_map, which is a 5x5 matrix. I have an idea for how to refactor this code with a different mapping scheme that I'm going to try tomorrow, unless you get to it first.
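
One possible shape for that refactor is a name-based many-to-one map from the 12 LUH2 state variables to FATES land-use categories, replacing index arithmetic into the 5x5 matrix. A hedged Python sketch follows; the state names are the standard LUH2 ones, but the category grouping (especially where urban lands) is an illustrative assumption, not the actual FATES mapping.

```python
# Hypothetical many-to-one mapping from the 12 LUH2 states to 5 FATES
# land-use categories. The grouping below is an assumption for
# illustration; in particular, folding urban into cropland is a
# placeholder, not a statement of how FATES handles urban.

LUH2_STATE_TO_FATES_CAT = {
    "primf": "primaryland",   "primn": "primaryland",
    "secdf": "secondaryland", "secdn": "secondaryland",
    "pastr": "pastureland",
    "range": "rangeland",
    "c3ann": "cropland", "c4ann": "cropland", "c3per": "cropland",
    "c4per": "cropland", "c3nfx": "cropland",
    "urban": "cropland",  # placeholder grouping
}

def fates_state_fractions(luh2_state_fracs):
    """Aggregate 12 LUH2 state fractions into FATES category fractions."""
    out = {}
    for state, frac in luh2_state_fracs.items():
        cat = LUH2_STATE_TO_FATES_CAT[state]
        out[cat] = out.get(cat, 0.0) + frac
    return out
```

A lookup like this keeps hlm_num_luh2_states (12) out of the category-index arithmetic entirely, so the state count and category count can differ without overrunning the map.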

ckoven commented 1 year ago

OK, great. Yep, that was just a mistake in my logic, sorry. Feel free to do the mapping a different way, thanks.

glemieux commented 1 year ago

@ckoven, so I've fixed the initialization order of operations on the ELM side to make sure that init_patches has the state data it needs. That said, writing out the luh2 states by gridcell, I'm seeing no states summing to unity (although many are close), as well as a number of gridcells where every state type is NaN. I did a spot check on the netCDF file being read in to make sure the values match.

My assumption here is that our regridding may be messing up the fractions such that they don't sum to one within the new grid cell?

ckoven commented 1 year ago

I think the first thing to do is divide by the regridded land area and see if that fixes the summing-to-one problem. The NaNs are a tougher problem; I wonder what the lats and lons of those gridcells are? Presumably it's land that ELM runs over but that has no LUH2 data because of ice or similar.
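
The land-area renormalization amounts to dividing each raw regridded fraction by the cell's land fraction, so partial-land (e.g. coastal) cells can still sum to one. A minimal sketch with invented numbers, assuming the land fraction is available per cell:

```python
# Renormalize regridded LUH2 state fractions by the cell's land fraction.
# Values are invented; a real cell would carry all 12 LUH2 states.

def normalize_states(state_fracs, landfrac):
    """Divide raw regridded fractions by the cell's land-area fraction."""
    if landfrac <= 0.0:
        raise ValueError("no land in cell; cannot normalize")
    return {name: frac / landfrac for name, frac in state_fracs.items()}

# A coastal cell that is only 80% land: raw fractions sum to 0.8 ...
raw = {"primf": 0.4, "secdf": 0.24, "pastr": 0.16}
normed = normalize_states(raw, landfrac=0.8)
# ... after normalization they sum to 1.0
```

This is equivalent to what a conservative-normed regridding weight would produce directly, which is why renormalizing after the fact and re-regridding with normalized weights should give the same answer.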

ckoven commented 1 year ago

Re the NaN issue: e.g., I'm noticing that LUH2 (unsurprisingly) has no data for Antarctica, whereas ELM also runs at the perimeter of the continent. So I think the strategy should probably be to search for NaNs and, when found, set the primf (or maybe primn) state to one and all other states and all transitions to zero.
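
That fallback can be sketched directly. This is an illustrative Python version under the assumption that states and transitions arrive as per-cell name/value mappings; the choice of primf as the fill state follows the comment above.

```python
# Fallback for cells ELM runs over but LUH2 does not cover (e.g. the
# Antarctic perimeter): if any state is NaN, set primf to one and every
# other state and all transitions to zero. Data layout here is an
# assumption for illustration.

import math

def fill_missing_luh2(states, transitions):
    """Replace a NaN-containing cell with 100% primary forest, no transitions."""
    if any(math.isnan(v) for v in states.values()):
        states = {name: 0.0 for name in states}
        states["primf"] = 1.0
        transitions = {name: 0.0 for name in transitions}
    return states, transitions
```

Filling with a valid all-primary state keeps the sum-to-one invariant intact everywhere, so the downstream check doesn't need a special case for uncovered cells.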

glemieux commented 1 year ago

Re the NaN issue: e.g., I'm noticing that LUH2 (unsurprisingly) has no data for Antarctica, whereas ELM also runs at the perimeter of the continent. So I think the strategy should probably be to search for NaNs and, when found, set the primf (or maybe primn) state to one and all other states and all transitions to zero.

Roger that. Will do

glemieux commented 1 year ago

I think the first thing to do is divide by the regridded land area, and see if that fixes the summing-to-one problem.

So basically, rerun the regridder but use conservative_normed?