Solidifying output standard

annethomas commented 7 years ago

There are several issues already discussing this goal, but hopefully this will serve as a touchpoint for moving forward on establishing a more comprehensive output standard for variable names/units. Linking to issues #1487 #1442 #1415 @mdietze @mccabete @tonygardella

Description

We would like to align input/output variable names more closely throughout PEcAn. The MsTMIP table is the beginning of this but there are a lot of variables not included in this standard, and the relevant variables in BETY aren't always consistent in names and units. The proposal is to have a unified table in code and documentation, probably by renaming the mstmip_var.csv table to something more general, and to add useful variables using CF naming standards. This may involve renaming some BETY variables if possible, or else using their standard name field.

Here is a link to a table of additional variables proposed so far, to review if you'd like to give input on the decision: https://docs.google.com/spreadsheets/d/1ETasC8Nc0zGBjzo-wAY_Tyhd-IqSLKlvq854ErVypic/edit?usp=sharing

Notes

1) Many of the new variables reflect the hierarchy of carbon pools/fluxes being worked on by @mccabete.
2) In the table so far, I've focused on input variables for pool-based models and some variables immediately related (like the litter fluxes) but this is not yet exhaustive. @mdietze Let me know if you'd like Tess and/or me to work on fleshing out the whole hierarchy in this table right away.
3) For litter pools and fluxes, I populated with existing CF names and then divided into the subpools (leaf, root etc) and added corresponding pools/fluxes. I'm not sure how many of them we actually will need to define.
4) Layer-based soil variables: In MsTMIP, TotSoilCarb has no layer-specific version so we want to add that (as an alternative to using CarbPools). Interestingly, SoilMoist only has a layer-specific version (unless SoilWet serves as the total, though it seems like a different measurement).
5) See the Google Sheet comments by each variable for more specific questions or conflicts.

annethomas commented 7 years ago

@mdietze Can you clarify the difference between the goal/outcome of the changes we've been talking about and @dlebauer 's suggestion of populating the standard names field with the table here from 2014-2016: https://docs.google.com/spreadsheets/d/1oEiDasdTslsm0VXFPWAUKS82BYbzXLyv2CYiicXMEys/edit#gid=0 ? Are we just wanting to do a more thorough overhaul of names?

mdietze commented 7 years ago

standard_name is about reconciling BETY names with names from other standards, most often CF. My requests have all been about reconciling BETY names with PEcAn output names, and are primarily focused on how load_data works. I strongly oppose using the standard name field to solve both problems -- right now we've got genuine cases of 3-way conflicts and standard name can only resolve one of these. I've been saying very consistently for over a year that in load_data the precedence is PEcAn output standard > BETY variable name > CF. @bcow and @mccabete created a look-up table to resolve conflicts between PEcAn output & BETY, and as we go we need to resolve PEcAn / BETY conflicts in names & units (and the sooner we do this the better) so we can eventually deprecate the look-up table. variables$standard_name resolves conflicts between BETY & CF.
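The precedence rule above (PEcAn output standard > BETY variable name > CF) can be sketched roughly as an ordered lookup. This is an illustration in Python, not how load_data is actually written: `resolve_variable` and the three tables are hypothetical, and the set membership is made up for the example (in particular, SoilMoist appearing on the BETY side is an invented collision). Only the soilN / mass_fraction_of_nitrogen_in_soil pairing comes from this thread.

```python
# Hypothetical name tables for illustration only; not real PEcAn or BETY contents.
PECAN_OUTPUT_NAMES = {"TotSoilCarb", "SoilMoist"}
BETY_NAMES = {"soilN", "SoilMoist"}  # "SoilMoist" here is an invented collision
# Plays the role of variables$standard_name: CF name -> BETY name.
CF_TO_BETY = {"mass_fraction_of_nitrogen_in_soil": "soilN"}


def resolve_variable(name):
    """Return (source, resolved_name), checking PEcAn first, then BETY, then CF."""
    if name in PECAN_OUTPUT_NAMES:
        return ("pecan", name)
    if name in BETY_NAMES:
        return ("bety", name)
    if name in CF_TO_BETY:
        return ("cf", CF_TO_BETY[name])
    raise KeyError(f"unknown variable name: {name}")


# A name present in both PEcAn and BETY is interpreted as the PEcAn variable.
print(resolve_variable("SoilMoist"))
print(resolve_variable("mass_fraction_of_nitrogen_in_soil"))
```

Because the checks run in a fixed order, a 3-way conflict is always settled in favor of the PEcAn meaning, which a single standard_name field could not express on its own.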

annethomas commented 7 years ago

Ok thanks. @mdietze @dlebauer @mccabete Here's my understanding for the context of this issue, see if it makes sense: So the table I started in the link at the top has a) variables that don't exist at all in either bety or pecan (a lot of them arising from the hierarchical pools framework @mccabete is putting together in #1442), like overall litter_carbon_content, and b) bety variables that we want to add to the pecan standard (currently for the sake of standard IC inputs) but whose names/units we question, like LeafLitter and Microbial Biomass C. And as long as we're adding variables to the pecan output standard we thought we might change the names to CF style. Right? But then there's the broader issue of conflicts between existing output standard names and bety, which I know less about. Also @dlebauer it looks like the BETYdb to CF table and the one I've made (link in description) are mostly complementary; the only overlapping variables are soilN/soilP and SOC. The rest of the ones in my table as it stands are finer carbon pools and fluxes like litter and woody debris.

mdietze commented 7 years ago

@annethomas I think that summary makes sense. Here's some feedback on some specific variables:

[surface/subsurface]_litter_carbon_content vs. [leaf/fine_root/fine_wood]_litter_carbon_content: there's a bit of ambiguity here -- one set of variables is defined by origin and the other by location, and currently the proposed hierarchy doesn't disambiguate these, which would leave most users pretty confused. For me I could see leaf litter and fine woody debris as nested within surface litter in the hierarchy, but they wouldn't necessarily sum to 100% (e.g. there's also reproductive litter). The next question would be whether subsurface and fine root litter are synonyms, or whether there are other subsurface litter components we're missing.
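To make the nesting concrete, here is a rough Python sketch of a location-based parent (surface litter) with origin-based children that need not sum to the parent. The `HIERARCHY` layout, the `unaccounted_carbon` helper, and the pool values are all hypothetical; only the variable names follow the patterns proposed in this thread.

```python
# Hypothetical parent -> children map; one possible nesting, not a settled design.
HIERARCHY = {
    "litter_carbon_content": [
        "surface_litter_carbon_content",
        "subsurface_litter_carbon_content",
    ],
    "surface_litter_carbon_content": [
        "leaf_litter_carbon_content",
        "fine_wood_litter_carbon_content",
    ],
}


def unaccounted_carbon(parent, pools, hierarchy=HIERARCHY):
    """Parent pool minus the sum of whichever child pools are reported."""
    children = hierarchy.get(parent, [])
    return pools[parent] - sum(pools[c] for c in children if c in pools)


# Made-up pool sizes (kg C m-2) chosen only to show a non-zero remainder.
pools = {
    "surface_litter_carbon_content": 1.0,
    "leaf_litter_carbon_content": 0.5,
    "fine_wood_litter_carbon_content": 0.25,
}
remainder = unaccounted_carbon("surface_litter_carbon_content", pools)
print(remainder)  # the part not covered by leaf or fine wood, e.g. reproductive litter
```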

[fast/slow/structural]_soil_pool_carbon_content: Unlike @mccabete, I personally don't find the proposed addition of structural to be a conflict with CF, but rather just an extension. None of these are 'real' but they're pretty standard in CENTURY-style models so they'll be helpful output pool names as long as such models continue to be used. As we diversify the models considered we may need to acknowledge alternative hierarchies for SOC.

soil[N/P] vs soil_[nitrogen/phosphorus]_concentration: As proposed, the latter should be content, not concentration, same as with the soil C variables, and I personally don't see these as equivalent because of the units difference (per kg vs per m2, requires knowing bulk density and depth to move from one to the other). I think we can add the new variables without changing the old ones or conflicting with @dlebauer's proposed mass_fraction_of_nitrogen_in_soil (which is equivalent to soilN)
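The units difference can be shown with a small worked conversion. This Python sketch assumes the concentration is a mass fraction (kg N per kg soil) and that the layer has a uniform bulk density; the function name and the input values are hypothetical.

```python
def concentration_to_content(mass_fraction, bulk_density_kg_m3, depth_m):
    """Convert a soil mass fraction (kg/kg) to an areal content (kg/m2).

    Assumes uniform bulk density over a layer of thickness depth_m.
    """
    if bulk_density_kg_m3 <= 0 or depth_m <= 0:
        raise ValueError("bulk density and depth must be positive")
    return mass_fraction * bulk_density_kg_m3 * depth_m


# Made-up inputs: 0.002 kg N/kg soil, 1300 kg soil/m3, 0.3 m layer.
soil_n_content = concentration_to_content(0.002, 1300.0, 0.3)
print(soil_n_content)  # roughly 0.78 kg N m-2
```

Without bulk density and depth there is no way to get from one quantity to the other, which is why the concentration and content variables are kept separate here.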

annethomas commented 7 years ago

@mdietze Thanks for the feedback.

litter: Is it useful to have specific pools for things like reproductive litter (would models use this) or is there a way to lump things so we could safely say it sums to 100%, like other_litter? I can see how fine root and subsurface litter could be synonymous, although I was wondering how you would distinguish subsurface/root litter from soil/living roots in the field. Especially since e.g. FIA and NEON don't collect that, just defining litter as surface materials. Do models use it though? @mccabete http://data.neonscience.org/api/v0/documents/NEON_litterfall_userGuide_vA

nitrogen/phosphorus: To clarify, we want to add soil_[nitrogen/phosphorus]_content in kg/m2 as new variables and leave soilN/soilP as is with @dlebauer 's proposed standard names?

mdietze commented 7 years ago

litter: I don't think having 'reproductive_litter' and 'other_litter' are mutually exclusive. Some models do have explicit seed production and dispersal, and that sort of data is not uncommon (e.g. this is exactly what Hannah's doing right now in our lab), but it should be noted that reproductive litter wouldn't be just seeds, but would also include flowers, cones, pollen, etc. This bit is implicit in some models (e.g. in ED2 there's a large fraction of undifferentiated reproductive biomass that goes straight to litter without distinguishing type, and then the remainder determines the density of new seedlings that year).

living/dead fine roots: while not always done, many people are capable of making this distinction observationally, especially when working with minirhizotron images. But perhaps our hierarchy of pools should acknowledge that not every dataset will distinguish living fine roots and dead fine root litter? FWIW, NEON was originally going to measure this, but it got descoped due to budget cuts.

N/P: yes, we should add new content (pool size) variables as these are complementary to, rather than redundant with, our current concentration variables.

annethomas commented 7 years ago

@mdietze Questions:

1. Is it still workable to only have 4 dim columns now that pft is a dimension?
2. Should all variables have time as a dimension or only a subset?
3. Should litter have a depth dimension, or only subsurface litter?
4. Should we include the hierarchical parent variable in description?
5. How to move forward with fine_root_litter vs subsurface? Should I only add subsurface for now?
6. Do we want fine_wood_litter_carbon_content in addition to fine_wood_debris...?
7. I'm following CF convention literally and ending up with "soil_carbon_content_of_soil_layer"; is the repeated "soil" ok?
8. Are nitrogen and phosphorus categorized as "Physical Variables" or "Other" or something else?

annethomas commented 7 years ago

Another biggie: mstmip_local.csv in utils/data contains a bunch of what seem to be ED-specific output variables (such as VegT). I've spotted at least one conflict with Bety (LeafC, diff units) but that one didn't actually show up in the ED model2netcdf. Either way it may be risky to just add it to the standard. Do we want to make decisions on a variable-by-variable basis?
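One way to support a variable-by-variable review is to list the names that appear in both tables with different units. The Python sketch below uses plain dicts in place of mstmip_local.csv and the BETY variables table; `find_unit_conflicts` is hypothetical and the unit strings are placeholders, since the actual LeafC units are not given in this thread.

```python
def find_unit_conflicts(table_a, table_b):
    """Return (name, units_a, units_b) for names in both tables whose units differ."""
    shared = sorted(set(table_a) & set(table_b))
    return [(n, table_a[n], table_b[n]) for n in shared if table_a[n] != table_b[n]]


# Placeholder unit strings, not the real units of these variables.
local_units = {"LeafC": "local_units_placeholder", "VegT": "temperature_placeholder"}
bety_units = {"LeafC": "bety_units_placeholder", "SOC": "soc_units_placeholder"}

conflicts = find_unit_conflicts(local_units, bety_units)
for name, a, b in conflicts:
    print(f"{name}: {a} vs {b}")
```

The comparison is on raw strings, so equivalent units written differently would also be flagged; a units-aware comparison would be needed to filter those out.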

annethomas commented 7 years ago

1. It's fine
2. yes
3. no; model-dependent (just include depth, note optional)
4. Add column with parent
5. Leave subsurface, can resolve later if other pools show up (change DALEC)
6. Only have wood_debris and add size dimension (wdsize--both woody debris and fire fuel)
7. Soil_carbon_content in Variable.Name, soil_carbon_content_of_soil_layer in standard_name
8. Nutrient Pools or their own pools
9. Deprecating variables to be renamed

annethomas commented 7 years ago

@mdietze Tcan doesn't seem to be used in any model2netcdf or anywhere in the code (I checked some tables but feel like I need some psql refreshers to dig deeper). I can't find any easily accessible info about ED2's VegT except that it's AVG_VEG_TEMP in the model. Which one should we go with?

mdietze commented 7 years ago

I'm fairly sure that ED2's AVG_VEG_TEMP is the leaf temperature, and thus Tcan and VegT are equivalent.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.

PecanProject / pecan

Solidifying output standard #1496
