Vegetation data standard

PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

www.pecanproject.org

Vegetation data standard #413

Open jam2767 opened 9 years ago

jam2767 commented 9 years ago

@mdietze @robkooper @dlebauer I've begun a table of proposed vegetation variable standard names and units, along with the names and units of the same variables within different vegetation databases and networks. This is a first pass at a few key variables, and I will continue to add more variables and other databases so we have a robust list of variables and units. I will also add a column listing which table each variable comes from when the product is a relational database.

veg standard: https://docs.google.com/spreadsheets/d/14frcv_URFKGmPwQyRqKk2hnHCoGCmKtWN05S3IxwlpE/edit?usp=sharing

I've tried to stick closely to the philosophy of the netCDF CF standards when naming variables. The principles of CF include self-describing data (no external tables needed for understanding); metadata equally readable by humans and software; minimum redundancy and maximum simplicity; and a development process focusing on existing needs. I have also chosen the same variable format: lowercase letters with "_" delimiters (e.g. example_variable). I've also tried to avoid coming up with variable names completely different from what other networks are already using, but some variables we are definitely interested in (e.g. plant functional type) are not present in many of the databases I've looked at so far.
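As a rough illustration of the naming format described above, here is a minimal sketch of a name check. The function name and the exact pattern are assumptions for illustration only (in particular, allowing digits after the first letter is a guess, not something settled in this thread).

```python
import re

# Assumed pattern: lowercase words joined by single underscores.
# Digits are allowed after the first character (an assumption).
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` follows the lowercase, underscore-delimited format."""
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_variable_name("example_variable")` passes, while names with capital letters, leading underscores, or doubled underscores do not.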

Mike brought up the need for a written logic to the standard we choose and the fact that there is going to be a lot of variability in the names people have given these variables, much more so than the met variables, and that Rob likely knows more about the ways to deal with that than we do.

dlebauer commented 9 years ago

@jam2767 have you reviewed the guidelines for creating CF standard names?

In summary:

The general framework for constructing names (from ess.stackexchange):

[surface] [component] standard_name [at surface] [in medium] [due to process] [assuming condition]

where

surface is a horizontal plane, e.g. tropopause, or sea_level.
- Single-word surface names go at the beginning; surface names with more than one word go after the standard_name
- tropopause_stuff
- stuff_at_sea_level
component is a vector, e.g. upward, northward, x, etc.
medium is, e.g. in_water, in_soil, etc.
process starts with "due_to", e.g. due_to_longwave_heating
assuming condition is a conditional assumption, e.g. assuming_clear_sky
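The framework above can be sketched as a small helper that assembles a name from its qualifiers. This is only an illustration under stated assumptions: `build_standard_name` is a hypothetical function, not part of CF or PEcAn, and it takes bare tokens (e.g. `soil`, `longwave_heating`) and adds the `at_`, `in_`, `due_to_`, and `assuming_` prefixes itself.

```python
def build_standard_name(base, surface=None, component=None,
                        medium=None, process=None, condition=None):
    """Assemble a CF-style name from a base name and optional qualifiers.

    Hypothetical helper: qualifiers are passed without their prefixes.
    """
    prefix, suffix = [], []
    if surface:
        if "_" in surface:
            # Multi-word surfaces follow the base name, e.g. stuff_at_sea_level.
            suffix.append("at_" + surface)
        else:
            # Single-word surfaces lead the name, e.g. tropopause_stuff.
            prefix.append(surface)
    if component:
        prefix.append(component)
    if medium:
        suffix.append("in_" + medium)
    if process:
        suffix.append("due_to_" + process)
    if condition:
        suffix.append("assuming_" + condition)
    return "_".join(prefix + [base] + suffix)
```

With this sketch, `build_standard_name("stuff", surface="tropopause")` gives `tropopause_stuff` and `build_standard_name("stuff", surface="sea_level")` gives `stuff_at_sea_level`, matching the two examples in the list.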

Other conventions: there are a number of further conventions provided to standardize meaning, e.g. in the construction of standard_names when transformations, special phrases, chemical species, or generic names are used. For example,

transformations imply a particular form of units, e.g. ratio_of_X_to_Y implies units of the form [X]/[Y].
generic names imply specific units, e.g. amount implies kg/m2, area implies m2, etc.
there are standard spellings for chemical species, e.g. nitrate, carbon_dioxide
phrases: water refers to H2O in all phases; wrt = with respect to, toa = top of atmosphere, etc.

jam2767 commented 9 years ago

@dlebauer I hadn't seen that particular page (thanks!), but had looked at the CF Convention documentation v1.6, section 3.3 that discusses standard names. My understanding of the CF standard is that surface, component, medium, process, and assuming condition are all qualifiers to the base standard name. I think these qualifiers may be useful for some variables, but not necessary for others. I think this logic is already followed in the CF standard names. There's a bit of text from the FAQs on the topic:

"How detailed should a standard name be?

This depends on the application -- there can be standard names for very narrowly defined quantities, and standard names for broad concepts. The appropriate choice depends on which distinctions need to be made to decide whether another quantity is comparable to the one being defined.

Of course, this broad guideline could result in extraordinarily detailed standard names that will rarely be useful to others. Because the goal of standard names is to encourage interoperability, there are several qualifier types that are actively discouraged."

We'll have to be sure not to fall into the trap of creating standard names that won't be useful (or used!) by others. For instance, there may be some variables where a qualifier should be an entirely separate variable. DBH is one example of this: several other databases have one variable for the DBH measurement and another variable for the height of the measurement. This seems like a better solution than having many very similar variables for the same measurement: tree_stem_diameter_at_1.3_meters, tree_stem_diameter_at_1.4_meters, tree_stem_diameter_at_1.37_meters, or a name for whatever height the diameter has to be measured at when trees have buttresses, etc.
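A minimal sketch of that idea, with the measurement height stored as its own field next to a single diameter variable. The class name, field names, units (cm and m), and the values below are all made up for illustration and are not taken from any of the databases discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StemDiameterRecord:
    tree_id: str                  # hypothetical identifier
    stem_diameter_cm: float       # one diameter variable for every record
    measurement_height_m: float   # height of the measurement, kept separately


# Illustrative values only, not real data.
records = [
    StemDiameterRecord("t1", 32.5, 1.3),
    StemDiameterRecord("t2", 41.0, 1.37),
    StemDiameterRecord("t3", 58.2, 2.0),  # e.g. measured above a buttress
]


def at_height(records, height_m, tolerance_m=0.01):
    """Select records measured at (approximately) the given height."""
    return [r for r in records
            if abs(r.measurement_height_m - height_m) <= tolerance_m]
```

One variable name then covers every measurement height, and records taken at a particular height can still be picked out with `at_height(records, 1.37)`.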

I also think it will be most useful to leverage the variable names already used in the field when possible, so we do not end up with the xkcd standards problem (https://xkcd.com/927/).

dlebauer commented 9 years ago

Indeed, the height at which DBH is measured could be stored as a covariate (with variable name "height", as we do with "depth" for roots).


github-actions[bot] commented 3 years ago

This issue is stale because it has been open 365 days with no activity.