Open jam2767 opened 9 years ago
@jam2767 have you reviewed the guidelines for creating CF standard names?
In summary:
The general framework for constructing names (from ess.stackexchange):
[surface] [component] standard_name [at surface] [in medium] [due to process] [assuming condition]
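where
- surface: a named surface, e.g. tropopause or sea_level, giving names such as tropopause_stuff or stuff_at_sea_level
- component: a direction, e.g. upward, northward, x, etc.
- process: e.g. due_to_longwave_heating
- condition: e.g. assuming_clear_sky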
Other conventions: there are a number of further conventions provided to standardize meaning, e.g. in the construction of standard_names when transformations, special phrases, chemical species, or generic names are used. For example,
- transformations: ratio_of_X_to_Y implies a specific form for the units (in this case [X]/[Y]).
- generic names: amount implies kg/m2, area implies m2, etc.
- chemical species: nitrate, carbon_dioxide
- special phrases: water refers to H$_2$O in all phases; wrt = with respect to, toa = top of atmosphere, etc.

@dlebauer I hadn't seen that particular page (thanks!), but had looked at the CF Convention documentation v1.6, section 3.3 that discusses standard names. My understanding of the CF standard is that surface, component, medium, process, and assuming condition are all qualifiers to the base standard name. I think these qualifiers may be useful for some variables, but not necessary for others. I think this logic is already followed in the CF standard names. There's a bit of text from the FAQs on the topic:
"How detailed should a standard name be?
This depends on the application -- there can be standard names for very narrowly defined quantities, and standard names for broad concepts. The appropriate choice depends on which distinctions need to be made to decide whether another quantity is comparable to the one being defined.
Of course, this broad guideline could result in extraordinarily detailed standard names that will rarely be useful to others. Because the goal of standard names is to encourage interoperability, there are several qualifier types that are actively discouraged."
We'll have to be sure not to fall into the trap of creating standard names that won't be useful (or used!) by others. For instance, there may be some variables where a qualifier should be an entirely separate variable. DBH is one example of this, where several other databases have a variable for the DBH measurement and another variable for the height of the measurement. This seems like a better solution than having many very similar variables for the same measurement: tree_stem_diameter_at_1.3_meters, tree_stem_diameter_at_1.4_meters, tree_stem_diameter_at_1.37_meters, or whatever height diameters need to be measured when trees have buttresses, etc.
I also think it will be most useful to leverage the variable names already used in the field when possible so we do not end up with the xkcd standards problem (https://xkcd.com/927/).
Indeed, the height at which DBH is measured could be stored as a covariate (with variable name "height"), as we do with "depth" for roots.
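A minimal sketch of that idea: one shared diameter variable, with the measurement height carried as a covariate instead of being baked into the variable name. The class name `Measurement`, the variable name `stem_diameter`, and all values below are made up for illustration and are not taken from any existing database.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    variable: str                  # one shared variable name (hypothetical)
    value: float
    units: str
    covariates: dict = field(default_factory=dict)  # e.g. {"height": 1.3}, in meters


# Illustrative records only: the same variable measured at different heights.
records = [
    Measurement("stem_diameter", 25.4, "cm", {"height": 1.3}),
    Measurement("stem_diameter", 31.0, "cm", {"height": 1.37}),
    Measurement("stem_diameter", 48.2, "cm", {"height": 1.4}),
]


def variable_names(measurements):
    """Return the set of distinct variable names in use."""
    return {m.variable for m in measurements}


def covariate_values(measurements, key):
    """Return the sorted values of one covariate across measurements."""
    return sorted(m.covariates[key] for m in measurements if key in m.covariates)


print(variable_names(records))
print(covariate_values(records, "height"))
```

With this layout the three measurement heights need only one variable name, and a query can still filter or group on the `height` covariate.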
On Mon, Apr 13, 2015 at 11:59 AM, jam2767 wrote:
> @dlebauer I hadn't seen that particular page (thanks!), but had looked at the CF Convention documentation (http://cfconventions.org/latest.html) v1.6, section 3.3 that discusses standard names. My understanding of the CF standard is that surface, component, medium, process, and assuming condition are all qualifiers to the base standard name. I think these qualifiers may be useful for some variables, but not necessary for others. I think this logic is already followed in the CF standard names. There's a bit of text from the FAQs on the topic:
>
> "How detailed should a standard name be?
> This depends on the application -- there can be standard names for very narrowly defined quantities, and standard names for broad concepts. The appropriate choice depends on which distinctions need to be made to decide whether another quantity is comparable to the one being defined.
> Of course, this broad guideline could result in extraordinarily detailed standard names that will rarely be useful to others. Because the goal of standard names is to encourage interoperability, there are several qualifier types that are actively discouraged."
>
> We'll have to be sure not to fall into the trap of creating standard names that won't be useful (or used!) by others. For instance, there may be some variables where a qualifier should be an entirely separate variable. DBH is one example of this, where several other databases have a variable for the DBH measurement and another variable for the height of the measurement. This seems like a better solution than having many very similar variables for the same measurement: tree_stem_diameter_at_1.3_meters, tree_stem_diameter_at_1.4_meters, tree_stem_diameter_at_1.37_meters, or whatever height diameters need to be measured when trees have buttresses, etc.
>
> I also think it will be most useful to leverage the variable names already used in the field when possible so we do not end up with the xkcd standards problem https://xkcd.com/927/
@mdietze @robkooper @dlebauer I've begun a table of proposed vegetation variable standard names and units, along with the names and units of the same variables within different vegetation databases and networks. This is a first pass at a few key variables, and I will continue to add more variables and other databases so we have a robust list of variables and units. I will also add a column listing which table each variable comes from when the product is a relational database.
veg standard: https://docs.google.com/spreadsheets/d/14frcv_URFKGmPwQyRqKk2hnHCoGCmKtWN05S3IxwlpE/edit?usp=sharing
I've tried to stick closely to the netCDF CF standards philosophy when naming variables. (Principles of CF include self-describing data (no external tables needed for understanding); metadata equally readable by humans and software; minimum redundancy and maximum simplicity; and a development process focusing on existing needs.) I have also chosen the same variable format of lowercase letters and "_" delimiters (e.g. example_variable). I've also tried to avoid coming up with variable names completely different from what other networks are already using, but some variables we are definitely interested in (e.g. plant functional type) are not present in many of the databases I've looked at so far.
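The lowercase-and-underscore format described above could be checked mechanically. The sketch below is one possible approach, not an agreed rule: the pattern and the helper names `is_valid_name` and `normalize_name` are assumptions made for illustration.

```python
import re

# Assumed format: lowercase letters and digits, words separated by single "_".
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_valid_name(name):
    """Return True if the name follows the lowercase_with_underscores format."""
    return NAME_PATTERN.fullmatch(name) is not None


def normalize_name(raw):
    """Turn a free-form label into a lowercase, underscore-delimited name."""
    collapsed = re.sub(r"[^A-Za-z0-9]+", "_", raw.strip())
    return collapsed.strip("_").lower()


print(is_valid_name("example_variable"))
print(is_valid_name("tree_stem_diameter_at_1.3_meters"))
print(normalize_name("Plant Functional Type"))
```

Under this pattern a name such as `tree_stem_diameter_at_1.3_meters` is rejected because of the decimal point, which is one more reason to store the measurement height as a separate covariate.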
Mike brought up the need for a written rationale for the standard we choose. He also noted that there will be much more variability in the names people have given these variables than there is for the met variables, and that Rob likely knows more about how to deal with that than we do.