PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Leaf spectra/trait variable names #665

Open ashiklom opened 8 years ago

ashiklom commented 8 years ago

In an attempt to standardize and merge multiple leaf spectra-trait databases, I've come up with the following list of variables to use. I can modify the variable names at any time, but earlier would be better since I'll have to do less finding and replacing.

# Qualitative identifiers
    "database",             # Database from which measurement originated (e.g. LOPEX, ANGERS, FFT)
    "sample_ID",            # Sample unique identifier (includes database, sample name and year)
    "sample_name",          # Sample name in original database
    "sample_year",          # Year in which sample was collected
    "species_code",         # Species code (USDA) or equivalent
    "species_scientific",   # Scientific (genus species) name
    "species_common",       # Common species name
    "MD",                   # Monocot or dicot
    "plant_type",           # Plant type -- broadleaf, conifer, shrub, grass, etc.
    "succession",           # Successional stage -- early, mid, late
    "PFT",                  # Functional type, defined as plant type x succession
    "site",                 # Site designation -- currently database-specific
    "plot",                 # Plot designation -- currently database-specific
    "canopy_position",      # Relative vertical canopy position (bottom, middle, top)
    "needle_age",           # Needle age (years) (1, 2, ...) (conifer only)
    "needle_oldnew",        # Is age greater than 1 year (old) or not (new)?
# Values
    "N",            # Leaf structure parameter from constrained PROSPECT inversion
    "C_a",          # Chlorophyll a concentration (ug cm-2)
    "C_b",          # Chlorophyll b concentration (ug cm-2)
    "C_ab",         # Total Chlorophyll concentration (ug cm-2)
    "C_car",        # Total Carotenoid concentration (ug cm-2)
    "C_anth",       # Total anthocyanin concentration (ug cm-2)
    "EWT",          # Equivalent water thickness (cm)
    "LMA",          # Leaf (wet) mass per unit area (g m-2)
    "LDMC",         # Leaf dry mass per unit area (g m-2)
    "LDMC_gcm",     # Leaf dry mass per unit area (g cm-2) (PROSPECT's 'Cm')
    "C_C",          # Carbon content (% dry weight)
    "C_O",          # Oxygen content (% dry weight)
    "C_H",          # Hydrogen content (% dry weight)
    "C_N",          # Nitrogen content (% dry weight)
    "CN_ratio",     # Carbon-Nitrogen ratio (C %DW / N %DW)
    "C_prot",       # Protein content (% dry weight)
    "C_cell",       # Cellulose content (% dry weight)
    "C_lign",       # Lignin content (% dry weight)
    "C_star",       # Starch content (% dry weight)
    "C_fib",        # Fiber content (% dry weight)
    "dN15"          # N15 isotope ratio
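Several of the fields in the list above are derived from other fields (`sample_ID`, `PFT`, `needle_oldnew`, `CN_ratio`, `LDMC_gcm`). A minimal sketch of those derivations follows, in Python purely for illustration. The function names, the underscore separator, and the ordering of the parts in `sample_ID` and `PFT` are assumptions, not something the list specifies.

```python
from typing import Optional


def make_sample_id(database: str, sample_name: str, sample_year: int) -> str:
    """Combine database, sample name and year (separator is an assumption)."""
    return f"{database}_{sample_name}_{sample_year}"


def make_pft(plant_type: str, succession: str) -> str:
    """PFT as plant type x succession (string format is an assumption)."""
    return f"{succession}_{plant_type}"


def needle_oldnew(needle_age: Optional[int]) -> Optional[str]:
    """'old' if age is greater than 1 year, 'new' otherwise; None if not a conifer."""
    if needle_age is None:
        return None
    return "old" if needle_age > 1 else "new"


def cn_ratio(c_pct_dw: float, n_pct_dw: float) -> float:
    """Carbon-nitrogen ratio from the two % dry weight values."""
    return c_pct_dw / n_pct_dw


def g_m2_to_g_cm2(value: float) -> float:
    """Convert g m-2 to g cm-2 (1 m2 = 10,000 cm2), as needed for LDMC_gcm."""
    return value / 10_000.0
```

Keeping the derived columns as functions of the raw ones means a later renaming only has to touch one place.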
mdietze commented 8 years ago

What fraction of these are the names currently used in the BETY variables table? To the extent possible we should try and keep the 'standards' in the different parts of the project consistent.

ashiklom commented 8 years ago

Uh, to my knowledge, none of them, because I didn't check the BETY variables table before making this. I'm going to go ahead and do that...

dlebauer commented 8 years ago

Following from @mdietze's comment

  1. I will probably end up adding a cross-dictionary thesaurus to BETYdb at some point
  2. For many of these not in BETYdb that aren't currently used in practice, there would be little effect of changing the names in BETYdb
  3. One related source of variables (doesn't completely cover your list, but we could contribute additions) is the ICASA standard, which USDA and AgMIP are pushing; see http://tinyurl.com/ICASA-MVL and http://research.agmip.org/display/dev/ICASA+Master+Variable+List, and terraref/reference-data#18
ashiklom commented 8 years ago

OK, @mdietze @dlebauer @serbinsh , I put together the following table comparing what I came up with and BETY's variables. What are your thoughts? Should I align everything that I can with BETY and then try to match the rest using similar names? Or are some of these deprecated, etc. and could use an overwrite/update?

| Description | My variable | My unit | BETY variable | BETY unit |
|---|---|---|---|---|
| PROSPECT leaf structure | N | unitless | -- | -- |
| Chlorophyll a content | C_a | ug cm-2 | -- | -- |
| Chlorophyll b content | C_b | ug cm-2 | -- | -- |
| Total chlorophyll concentration | C_ab | ug cm-2 | Total_chlorophyll | mg cm-2 |
| Total carotenoid concentration | C_car | ug cm-2 | Caro | mg cm-2 |
| Total anthocyanin concentration | C_anth | ug cm-2 | -- | -- |
| Equivalent water thickness | EWT | g cm-2 (or cm) | -- | -- |
| Mass per unit area | LMA | g m-2 | LMA | kg m-2 |
| Dry mass per unit area | LDMC | g m-2 | DM_green leaf | g m-2 |
| Carbon fraction (per dry mass) | C_C | g g-1 | leafC-mass-fraction | g g-1 |
| Oxygen fraction (per dry mass) | C_O | g g-1 | -- | -- |
| Hydrogen fraction (per dry mass) | C_H | g g-1 | -- | -- |
| Nitrogen fraction (per dry mass) | C_N | g g-1 | leafN | -- |
| Carbon-nitrogen ratio (based on mass fractions) | CN_ratio | unitless | -- | -- |
| Protein fraction (per dry mass) | C_prot | g g-1 | l_protein?? | % |
| Cellulose fraction (per dry mass) | C_cell | g g-1 | CelluloseL and l_cellulose | % (both) |
| Lignin fraction (per dry mass) | C_lign | g g-1 | l_lignin?? | % |
| Starch fraction (per dry mass) | C_star | g g-1 | leaf_starch and l_starch | % (both) |
| Fiber fraction (per dry mass) | C_fib | g g-1 | percent leaf fiber | % |
| Nitrogen-15 isotope ratio | dN15 | unitless | -- | -- |
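Aligning with BETY means converting units as well as names, since several rows above differ only by a scale factor. A minimal sketch of that conversion, in Python for illustration, could look like the following. The unit strings are copied from the table as listed in this thread (not checked against BETY itself), and the `CONVERSION` table and `to_bety_units` function are hypothetical names.

```python
# Multiplicative factors from the "My unit" column to the "BETY unit" column.
# Unit strings are taken from the comparison table; nothing here queries BETY.
CONVERSION = {
    ("ug cm-2", "mg cm-2"): 1e-3,   # chlorophyll, carotenoids
    ("g m-2", "kg m-2"): 1e-3,      # LMA
    ("g m-2", "g m-2"): 1.0,        # dry mass per unit area
    ("g g-1", "g g-1"): 1.0,        # carbon mass fraction
    ("g g-1", "%"): 100.0,          # protein, cellulose, lignin, starch, fiber
}


def to_bety_units(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a value between two units listed in CONVERSION."""
    try:
        factor = CONVERSION[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(f"no conversion from {from_unit!r} to {to_unit!r}")
    return value * factor
```

Raising on an unknown pair, rather than passing the value through, makes a unit mismatch visible at import time instead of in the analysis.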
mdietze commented 8 years ago

Hmm. Not to create meaningless work, but it would be interesting to see what the CF standard is for some of these (as well as the other standard David mentioned). In general I'm in favor of aligning with BETY and then generating similar names for missing variables; however, this points out that the naming in BETY has not been consistent. Thus I think we should discuss updating BETY names as well. Unless it conflicts with CF, I kinda like the variables that are leaf_[thing]. I definitely don't like the proposed C_[thing] because for a number of these variables we have the same [thing] in stem wood, roots, etc. We should definitely be consistent on separators -- the spaces will cause lots of problems in R, and I like _ over -. 'DM_green leaf' seems like a particularly bad name.
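The separator point can be sketched as a small normalizer that turns spaces and hyphens into underscores. This is Python for illustration only; `normalize_name` and `is_plain_identifier` are hypothetical helpers, and the identifier check is a simplification (R also allows dots in syntactic names).

```python
import re


def normalize_name(name: str) -> str:
    """Replace each run of whitespace or hyphens with a single underscore."""
    return re.sub(r"[\s\-]+", "_", name.strip())


def is_plain_identifier(name: str) -> bool:
    """True if the name uses only letters, digits and underscores,
    and starts with a letter (a simplified check, see lead-in)."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None
```

Applied to names from the comparison table, `normalize_name("DM_green leaf")` gives `DM_green_leaf` and `normalize_name("leafC-mass-fraction")` gives `leafC_mass_fraction`.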

ashiklom commented 8 years ago

I don't think it's meaningless work. I'd rather get it right the first time than have to go back later and change half the names or create unnecessary confusion. Given that all of these are going into a larger database of stuff, I totally agree that my C_ names are a bad idea -- they were just a first pass based on the names in the spectral databases.

I'll check out the links Dave posted, but where can I find the CF standard?

mdietze commented 8 years ago

http://cfconventions.org/

ashiklom commented 8 years ago

So unless I'm missing something, neither the CF page nor the ICASA standards have anything useful at all about leaves:

A search of the CF variables table for "leaf" returns only "Leaf area index" and "leaf carbon content", and relevant searches for key terms in the other parameters similarly turn up little if anything at all. Same is true for ICASA, which has nothing on leaves and seems to have mostly soil, atmosphere, and aggregate vegetation stuff (e.g. biomass, harvest).

Unless the idea is that we try to use a similar format to theirs in creating our new variables?

ashiklom commented 8 years ago

How easy, and how advisable, is it to just change records and delete variables in BETY? I poked around a bit more and came up with this table (Google Doc). I've highlighted my proposed changes in yellow, and proposed additions in green. It turns out that for most of these things, there are actually relatively few records in BETY, so I could easily change them all by hand in less than an hour of work. The ones with a lot of records (e.g. LMA, leafN), I can just leave as-is.

@mdietze @dlebauer @robkooper @serbinsh Thoughts?

mdietze commented 8 years ago

Easy but not always advisable. I know some of these changes will break the growth respiration code, but it's nothing a bit of grep can't fix. You'll also need to check the meta-analysis code. In addition, there's definitely a leaf C:N in there already that you missed, so don't add a new one. Finally, are the _area and _masspct endings the CF standard naming approach? If not, you'll want to use the CF standard for name generation.

Assuming these small issues are sorted out I give a thumbs up. You'll want that seconded by David before you make any changes.
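The "bit of grep" step could be sketched as a scan that reports every line still using an old variable name. The sketch is Python for illustration; `find_old_names` is a hypothetical helper, and the rename pair in the usage note is made up, since the actual proposed renames live in the linked table.

```python
import re
from typing import Dict, List, Tuple


def find_old_names(source: str, renames: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (line number, old name, new name) for each whole-token match.

    Lookarounds are used instead of \\b so that old names containing
    spaces or hyphens are still matched as complete tokens.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in renames.items():
            pattern = r"(?<![A-Za-z0-9_])" + re.escape(old) + r"(?![A-Za-z0-9_])"
            if re.search(pattern, line):
                hits.append((lineno, old, new))
    return hits
```

For example, with the hypothetical mapping `{"l_starch": "leaf_starch"}`, a line containing `traits[["l_starch"]]` is reported, while a line containing `all_starch` is not.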

serbinsh commented 8 years ago

"I kinda like the variables that are leaf_[thing]. I definitely don't like the proposed C_[thing] because for a number of these variables we have the same [thing] in stem wood, roots, etc. "

I fully agree

serbinsh commented 8 years ago

@mdietze which of these will "break" the growth resp code? Doesn't seem like a lot of these are in BETY....is it the leaf chem vars, like starch, lig, cell?

A minor concern of mine is fiber, lignin, and cellulose, as there isn't a standard def, plus there is hemi and non-hemi cellulose, etc. Plus the values will vary depending on the use of acid digestion or HPLC. Maybe we can still pool them, but how to discriminate? Also, depending on what your standard is for the acid digestion approach (e.g. an aspen leaf standard), you may not get the same result, and some approaches will extract more soluble and insoluble fractions than others.

Something to keep in mind.....so maybe the approach matters? As a separate variable? A covariate perhaps? @dlebauer @ashiklom

dlebauer commented 8 years ago

Also worth looking at this collection of CF and CF-style names that I developed last year but never added to BETYdb: https://goo.gl/uoQdHi

On Wed, Oct 28, 2015 at 4:05 PM, Shawn P. Serbin notifications@github.com wrote:

@mdietze which of these will "break" the growth resp code? Doesn't seem like a lot of these are in BETY....is it the leaf chem vars, like starch, lig, cell?

— Reply to this email directly or view it on GitHub https://github.com/PecanProject/pecan/issues/665#issuecomment-151989938.

serbinsh commented 8 years ago

Hmmm @dlebauer so in that case you were using names like

CA_foliar mass_fraction_of_calcium_in_leaf

mdietze commented 8 years ago

For growth resp, yes it's those variables. You don't see them because that code hasn't been incorporated into the mainline yet:

For 'trait' data there's definitely the capacity to add methods. It's a bit more tricky for data in other files. I think we'll have to rely on meta-data unless we want an explosion of names
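One way to read the metadata option is to keep a single variable name and record the measurement method as a separate field, so records can be pooled or split as needed. A minimal sketch follows, in Python for illustration. The `TraitRecord` class, its field names, and the method labels are all hypothetical and do not reflect the BETY schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TraitRecord:
    variable: str   # one shared name, e.g. "leaf_lignin" (illustrative)
    value: float
    units: str
    method: str     # e.g. "acid_digestion" or "HPLC" (illustrative labels)


def group_by_method(records: List[TraitRecord]) -> Dict[Tuple[str, str], List[float]]:
    """Collect values per (variable, method) pair, preserving input order."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for record in records:
        groups[(record.variable, record.method)].append(record.value)
    return dict(groups)
```

With this layout the variable list stays short, and method only matters to analyses that choose to condition on it.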

ashiklom commented 8 years ago

So what's the balance (if any) of length to descriptiveness? Do we actually want fully CF-compliant names that are really long? E.g. leaf_chlorophyll_a_mass_per_unit_area and mass_fraction_leaf_protein?

mdietze commented 8 years ago

David, there's no argument about the fact that your names are technically more correct by CF standard, but variables like mass_fraction_of_carbon_in_leaves are a mouthful. I think things like leaf_C or leaf_carbon are more practical, especially since the units are explicitly defined.
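The trade-off between short and CF-style names can be softened by storing both and resolving either one to the same record. A minimal sketch follows, in Python for illustration. The name pairs come from this thread, while the `VariableName` type, the `resolve` function, and the `g g-1` units are assumptions.

```python
from typing import Dict, NamedTuple, Optional


class VariableName(NamedTuple):
    short: str      # practical name used in code
    cf_style: str   # long CF-style name
    units: str      # assumed units; mass fractions taken as g g-1


VARIABLES = [
    VariableName("leaf_C", "mass_fraction_of_carbon_in_leaves", "g g-1"),
    VariableName("CA_foliar", "mass_fraction_of_calcium_in_leaf", "g g-1"),
]

# Index every record under both of its names.
_INDEX: Dict[str, VariableName] = {}
for _var in VARIABLES:
    _INDEX[_var.short] = _var
    _INDEX[_var.cf_style] = _var


def resolve(name: str) -> Optional[VariableName]:
    """Look up a variable by short or CF-style name; None if unknown."""
    return _INDEX.get(name)
```

Code can then use `leaf_C` day to day while exports use the `cf_style` field of the same record.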

dlebauer commented 8 years ago

Note that the variables table has room for both CF style and short names.

dlebauer commented 8 years ago

And also that I am planning to add a thesaurus table.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.