fdschneider / bexis_traits

developing a trait data framework for use in the Biodiversity Exploratories

review categories of trait template #10

Closed fdschneider closed 6 years ago

fdschneider commented 7 years ago

Does the current structure of the template make sense? The template is currently split into six (seven) categories:

  1. Core traitdata columns
  2. Columns referring to Measurement or Fact
  3. Columns referring to specimen
  4. Columns referring to sampling event or origin of record
  5. Columns for aggregate measures

In addition, as a particular class of columns for the Exploratories:

  1. Additional columns for use in Biodiversity Exploratories

And, containing potential meta-information for data tables that are combined from multiple datasets (i.e. aggregated data):

  1. Metadata vocabulary

fdschneider commented 7 years ago

After looking into the TraitBank definitions of the EOL project, I suggest organising the template categories by the type of data they refer to. From a database point of view, we have four layers of information:

  1. the core data: the value of a measurement or fact (reported as value and unit), a reference to an occurrence (i.e. a single individual, which in most cases is identified to a scientific taxon) and a measurementType (i.e. a trait definition). This standardised dataset is mirrored by a dataset containing the original entries provided by the user (labelled _original).
  2. further detail on the measurement or fact (linked by measurementID): the method of measurement and sampling, any other sources of bias (e.g. person measuring or identifying, environmental conditions), information about aggregate measures and how they were obtained, and literature references to an original source.
  3. further detail on the occurrence (linked by occurrenceID): the lower-level detail on the individual (e.g. basis of record, sex, morphotype, etc.) and higher-level taxonomic information (kingdom, taxonRank), as well as a georeference of the finding. This could also refer directly to a museum specimen in its respective database. This layer also includes the columns that are specific to the Biodiversity Exploratories.
  4. metadata: the higher-level information that applies to the entire dataset, such as authorship or bibliographic reference.
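
To illustrate the first layer, here is a minimal sketch of how a standardised core row could sit next to the user's original entries. It is written in Python purely for illustration and is not the template definition itself: the input keys (`trait`, `value`, `unit`), the conversion table and the choice of mm as target unit are assumptions, and only the `_original` suffix and the terms measurementType and occurrenceID (spelled as in Darwin Core) come from the description above.

```python
# Hypothetical conversion factors to the unit expected by the trait definition (mm).
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def standardise(record):
    """Return one core-data row holding standardised and original entries."""
    factor = TO_MM[record["unit"]]
    return {
        "occurrenceID": record["occurrenceID"],
        "measurementType": record["trait"],
        # standardised value and unit
        "measurementValue": float(record["value"]) * factor,
        "measurementUnit": "mm",
        # user-provided entries are kept untouched next to the standardised ones
        "measurementValue_original": record["value"],
        "measurementUnit_original": record["unit"],
    }


# Made-up example record as a user might provide it.
row = standardise(
    {"occurrenceID": "o1", "trait": "body_length", "value": "2.5", "unit": "cm"}
)
print(row["measurementValue"], row["measurementUnit"])
```

Keeping the original value as the string the user supplied means that nothing is lost if the standardisation rules change later.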

Furthermore, if no online ontologies are referenced, there can be user-specific tables that contain:

  1. further detail on the taxon (linked by taxonID): full taxonomic information and referenced literature
  2. further detail on the trait type (linked by measurementTypeID): trait name and trait definition, accepted factor levels and range of values, expected units, reference
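
As a rough sketch of how such a user-specific trait table could be used, the following checks a reported value against the accepted factor levels or the range of values of the referenced trait. The trait entries, their numbers and the function name `is_admissible` are made up for illustration; only the idea of looking a trait up by measurementTypeID is taken from the list above.

```python
# Hypothetical trait definitions, keyed by measurementTypeID.
TRAITS = {
    "body_length": {"unit": "mm", "range": (0.0, 500.0), "levels": None},
    "wing_morph": {
        "unit": None,
        "range": None,
        "levels": {"macropterous", "brachypterous"},
    },
}


def is_admissible(measurement_type_id, value):
    """Return True if value is allowed by the referenced trait definition."""
    trait = TRAITS[measurement_type_id]
    if trait["levels"] is not None:
        # categorical trait: value must be one of the accepted factor levels
        return value in trait["levels"]
    # numeric trait: value must lie within the accepted range (inclusive)
    low, high = trait["range"]
    return low <= float(value) <= high


print(is_admissible("body_length", 12.0))
print(is_admissible("wing_morph", "apterous"))
```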

This structure fits into separate data tables, as suggested by the EOL TraitBank standard, and these can be combined into a Darwin Core Archive. For example, the information given in tables 2 and 3 can be referenced from the core data through the fields measurementID and occurrenceID.
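
The linkage between the tables can be sketched as a simple lookup by identifier. This is only an illustration in Python with made-up rows; the table contents and the function name `denormalise` are assumptions, and the occurrence identifier is spelled occurrenceID as in Darwin Core.

```python
# Layer 1: core data, one row per measurement or fact.
core = [
    {
        "measurementID": "m1",
        "occurrenceID": "o1",
        "measurementType": "body_length",
        "measurementValue": 12.0,
        "measurementUnit": "mm",
    },
]
# Layer 2: detail on the measurement, keyed by measurementID.
measurements = {"m1": {"measurementMethod": "calliper"}}
# Layer 3: detail on the occurrence, keyed by occurrenceID.
occurrences = {"o1": {"scientificName": "Carabus auratus", "sex": "female"}}


def denormalise(core, measurements, occurrences):
    """Attach measurement and occurrence detail to each core row (left join)."""
    rows = []
    for row in core:
        merged = dict(row)  # copy, so the core table itself stays unchanged
        merged.update(measurements.get(row["measurementID"], {}))
        merged.update(occurrences.get(row["occurrenceID"], {}))
        rows.append(merged)
    return rows


flat = denormalise(core, measurements, occurrences)
print(flat[0]["scientificName"], flat[0]["measurementMethod"])
```

Core rows without a matching entry in table 2 or 3 are kept as they are, so the detail tables stay optional.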

This structure will be reflected in v0.3 (see d05233a6f02352d523be076077078aba64cde921; 751647a19777469195f058d89c4a2a4b88e09079).

I prepared an Excel-sheet (see 568c81ffa33956840c9dfed41240948d6ea78590) reflecting this structure. It could be used to manually prepare data for upload.

However, a minimal R-script version should provide a transfer of the user-provided data into standardised data, i.e. provide

fdschneider commented 7 years ago

I just updated the trait data standard / template definitions. See: http://fdschneider.de/bexis_traits/traitdatastandard.html

I also worked on the R package today, and it is now quite functional. Check it out at https://github.com/fdschneider/traitdataform. A couple of bugs remain. I will provide a package vignette and some examples in the next couple of days.

nadjasimons commented 6 years ago

The current structure of the trait data standard is very useful and should be easy to understand for the user. I think this can be our final structure.

fdschneider commented 6 years ago

Ok. Thanks. I'll close this for now. If comments from our trait experts reveal flaws in this structure, we'll come back to it later.