fdschneider / bexis_traits

developing a trait data framework for use in the Biodiversity Exploratories

Procedure for creating the trait list #18

Open nadjasimons opened 7 years ago

nadjasimons commented 7 years ago

Based on our discussions in the other issues, I suggest the following for the trait list:

  1. I combine all traits that I found in the different sources into one list and try to harmonize the trait names.
  2. I will put those traits into the BExIS_TraitList.csv template form where each trait has an Identifier, a name and is assigned to one trait category.
  3. The trait's Identifier will be linked to a trait ontology that has more than two levels (compare with T-SITA).

I am thinking of making the Identifier somewhat human readable. For example, all morphological traits would start with 1 and all measurements of body_size would start with 11, followed by body_length as 111 and body_width as 112, etc., and body_length_abdomen (a child of body_length) as 1111. I am wondering, though, whether this system will be flexible enough and whether there are enough numbers to do this. We would probably have to assign more than one digit to the lowest level, but could then still only have nine groups on each higher level. What are your thoughts on that?
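To make the capacity question concrete, here is a minimal Python sketch of such a prefix scheme. The function name `child_id` and the `digits` parameter are illustrative assumptions, not part of any existing BExIS tool; the trait names are the ones from the example above.

```python
def child_id(parent_id: str, position: int, digits: int = 1) -> str:
    """Return the ID of the position-th child (1-based) below parent_id.

    Assumption: every level uses a fixed number of digits and 0 is never
    used as a child number, so one digit allows 9 children, two allow 99.
    """
    capacity = 10 ** digits - 1
    if not 1 <= position <= capacity:
        raise ValueError(f"only {capacity} children fit below {parent_id!r}")
    # Zero-pad the child number to the chosen width and append it to the parent.
    return f"{parent_id}{position:0{digits}d}"


morphology = "1"
body_size = child_id(morphology, 1)    # "11"
body_length = child_id(body_size, 1)   # "111"
body_width = child_id(body_size, 2)    # "112"

# A tenth group below "1" does not fit with one digit per level.
try:
    child_id(morphology, 10)
    tenth_group_fits = True
except ValueError:
    tenth_group_fits = False

# Mixing widths makes IDs ambiguous: this is the 12th child of "11",
# but it reads like the 2nd child of "111".
ambiguous = child_id(body_size, 12, digits=2)  # "1112"
```

The last line hints at a second problem besides capacity: if only the lowest level gets more digits, an ID can no longer be decoded without knowing at which level each trait sits.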

fdschneider commented 7 years ago

It would be great to have a logic for the numbering, but I guess any scheme is too constrained and will break at some point in the future as further traits are added to the list. Also, the categories or classifications sometimes depend very much on the research question and might confuse researchers from a very different background.

If we manage to upload our trait list to a public website, we could provide globally valid URIs as traitID, following the scheme "https://www.bexis.uni-jena.de/arthropodtraits.html#body_size", instead of numeric IDs which are only locally valid. The T-SITA traits could then receive their original URI as traitID, e.g. 'http://t-sita.cesab.org/BETSI_vizInfo.jsp?trait=Wing_surface'.
Or we could keep our own identifier and have another column 'Refines' or 'Inherits_from' or 'Equals' linking the term to its original definition in an existing ontology.

Such a URI-based traitID scheme is kind of human readable, but too complicated to be entered manually. The users would then rely on the measurementType field, which contains something like 'body_size'.
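As a rough illustration of how users could keep typing short names while the URI is generated for them, here is a small Python sketch. The base address is the one proposed above and may not resolve to a real page; the helper names `trait_uri` and `trait_name_from_uri` are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit

# Base address as proposed in this thread; an assumption, not a live resource.
BASE = "https://www.bexis.uni-jena.de/arthropodtraits.html"


def trait_uri(trait_name: str, base: str = BASE) -> str:
    """Build a traitID by appending the trait name as a URI fragment."""
    return f"{base}#{quote(trait_name, safe='')}"


def trait_name_from_uri(uri: str) -> str:
    """Recover the short name (e.g. for measurementType) from a traitID."""
    return unquote(urlsplit(uri).fragment)


uri = trait_uri("body_size")
name = trait_name_from_uri(uri)
```

With this kind of mapping the traitID never has to be entered by hand: the user supplies 'body_size' and the URI is derived from it.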

For the hierarchies a more flexible scheme could be to add columns which link each trait to the next broader term (parent term) or narrower terms (child terms). That is what T-SITA already provides and we could just adopt this. Then, there is no limit to the hierarchical depth of terms.
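A minimal Python sketch of the parent-link idea, assuming each trait stores only the name of its broader term. The trait names and the `lineage` helper are made up for illustration and are not taken from T-SITA.

```python
from typing import Dict, List, Optional

# Hypothetical trait list: each trait points to its broader (parent) term.
broader_term: Dict[str, Optional[str]] = {
    "morphology": None,
    "body_size": "morphology",
    "body_length": "body_size",
    "body_width": "body_size",
    "body_length_abdomen": "body_length",
}


def lineage(trait: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow broader-term links from a trait up to the top-level term."""
    path = [trait]
    while parents.get(path[-1]) is not None:
        parent = parents[path[-1]]
        if parent in path:
            # Guard against accidental loops in a hand-edited list.
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
    return path


abdomen_path = lineage("body_length_abdomen", broader_term)
```

Because the depth is simply the length of the path, adding another level later does not require renaming or renumbering any existing trait.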

fdschneider commented 7 years ago

Concerning the trait list we provide, in the whitepaper I would put it this way:

In addition to the terms for trait datasets, we propose a vocabulary of terms for trait lists / lookup tables of trait definitions / a trait thesaurus (http://fdschneider.de/bexis_traits/traitdatastandard.html#terms-for-traitlists-a-trait-thesaurus). If trait data providers publish trait definitions along with their data using these terms, or refer to existing ontologies wherever possible, this will help build a decentralised semantic network of trait definitions. Each trait measurement can then be related to broader or narrower trait definitions (some further explanations and examples could go here). As a case example and starting point, we provide a trait list for arthropod traits that have been used within the projects of the Biodiversity Exploratories. This list defines about 150 functional traits and provides a unique identifier for each.

That said, we are free to publish an 'incomplete' list without a defined hierarchy, while providing a semantic framework for building such hierarchies. (see #1)

The field Refines would then link to T-SITA. The fields narrowerTerm and broaderTerm could link to other trait definitions in our list or in other ontologies that are more or less general.

nadjasimons commented 6 years ago

Regarding the TraitID: I think it would be a good idea to have a globally valid URI which links back to our trait thesaurus. If I understood the answer from GFBio correctly ( #19 ), those URIs would link to their Terminology Server. In addition to this TraitID, I would include a further field with the source URI. This can then link to the T-SITA URI or any other thesaurus. Having this additional information would make it possible to add traits to our thesaurus over time.

Regarding the hierarchy: A flexible hierarchy is probably best. The field broaderTerm would link any trait to the next higher level and allow navigation up the hierarchy. However, I am not sure how this would work with a narrowerTerm field, as it could potentially have to hold several terms.
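One possible way around the multiple-entries problem, sketched in Python under the assumption that every trait has at most one broaderTerm: the narrower terms do not need to be stored at all, since they can be derived by inverting the broaderTerm links. The trait names and the `narrower_terms` helper are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical list with a single broaderTerm per trait.
broader_term: Dict[str, Optional[str]] = {
    "body_size": None,
    "body_length": "body_size",
    "body_width": "body_size",
    "body_length_abdomen": "body_length",
}


def narrower_terms(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert child -> parent links into parent -> sorted list of children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for trait, parent in parents.items():
        if parent is not None:
            children[parent].append(trait)
    return {parent: sorted(kids) for parent, kids in children.items()}


narrower = narrower_terms(broader_term)
```

Here `narrower["body_size"]` holds both body_length and body_width, although each of them only declares a single broader term.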

Regarding the completeness of the trait list: I also think that we can start with the arthropod trait list for the whitepaper. The list 'traitlist_arthropods.csv' currently includes T-SITA traits and the additional traits from our survey. I think it should be feasible to complete this list for publication with the whitepaper, even if there is no automatic way to update it.

fdschneider commented 6 years ago

I should probably check back with GFBio Terminology Server what would be the best in terms of semantic web standards. @aostrow: maybe you can comment on this?

I think we can avoid duplicating the definitions of T-SITA traits. We should not just copy their definitions in our ontology, since they did not publish it as commons. Their ontology is accessible online and we should link to it directly.

That is, for our list, the traits that are available in T-SITA should contain just three fields: traitName (which can follow our own naming scheme), traitID (containing the T-SITA URI) and traitUnit (since this is important for the R-package to function and I can't extract it from T-SITA directly). Our reference then directly tunnels anyone looking up the trait to T-SITA. The definition and the broader and narrower terms must then be extracted from there. If in any case a trait needs to be defined slightly differently than in T-SITA, we should refer to the T-SITA URI as a related term.

For all additional traits used within BExIS and not occurring in T-SITA, we must provide our own definition, a URI (for now linking to a GitHub resource, similar to our current data standard; we should create an extra repo for that, and we'll aim to move this to GFBio before publication), a traitUnit, and the related, broader and narrower terms (narrower and related terms may contain multiple entries, separated by semicolons).
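To show how such a list could be read, here is a minimal Python sketch that splits the semicolon-separated fields into lists. The column name `relatedTerm`, the `example.org` URIs and the units are placeholders chosen for illustration; only traitName, traitID, traitUnit, broaderTerm and narrowerTerm are named in this thread.

```python
import csv
import io

# Placeholder trait list; URIs and units are invented for this example.
SAMPLE = """traitName,traitID,traitUnit,broaderTerm,narrowerTerm,relatedTerm
body_size,https://example.org/traits#body_size,mm,,body_length;body_width,
body_length,https://example.org/traits#body_length,mm,body_size,,
"""

# Fields that may hold several entries separated by semicolons.
MULTI_VALUE_FIELDS = ("narrowerTerm", "relatedTerm")


def read_trait_list(handle):
    """Read a trait list and turn multi-value fields into lists of terms."""
    rows = []
    for row in csv.DictReader(handle):
        for field in MULTI_VALUE_FIELDS:
            raw = row.get(field) or ""
            row[field] = [term.strip() for term in raw.split(";") if term.strip()]
        rows.append(row)
    return rows


traits = read_trait_list(io.StringIO(SAMPLE))
```

An empty cell becomes an empty list, so downstream code can treat traits with zero, one or many narrower terms in the same way.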