fdschneider / bexis_traits

developing a trait data framework for use in the Biodiversity Exploratories

GFBio involvement #19

Closed fdschneider closed 7 years ago

fdschneider commented 7 years ago

How should we proceed concerning the involvement of GFBio and the terminology server team? I believe the trait template could greatly benefit from being hosted on a proper server with API support. The end product would then be a 'trait-data standard' rather than a simple template.

Below I repost the e-mail draft, including your recent comments, that we would send to the GFBio team:

Dear GFBio team,

In an end-user workshop webinar with Ivo Kostadinov I was encouraged to write you about an initiative that we are currently developing within the Biodiversity Exploratories project and that could contribute to the GFBio terminology framework. We formed a special interest group about trait-data with the aim to develop a template or general data standard for functional traits uploaded to BExIS.

An important aspect of this standardisation is to link measurements to standardised traits as defined by a trait ontology (including expected units and factor levels, usually specific to an organism group) and to accepted taxonomic names (usually referring to a global taxonomic ontology, e.g. GBIF). The template will take a single measurement or fact per row, i.e. one value assigned to a specimen and a trait, which may be linked to other measurements of the same specimen or to a spatial or methodological context via unique identifiers. This format allows for storing repeated measures and within-species variation, and for referencing single measurements to a museum ID or to a location and date of sampling. This scheme is also partially compatible with the definitions proposed by EOL's TraitBank for the creation of Darwin Core Archives of trait data. Obviously, such a standard could also be of general use for trait data outside the Biodiversity Exploratories.

One of the products of our initiative so far is a list of column names accepted for trait data (based on Darwin Core), including detailed definitions. A second product will be a definition of a trait thesaurus for invertebrates (our main focal organism group for now), which we aim to make broad enough to also include other organism groups in the future. To facilitate the transfer of users' own data (e.g. in matrix format or compiled tables) into the standardised format, I am currently working on an R script (#Cat: package?) that will assist the transfer, e.g. by mapping column and trait names, by converting units or by harmonizing factor levels.

My first question would be if any other trait-data standards are being developed within the scope of GFBio or the participating data centers. We would invite anyone interested in working on trait-data to get in touch with us.

A second question is if the GFBio terminology server would be a place for hosting the ontology in a human- and machine-readable form (via the API). If so, what is required to make this happen? How could we start a collaboration on implementing this?

We are looking forward to hearing from you,
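
To make the "one measurement or fact per row" layout in the draft concrete, here is a minimal sketch of reshaping a specimen-by-trait matrix into that long format. It is written in Python for illustration only (the actual tool is planned in R). The function name `matrix_to_long`, the column names `traitName` and `traitValue`, and all example values are made up for this sketch and are not the accepted column list.

```python
def matrix_to_long(rows, id_column, trait_columns):
    """Turn a specimen-by-trait table into one record per measurement.

    rows          -- list of dicts, one per specimen (wide/matrix format)
    id_column     -- name of the column holding the specimen identifier
    trait_columns -- names of the columns that hold trait values
    """
    records = []
    for row in rows:
        for trait in trait_columns:
            value = row.get(trait)
            if value is None or value == "":
                continue  # a missing cell produces no row in the long table
            records.append({
                # running number used as a simple unique measurement identifier
                "measurementID": len(records) + 1,
                # links all measurements taken on the same specimen
                "occurrenceID": row[id_column],
                "scientificName": row["scientificName"],
                "traitName": trait,      # hypothetical column name
                "traitValue": value,     # hypothetical column name
            })
    return records


# Made-up example input: two specimens of the same (placeholder) species.
wide = [
    {"specimen": "S1", "scientificName": "species A",
     "body_length_mm": 22.5, "wing_form": "brachypterous"},
    {"specimen": "S2", "scientificName": "species A",
     "body_length_mm": 24.0, "wing_form": None},
]

long_rows = matrix_to_long(wide, "specimen", ["body_length_mm", "wing_form"])
for record in long_rows:
    print(record)
```

Because every value keeps its own row and carries the specimen identifier, repeated measures and within-species variation survive the reshaping instead of being averaged away.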

fdschneider commented 7 years ago

I got a response from Anton Güntsch of GFBio today:

Dear Florian,

we had a GFBio Terminology Service meeting last week with your request being one of the agenda points and I would like to pick up the thread again.

Regarding the publication of terminologies/glossaries compiled by your initiative: without knowing the details precisely, I would say that the GFBio Terminology Server and Service would be the right place for storing the glossaries and publishing them in a standardised and machine-readable way. In this way, terms would also automatically be equipped with stable identifiers so that they could be reliably referenced and reused by systems such as the envisaged standardised template. To discuss concrete steps in more detail, it would be helpful to see some example glossary entries so that we can assess how a transformation for upload into the terminology server can be implemented. We would probably also need to discuss how to deal with corrections, versions etc.

Regarding the broader picture of trait-data-related initiatives: GFBio doesn’t have a trait-ontology working group at present, but there are a lot of related initiatives with strong involvement of GFBio partners. I guess you are already in discussion with Jens Kattge (TRY), who planned to work towards a trait ontology (when I last spoke to him). At the BGBM, we have started a DFG project “Additivity”, which deals with the automation of taxon-level (plant) descriptions based on specimen-level characters/measurements. In the course of this project, we are developing an ontology based on the Prometheus model, which treats characters from (plant) structures separately. Once it is sufficiently stabilised, we are also planning to publish the ontology via GFBio Terminology Services and integrate it into our Platform for Cybertaxonomy (http://cybertaxonomy.org). Proper treatment of characters based on the TDWG SDD standard (Structure of Descriptive Data) is also an important aspect of the Diversity Workbench coordinated by Dagmar Triebel. Diversity Workbench is one of the important archiving infrastructures in GFBio.

I think it could be very interesting to bring German initiatives working in the field of standardisation and ontologies for trait/descriptive data together in a workshop and to try to identify potential for cooperation and alignment of products. What do you think?

On another note: you mentioned measurements linked to specimens via specimen IDs. There is a CETAF initiative for harmonising specimen identifiers based on HTTP URIs (see http://www.nature.com/nature/journal/v546/n7656/full/546033d.html). Up to now, 16 CETAF organisations have already implemented the system, and most of the “big players” are on board (Paris, NHM, Kew, MfN, Naturalis). It would be great if Senckenberg could join the initiative, but I am not sure whom best to approach.

I am looking forward to discussing further cooperation possibilities!

With best wishes, Anton

fdschneider commented 7 years ago

From: Florian Dirk Schneider [mailto:fd.schneider@senckenberg.de]
Sent: Monday, 28 August 2017 12:56
To: Güntsch, Anton A.Guentsch@bgbm.org; Gleisberg, Maren M.Gleisberg@bgbm.org
Cc: naouel.karam@fu-berlin.de; d.fichtmueller@bgbm.de; robert.lorenz@fu-berlin.de; Ivaylo Kostadinov ikostadi@mpi-bremen.de
Subject: Re: Trait Data Standard for Terminology Server

Dear Anton,

Thank you very much for the detailed reply and your interest. Our current version of the terminology is online at

https://github.com/EcologicalTraitData/TraitDataStandard

which contains the source file in CSV along with an HTML rendering (accessible at https://ecologicaltraitdata.github.io/TraitDataStandard/). Some terms map directly to Darwin Core terms, others are refinements of those. We organised the terminology into a core section, which defines how trait data should be stored, and some extensions that allow adding information at the occurrence and measurement level.

The initiatives on trait data so far work on ontologies of trait definitions. The TOP Thesaurus, in which Jens Kattge and TRY are involved, is without a doubt the most advanced project here. Our goal is to encourage and facilitate the use of the global IDs provided by these initiatives in independent trait datasets (however, ontologies for most invertebrate groups are lacking to date). The same is true for identifiers of taxa and of individual specimens, as provided by CETAF (thanks for the pointer! I'll try to find out who would be the right person to talk to about it at Senckenberg). The SDD standard, after a quick look, seems to be well suited for measurement data taken on specimens. However, many trait datasets (especially on animals) do not report direct measurements, but contain aggregate values or values extracted from literature or expert knowledge describing a trait at the species (or higher taxon) level. Our data standard should provide a template for storing and sharing these as well.

It would definitely be great to have a workshop to harmonize all these initiatives. But I should point out that our project is a low-capacity side-product of the Biodiversity Exploratories and is right now driven by me alone, although it has already raised quite some interest in my personal network of trait researchers. Unfortunately, my contract at Senckenberg ends in November, and my goal is to have a terminology v1.0 online and to submit a methods paper by then. Maren pointed me to https://terminologies.gfbio.org/ where it says you have a tool for transferring a table into compliant SW formats? That would be the next step then, I guess. I would also like to invite you to comment on one of the next versions of our methods paper.

The other product of our project is an R package that formats trait data to be compliant with the trait-data standard, e.g. by providing unit conversion and automatic matching of taxon names to GBIF backbone taxonomy URIs (we are currently using the R package taxize to match given names to GBIF, but an integration of the EDIT API might be interesting as well). It is under heavy development: https://github.com/fdschneider/traitdataform

I hope that this gives you a more precise overview of what I am working on.

Best wishes, Florian
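
As a rough illustration of the kind of harmonisation mentioned in the e-mail above (unit conversion and mapping of factor levels), here is a minimal sketch. It is written in Python for illustration only and is not the code of the R package; the lookup tables `TO_MM` and `SEX_LEVELS` and both function names are assumptions made for this example.

```python
# Hypothetical lookup tables; a real thesaurus would define the expected
# unit and the accepted factor levels for each trait.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}
SEX_LEVELS = {"f": "female", "female": "female", "m": "male", "male": "male"}


def to_millimetres(value, unit):
    """Convert a length given in `unit` to millimetres."""
    try:
        return value * TO_MM[unit]
    except KeyError:
        raise ValueError(f"unknown length unit: {unit!r}") from None


def harmonise_level(raw, mapping):
    """Map a user-supplied factor level onto the accepted level."""
    key = raw.strip().lower()
    if key not in mapping:
        # fail loudly rather than silently passing through unknown levels
        raise ValueError(f"no accepted level for {raw!r}")
    return mapping[key]


print(to_millimetres(2.5, "cm"))
print(harmonise_level(" F ", SEX_LEVELS))
```

Raising an error on unknown units or levels is a deliberate choice in this sketch: a silent pass-through would let non-standard values slip into a dataset that claims to follow the standard.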

fdschneider commented 7 years ago

Anton replied on 29.08.2017

Dear Florian,

Many thanks for your message and for pointing me to the GitHub site. From a first look, it seems that importing and publishing the terms via the GFBio Terminology Services should be straightforward. We would probably need some additional conversation about the identity of concepts stemming from other standards.

I was not aware of the November deadline and would suggest that my colleague David Fichtmüller (in cc) take care of the import once he is back from paternity leave (mid-September). This should leave us enough time to get the import done.

Regarding a broader workshop, I will talk to our descriptive data team here in Berlin. Let’s see.

Thanks again and best wishes, Anton

fdschneider commented 7 years ago

Just an update here. David Fichtmüller told me that it will be possible to provide a static URI via the GFBio Terminology Service which forwards browsers directly to our GitHub website and forwards programmatic requests to the ontology read-out of the term. I think that is the most feasible way, and it leaves us in control of the definitions.

I will send the manuscript now to David and Anton and invite them to comment and, given substantial contributions, get involved as co-authors.
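
For reference, the forwarding behaviour described above (browsers go to the website, programmatic requests go to a machine-readable read-out) is commonly done by looking at the request's `Accept` header. The following is a minimal sketch of that idea in Python, not a description of how the GFBio Terminology Service is implemented. Both base URLs are placeholders, the `#term` anchor scheme is an assumption, and quality values (`q=`) in the header are ignored for brevity.

```python
# Placeholder targets; the real addresses would be defined by the service.
HTML_BASE = "https://example.org/TraitDataStandard/"
MACHINE_BASE = "https://example.org/terminology/term/"

MACHINE_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/ld+json",
    "application/json",
}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def redirect_target(term, accept_header):
    """Pick the redirect target for a term URI from the Accept header.

    The first media type in the header that is recognised decides;
    anything unrecognised falls back to the human-readable page.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in MACHINE_TYPES:
            return f"{MACHINE_BASE}{term}"
        if media_type in HTML_TYPES:
            return f"{HTML_BASE}#{term}"
    return f"{HTML_BASE}#{term}"


print(redirect_target("traitValue", "text/html,application/xhtml+xml"))
print(redirect_target("traitValue", "text/turtle"))
```

With this arrangement the stable URI stays the same for every client, while the definitions themselves continue to live in our own repository.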