fdschneider / bexis_traits

developing a trait data framework for use in the Biodiversity Exploratories

`taxonID` and `scientificName` #3

Closed fdschneider closed 7 years ago

fdschneider commented 7 years ago

How to label the actual species/taxon name field ?

fdschneider commented 7 years ago

I thought that taxonID might be better, but it seems to be reserved for an alphanumeric identifier. In contrast, scientificName is what people would use to sort and filter their dataset. Thus, I suggest that scientificName be the accepted species name from the lookup table (without author and year reference), and that taxonID be an identifier that includes the reference taxonomic system, which could be retrieved using the taxize package, e.g. urn:lsid:faunaeur.org:taxname:263190 or urn:lsid:catalogueoflife.org:taxon:11c22318-ac8f-11e3-805d-020044200006:col20150401
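A minimal sketch of how the two example identifiers could be split into their parts, assuming they follow the usual LSID layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The names `LSID` and `parse_lsid` are hypothetical and are not part of taxize or BExIS.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of an LSID-style identifier (hypothetical helper type)."""
    authority: str           # e.g. "faunaeur.org"
    namespace: str           # e.g. "taxname"
    object_id: str           # e.g. "263190"
    revision: Optional[str]  # optional trailing part, e.g. "col20150401"


def parse_lsid(lsid: str) -> LSID:
    """Split an LSID string on ':' and validate the 'urn:lsid' prefix."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {lsid!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    # The sixth part, if present, is treated as the revision.
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

With this layout the authority part (here `faunaeur.org` or `catalogueoflife.org`) identifies the reference taxonomic system.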

We need to decide which reference system would be the most useful: see the list of APIs available in taxize.

Fauna Europaea might be implemented in taxize in the future: ropensci/taxize#605.

aostrow commented 7 years ago

I agree, it should be scientificName. There is another good API/web service that is not in the taxize package: the GBIF backbone. It looks really promising to me, and for our daily work it is the most useful one at the moment. You can find it here: http://www.gbif.org/developer/species They provide a couple of nice ways to search for species, e.g.

fdschneider commented 7 years ago

Actually, it seems GBIF is already implemented, but missing from the README table. There is the get_gbifid() function and several other matching functions. I will now explore how to match user-supplied names to GBIF names using the taxize package. Not sure yet how it handles synonyms.

fdschneider commented 7 years ago

I switched back to scientificName.

taxonID is now provided according to GBIF backbone taxonomy, but alternative taxonomies should be allowed for. The format of the taxonID now is GBIF Backbone Taxonomy::1708251. Is there a format standard defined within BExIS?

See also Issue #2 and function get_gbif_taxonomy().

aostrow commented 7 years ago

Format standard for what? (1) Which taxonomies should be used, OR (2) what the taxonID string should look like?

(1) For BExIS, a student helper is going through the datasets and trying to map each species-related term to a taxonomic representation. The student uses the web search for this because the species names in our data are very often written in a cryptic way. We started with Catalogue of Life but currently lean more towards GBIF. It seems that more species are available there and that it is more up to date. If a species is not present (or not findable), the student helper tries other taxonomic backbones. We also try to add German and English names, often by using Wikipedia.

(2) No, it depends on how these IDs are shown in the browser. For GBIF the student uses the following format: GBIF:speciesKey:4436770.

fdschneider commented 7 years ago

Thanks for the clarification. I mainly meant (2). I will just keep the full string including 'GBIF Backbone Taxonomy' (since this is what taxize returns) and use a single : plus the gbifID (which can also refer to a higher taxon, according to the taxon rank provided). This should allow easy handling and unambiguous interpretation of the full string.
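A minimal sketch of how such a string could be composed and read back, assuming the layout `<taxonomy name>:<identifier>` with a single colon as described above. The function names are hypothetical and not part of the package. Splitting on the last colon is an assumption, chosen so that a taxonomy name containing a colon would still parse.

```python
from typing import NamedTuple


class TaxonID(NamedTuple):
    """Taxonomy name and identifier taken from a taxonID string (hypothetical)."""
    taxonomy: str
    identifier: str


def format_taxon_id(taxonomy: str, identifier: str) -> str:
    """Join taxonomy name and identifier with a single colon."""
    return f"{taxonomy}:{identifier}"


def parse_taxon_id(taxon_id: str) -> TaxonID:
    """Split a taxonID string at its last colon."""
    taxonomy, sep, identifier = taxon_id.rpartition(":")
    if not sep or not taxonomy or not identifier:
        raise ValueError(f"not a '<taxonomy>:<identifier>' string: {taxon_id!r}")
    return TaxonID(taxonomy, identifier)
```

For example, `parse_taxon_id("GBIF Backbone Taxonomy:1708251")` yields the taxonomy name `GBIF Backbone Taxonomy` and the identifier `1708251`.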

And regarding (1), I will teach the script to fall back to another taxonomy if GBIF fails. The taxize package is very useful here. In any case, we will have the user-provided name.
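The fallback idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the package's implementation: each taxonomy is represented by a hypothetical lookup callable that returns an identifier or `None`, and the field names `userName` and `taxonID` are placeholders. No real web service is called here.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A resolver maps a user-provided name to an identifier, or None if not found.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(
    user_name: str,
    resolvers: Sequence[Tuple[str, Resolver]],
) -> Dict[str, Optional[str]]:
    """Try each taxonomy in order and stop at the first one that matches."""
    for taxonomy, lookup in resolvers:
        identifier = lookup(user_name)
        if identifier is not None:
            return {"userName": user_name, "taxonID": f"{taxonomy}:{identifier}"}
    # No taxonomy matched: the user-provided name is still kept.
    return {"userName": user_name, "taxonID": None}
```

The order of `resolvers` encodes the preference, so GBIF would come first and any alternative taxonomy after it. The user-provided name is returned in every case, including when no taxonomy matches.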