gbif / data-mobilization

For capturing and discussing potential datasets suitable for publishing to GBIF
Apache License 2.0

The RAINBIO mega database #16

Open gbif-portal opened 7 years ago

gbif-portal commented 7 years ago

The RAINBIO mega database

Dataset link: http://rainbio.cesab.org/#dataset

Region: Tropical Africa

Taxon: Tracheophyta (Vascular Plants)

Type: occurrence

Why is this important: Compiled using a multi-step workflow of cleaning, standardizing and quality checks that combines computer algorithms with expert knowledge. All species have their growth form/habit ascribed and all records are geo-referenced. RAINBIO is a compilation of thirteen datasets of three kinds: (i) extensive ‘public’ databases of several herbaria (BR, BRLU, K, LISC, MO, and WAG, the last including AMD, L and U; acronyms according to Thiers, continuously updated), (ii) personal databases collated by individual researchers (focusing on a given taxonomic group or a given geographic area) and (iii) other sources of plant occurrences such as silica-gel collections or vegetation plot inventories. The WAG dataset includes a series of personal datasets (like ii) compiled for taxonomic revisions of over 35 genera in different families. Occurrences are thus supported by specimens deposited in herbaria (586,920 records), silica-dried specimens (13,510 records) or observations from plot inventories (13,443 records).

Priority: medium

Bibliographic reference: PhytoKeys 74: 1-18 (07 Nov 2016); https://doi.org/10.3897/phytokeys.74.9723

Comments: GBIF could target the RAINBIO species list to contribute to the backbone taxonomy in terms of habit attributes, valid species, etc. Individual/private datasets should be targeted for publication to GBIF. Cleaned records that originated from GBIF can be republished as a reference dataset, with feedback to the original publishers to help them curate their data.

Dataholders contact information: Thomas Couvreur: thomas.couvreur@ird.fr or cesab@fondationbiodiversite.fr

Users contact info: smasinde@gbif.org

dschigel commented 7 years ago

Thanks, Siro, this is a very important one.

rdmpage commented 7 years ago

I'm trying to figure out how much overlap there is with existing GBIF data. Not entirely straightforward, but some obvious duplicates, e.g.:

"603122",771353,"Magnoliophyta","Lamiales","Acanthaceae","Acanthopale","laxiflora","ESP","Acanthopale laxiflora","Acanthopale laxiflora","Tanzania","TZA","TZA",-1.05,31.55,"herb","","TS1540804","MO","",5,"f","herb","PreservedSpecimen","Acanthopale laxiflora","Acanthaceae"

Is the same as http://www.gbif.org/occurrence/1258769203 (urn:catalog:MO:Tropicos:1540804). Could encode plant form (e.g., "herb") if there was a standard ontology for these things, seemingly there isn't(!).

Once again, we need tools to cluster occurrences that are the same...
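A minimal sketch of the kind of matching involved, using only the one row and the one GBIF identifier quoted above. The column positions (17 for the catalogue value, 18 for the herbarium code) are read off that single row and are an assumption, as is the idea that the trailing digits of "TS1540804" correspond to the number at the end of the GBIF catalog URN. The function names are hypothetical.

```python
import csv
import io
import re

# The RAINBIO row quoted above, verbatim.
ROW = ('"603122",771353,"Magnoliophyta","Lamiales","Acanthaceae","Acanthopale",'
       '"laxiflora","ESP","Acanthopale laxiflora","Acanthopale laxiflora",'
       '"Tanzania","TZA","TZA",-1.05,31.55,"herb","","TS1540804","MO","",5,"f",'
       '"herb","PreservedSpecimen","Acanthopale laxiflora","Acanthaceae"')


def rainbio_key(row_text, catalog_col=17, institution_col=18):
    """Return (institution, number) for one RAINBIO CSV row.

    The default column positions are an assumption based on the example row.
    """
    fields = next(csv.reader(io.StringIO(row_text)))
    digits = re.search(r"(\d+)$", fields[catalog_col])
    if digits is None:
        return None
    return (fields[institution_col], digits.group(1))


def gbif_key(catalog_urn):
    """Return (institution, number) from an id like urn:catalog:MO:Tropicos:1540804."""
    parts = catalog_urn.split(":")
    if len(parts) < 4 or parts[:2] != ["urn", "catalog"]:
        return None
    return (parts[2], parts[-1])


# Two records are candidate duplicates when their keys agree.
is_duplicate = rainbio_key(ROW) == gbif_key("urn:catalog:MO:Tropicos:1540804")
```

A key like this only catches the easy cases where both sources keep the herbarium code and a shared number; it is no substitute for proper clustering.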

rdmpage commented 7 years ago

EOL has some terms for plant habits which it uses in TraitBank, see e.g. http://eol.org/api/traits/647408

{
        "@id": "http://eol.org/pages/647408/data#data_point_15005931",
        "eol:traitUri": "http://eol.org/resources/814/measurements/m46588",
        "@type": "dwc:MeasurementOrFact",
        "predicate": "growth habit",
        "dwc:measurementType": "http://eol.org/schema/terms/PlantHabit",
        "value": "tree",
        "eol:dataPointId": 15005931,
        "dc:source": "http://collections.mnh.si.edu/search/botany/?irn=10497215",
        "dc:bibliographicCitation": "Smithsonian Institution, National Museum of Narutal History, Department of Botany. Data for specimen 3314475. http://collections.mnh.si.edu/search/botany/",
        "dwc:measurementValue": "http://eol.org/schema/terms/tree",
        "dwc:scientificName": "Zygia heteroneura Barneby & J.W. Grimes",
        "dwc:catalogNumber": "3314475",
        "dwc:collectionCode": "http://grbio.org/cool/i75d-97nn",
        "dwc:institutionCode": "http://biocol.org/urn:lsid:biocol.org:col:81442",
        "dwc:measurementRemarks": "Source term: Tree",
        "eolterms:resource": "http://eol.org/resources/814"
}

Here is a translation table for values used in the a_habit field in the RAINBIO dataset.

| a_habit | term |
| --- | --- |
| aquatic | |
| climber | http://eol.org/schema/terms/climbingPlant |
| epiphyte | http://eol.org/schema/terms/epiphyte |
| herb | http://eol.org/schema/terms/forbHerb |
| liana | http://eol.org/schema/terms/liana |
| myco-heterotroph | |
| parasitic | |
| shrublet | |
| shrub | http://eol.org/schema/terms/shrub |
| tree | http://eol.org/schema/terms/tree |
| vine | http://eol.org/schema/terms/vine |

The a_habit data could be converted to the Darwin Core Measurement Or Facts extension (also used by EOL). In principle, if GBIF supported searching by traits, we could then search for, say, "tree species in Malawi".
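As a rough sketch of that conversion, the snippet below turns one a_habit value into a row shaped like a MeasurementOrFact record, using the PlantHabit URI from the EOL example and the translation table above. The "id" key, the function name, and the choice to pass unmapped habits (aquatic, myco-heterotroph, parasitic, shrublet) through as plain strings are assumptions made for illustration, not anything RAINBIO, EOL, or GBIF prescribes.

```python
# EOL term used as measurementType in the TraitBank example above.
PLANT_HABIT = "http://eol.org/schema/terms/PlantHabit"

# a_habit values that have an EOL term in the translation table.
HABIT_TO_EOL = {
    "climber": "http://eol.org/schema/terms/climbingPlant",
    "epiphyte": "http://eol.org/schema/terms/epiphyte",
    "herb": "http://eol.org/schema/terms/forbHerb",
    "liana": "http://eol.org/schema/terms/liana",
    "shrub": "http://eol.org/schema/terms/shrub",
    "tree": "http://eol.org/schema/terms/tree",
    "vine": "http://eol.org/schema/terms/vine",
}


def habit_to_measurement(record_id, a_habit):
    """Build one MeasurementOrFact-style row for a RAINBIO record.

    Habits without an EOL term are kept as the raw string (an assumption).
    """
    return {
        "id": record_id,  # hypothetical link back to the occurrence row
        "measurementType": PLANT_HABIT,
        "measurementValue": HABIT_TO_EOL.get(a_habit, a_habit),
        "measurementRemarks": f"Source term: {a_habit}",
    }


row = habit_to_measurement("603122", "herb")
```

The remarks field mirrors the "Source term: Tree" pattern in the EOL record, so the original a_habit string survives the mapping.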

mdoering commented 7 years ago

For plants there are well established vocabularies for life forms, notably the ones based on the system originally introduced by Raunkiær: https://en.wikipedia.org/wiki/Raunki%C3%A6r_plant_life-form

The species profile extension which GBIF indexes today already has a lifeForm term that can hold such values as tree (or Phanerophytes): http://rs.gbif.org/extension/gbif/1.0/speciesprofile.xml

rdmpage commented 7 years ago

Although Raunkiær is simply a list of terms, not a vocabulary/ontology with URIs. And if EOL uses one set of terms and GBIF uses another, we will struggle to integrate this stuff…

mdoering commented 7 years ago

EOL's terms seem to be derived from https://plants.usda.gov/growth_habits_def.html, which is just a set of terms, as any of these standards seems to be. I really would not bother about URIs. But I agree it would be good to converge on one standard for interoperability. As far as I know Raunkiær is the most used one: https://en.wikipedia.org/wiki/Plant_life-form

rdmpage commented 7 years ago

@mdoering

I really would not bother about URIs

Well, there goes the Semantic Web...

To be clear, although the Semantic Web has been something of an exercise in wishful thinking for a while (and a huge, mostly wasteful time sink for biodiversity informatics), I think we're likely to see some useful things start to happen in the near future, especially with the rise of http://schema.org and Wikidata. So, while maybe right now we can ignore URIs (there are so many and so few drivers to rationalise them), I think this is something we should still have on our radar.

P.S. yes, it feels weird to be defending the Semantic Web.