Open gbif-portal opened 7 years ago
Thanks, Siro, this is a very important one.
I'm trying to figure out how much overlap there is with existing GBIF data. Not entirely straightforward, but some obvious duplicates, e.g.:
"603122",771353,"Magnoliophyta","Lamiales","Acanthaceae","Acanthopale","laxiflora","ESP","Acanthopale laxiflora","Acanthopale laxiflora","Tanzania","TZA","TZA",-1.05,31.55,"herb","","TS1540804","MO","",5,"f","herb","PreservedSpecimen","Acanthopale laxiflora","Acanthaceae"
This is the same as http://www.gbif.org/occurrence/1258769203 (urn:catalog:MO:Tropicos:1540804). We could encode plant form (e.g., "herb") if there were a standard ontology for these things, but seemingly there isn't(!).
Once again, we need tools to cluster occurrences that are the same...
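As a rough illustration of what such clustering could look like, here is a minimal sketch that groups records by institution code plus the trailing digits of their specimen identifier. The field names (`id`, `institutionCode`, `identifier`) and the record ids are hypothetical, and treating "TS1540804" and "urn:catalog:MO:Tropicos:1540804" as the same Tropicos number is an assumption based on the example above, not a documented rule.

```python
import re
from collections import defaultdict


def cluster_key(institution_code, identifier):
    """Return a coarse key: (institution code, trailing digits of the identifier).

    Returns None when the identifier does not end in digits.
    """
    match = re.search(r"(\d+)\s*$", identifier)
    if match is None:
        return None
    number = match.group(1).lstrip("0") or "0"
    return (institution_code.strip().upper(), number)


def cluster(records):
    """Group record ids by cluster key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for rec in records:
        key = cluster_key(rec["institutionCode"], rec["identifier"])
        if key is not None:
            groups[key].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Illustrative records built from the example above (field names are assumptions).
records = [
    {"id": "rainbio:603122", "institutionCode": "MO", "identifier": "TS1540804"},
    {"id": "gbif:1258769203", "institutionCode": "MO",
     "identifier": "urn:catalog:MO:Tropicos:1540804"},
]

candidates = cluster(records)
```

A key this coarse will produce false positives wherever an institution runs more than one numbering series, so the groups are better treated as candidates to check against taxon name, locality and coordinates than as confirmed duplicates.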
EOL has some terms for plant habits which it uses in TraitBank, see e.g. http://eol.org/api/traits/647408
{
"@id": "http://eol.org/pages/647408/data#data_point_15005931",
"eol:traitUri": "http://eol.org/resources/814/measurements/m46588",
"@type": "dwc:MeasurementOrFact",
"predicate": "growth habit",
"dwc:measurementType": "http://eol.org/schema/terms/PlantHabit",
"value": "tree",
"eol:dataPointId": 15005931,
"dc:source": "http://collections.mnh.si.edu/search/botany/?irn=10497215",
"dc:bibliographicCitation": "Smithsonian Institution, National Museum of Narutal History, Department of Botany. Data for specimen 3314475. http://collections.mnh.si.edu/search/botany/",
"dwc:measurementValue": "http://eol.org/schema/terms/tree",
"dwc:scientificName": "Zygia heteroneura Barneby & J.W. Grimes",
"dwc:catalogNumber": "3314475",
"dwc:collectionCode": "http://grbio.org/cool/i75d-97nn",
"dwc:institutionCode": "http://biocol.org/urn:lsid:biocol.org:col:81442",
"dwc:measurementRemarks": "Source term: Tree",
"eolterms:resource": "http://eol.org/resources/814"
}
Here is a translation table for values used in the a_habit field in the RAINBIO dataset.
a_habit | term |
---|---|
aquatic | |
climber | http://eol.org/schema/terms/climbingPlant |
epiphyte | http://eol.org/schema/terms/epiphyte |
herb | http://eol.org/schema/terms/forbHerb |
liana | http://eol.org/schema/terms/liana |
myco-heterotroph | |
parasitic | |
shrublet | |
shrub | http://eol.org/schema/terms/shrub |
tree | http://eol.org/schema/terms/tree |
vine | http://eol.org/schema/terms/vine |
The a_habit data could be converted to the Darwin Core MeasurementOrFact extension (also used by EOL). In principle, if GBIF supported searching by traits, we could then search for, say, "tree species in Malawi".
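A minimal sketch of that conversion, assuming the translation table above and the EOL record shown earlier as the pattern to follow (measurementType set to the PlantHabit URI, the original string kept in measurementRemarks). The function name, the `id` column linking back to the occurrence, and the fallback of keeping the raw string for unmapped habits are all assumptions made for illustration.

```python
# Mapping copied from the translation table; habits without an EOL term are omitted.
HABIT_TO_TERM = {
    "climber": "http://eol.org/schema/terms/climbingPlant",
    "epiphyte": "http://eol.org/schema/terms/epiphyte",
    "herb": "http://eol.org/schema/terms/forbHerb",
    "liana": "http://eol.org/schema/terms/liana",
    "shrub": "http://eol.org/schema/terms/shrub",
    "tree": "http://eol.org/schema/terms/tree",
    "vine": "http://eol.org/schema/terms/vine",
}

PLANT_HABIT = "http://eol.org/schema/terms/PlantHabit"


def habit_to_measurement(record_id, a_habit):
    """Build one MeasurementOrFact-style row for a record, or None if a_habit is empty."""
    source = a_habit.strip()
    if not source:
        return None
    term = HABIT_TO_TERM.get(source.lower())
    return {
        "id": record_id,  # assumed link back to the core occurrence record
        "measurementType": PLANT_HABIT,
        # Assumption: fall back to the raw string when no EOL term is mapped.
        "measurementValue": term if term is not None else source,
        "measurementRemarks": f"Source term: {source}",
    }


row = habit_to_measurement("603122", "herb")
```

Habits with no mapped term (aquatic, myco-heterotroph, parasitic, shrublet) pass through as plain strings here, which keeps the data but leaves the vocabulary gap visible.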
For plants there are well established vocabularies for life forms, notably the ones based on the system originally introduced by Raunkiær: https://en.wikipedia.org/wiki/Raunki%C3%A6r_plant_life-form
The species profile extension which GBIF indexes today already has a lifeForm term that can hold such values as tree (or Phanerophytes): http://rs.gbif.org/extension/gbif/1.0/speciesprofile.xml
However, Raunkiær is simply a list of terms, not a vocabulary/ontology with URIs. And if EOL uses one set of terms and GBIF uses another, we will struggle to integrate this stuff…
EOL's terms seem to be derived from https://plants.usda.gov/growth_habits_def.html, which is just a set of terms, as all of these standards seem to be. I really would not bother about URIs. But I agree it would be good to converge on one standard for interoperability. As far as I know, Raunkiær is the most used one: https://en.wikipedia.org/wiki/Plant_life-form
@mdoering
I really would not bother about URIs
Well, there goes the Semantic Web...
To be clear, although the Semantic Web has been something of an exercise in wishful thinking for a while (and a huge, mostly wasteful time sink for biodiversity informatics), I think we're likely to see some useful things start to happen in the near future, especially with the rise of http://schema.org and Wikidata. So, while maybe right now we can ignore URIs (there are so many of them and so few drivers to rationalise them), I think this is something we should still have on our radar.
P.S. yes, it feels weird to be defending the Semantic Web.
The RAINBIO mega database
Dataset link: http://rainbio.cesab.org/#dataset
Region: Tropical Africa
Taxon: Tracheophyta (Vascular Plants)
Type: occurrence
Why is this important: Compiled using a multi-step workflow with cleaning, standardizing and quality checks through computer algorithms and expert knowledge. All species have their growth form/habit ascribed and all records are geo-referenced. RAINBIO is a compilation of thirteen datasets of three kinds: (i) extensive ‘public’ databases of several herbaria (BR, BRLU, K, LISC, MO and WAG, the latter also including AMD, L and U; acronyms according to Thiers, continuously updated), (ii) personal databases collated by individual researchers (focusing on a given taxonomic group or a given geographic area) and (iii) other sources of plant occurrences such as silica-gel collections or vegetation plot inventories. The WAG dataset includes a series of personal datasets (as in ii) compiled for taxonomic revisions of over 35 genera in different families. Occurrences are thus supported by specimens deposited in herbaria (586,920 records), silica-dried specimens (13,510 records) or observations from plot inventories (13,443 records).
Priority: medium
Bibliographic reference: PhytoKeys 74: 1-18 (07 Nov 2016); https://doi.org/10.3897/phytokeys.74.9723
Comments: GBIF could target the RAINBIO species list to contribute to the backbone taxonomy in terms of habit attributes, valid species, etc. Individual/private datasets should be targeted for publication to GBIF. Cleaned data records that originated from GBIF can be republished as a reference dataset, with feedback to the original publishers of those records to help them curate their data.
Dataholders contact information: Thomas Couvreur: thomas.couvreur@ird.fr or cesab@fondationbiodiversite.fr
Users contact info: smasinde@gbif.org