OpenTreeOfLife / reference-taxonomy

Open Tree Reference Taxonomy (OTT) tools
BSD 2-Clause "Simplified" License

revisit hiding of fossil taxa #356

Open mtholder opened 6 years ago

mtholder commented 6 years ago

e.g. https://tree.opentreeoflife.org/taxonomy/browse?name=Aysheaia

I am just creating this issue as a reminder to myself to review the flags that make a name unavailable for matching. This taxon is valid (see https://en.wikipedia.org/wiki/Aysheaia), and as our summary-tree methods get better at handling fossils, it would be good to be more permissive about which names are matchable.

TonyRees commented 6 years ago

"genus Aysheaia irmng:1056114 barren, major_rank_conflict, hidden, extinct (OTT id 3583412)"

I can answer any IRMNG-related questions if you like. Maybe it is hidden because it is barren, comes only from IRMNG (not sure), and is extinct? But a lot of "good" (i.e. valid) IRMNG genus names will be barren, as I only collect species names adventitiously (especially for fossil taxa). Of course, IRMNG has a mix of valid and non-valid names, not all distinguished at this time.

You could maybe get the valid/synonym flags if you can get an up-to-date dump from PaleoBioDB, see here: http://fossilworks.org/bridge.pl?a=taxonInfo&taxon_no=18881

Although PBDB is not comprehensive at this time, a lot of the best-known taxa will be in there (dinosaurs etc.). There is a 2016 PBDB dump available via GBIF, as I recall (which should therefore be reflected in the GBIF backbone, I would have thought). I'm not sure how you can get content directly from PBDB itself.

jar398 commented 6 years ago

(In reply to TonyRees, Jun 25, 2018, 6:46 PM.)

Correct, there is a PBDB dump at GBIF, and I considered using it. But when I checked it didn’t have a column for an extinct flag. I let Markus know that it would be nice if this information could be preserved, not just for PBDB but for all sources that have it. I suspect his to-do list has other things ahead of this one.

Jonathan

TonyRees commented 6 years ago

I just checked: there is an extension, "SpeciesProfile", in the PBDB DwCA file available from GBIF that looks like this:

speciesprofile.txt

So, presuming "isExtinct" is populated, the information should be there. Alternatively, there are values in "livingPeriod" of the form (e.g.) "513.0 to 498.5 Ma" from which fossil status could be inferred, if that is essential OTOL info. I was meaning to attempt a mapping between IRMNG and PBDB data at some point (mainly to grab valid vs. synonym status for some names, and also child taxa at species level where not yet held) while I was still in my previous role at CSIRO, but unfortunately (early) retirement intervened :)
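If "isExtinct" turned out to be empty, the "livingPeriod" route could look roughly like the sketch below. It is a minimal sketch, assuming every value follows the single example format "<older> to <younger> Ma"; the function names and the 0 Ma cutoff are hypothetical choices for illustration, not anything PBDB, GBIF, or OTT defines.

```python
import re
from typing import Optional, Tuple

# Assumed format, inferred only from the example "513.0 to 498.5 Ma":
# "<older bound> to <younger bound> Ma", bounds in millions of years ago.
_LIVING_PERIOD = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)\s*Ma\s*$")


def parse_living_period(value: str) -> Optional[Tuple[float, float]]:
    """Return (older_ma, younger_ma), or None if value doesn't fit the assumed format."""
    match = _LIVING_PERIOD.match(value)
    if match is None:
        return None
    a, b = float(match.group(1)), float(match.group(2))
    # Order the bounds so the larger age (the older one) comes first.
    return (max(a, b), min(a, b))


def looks_fossil_only(value: str) -> Optional[bool]:
    """Hypothetical rule: treat a taxon as fossil-only when the younger bound
    of its range is above 0 Ma. Returns None when the value can't be parsed."""
    parsed = parse_living_period(value)
    if parsed is None:
        return None
    return parsed[1] > 0.0
```

For example, parse_living_period("513.0 to 498.5 Ma") returns (513.0, 498.5), so looks_fossil_only gives True. A 0 Ma cutoff cannot tell taxa that went extinct in the last few centuries from living ones, so at best this recovers something like a "fossil only" flag.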

TonyRees commented 6 years ago

Sorry, the gremlins did not display the text I wanted to show. Here is the file content (meta.xml):

taxon.txt vernacularname.txt speciesprofile.txt
TonyRees commented 6 years ago

Third try, with angle brackets replaced with curlies:

{archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml"}
  {core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="0" rowType="http://rs.tdwg.org/dwc/terms/Taxon"}
    {files}
      {location}taxon.txt{/location}
    {/files}
    {id index="0" /}
    {field index="1" term="http://rs.tdwg.org/dwc/terms/acceptedNameUsageID"/}
    {field index="2" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/}
    {field index="3" term="http://rs.tdwg.org/dwc/terms/taxonRank"/}
    {field index="4" term="http://rs.tdwg.org/dwc/terms/scientificName"/}
    {field index="5" term="http://rs.tdwg.org/dwc/terms/scientificNameAuthorship"/}
    {field index="6" term="http://rs.tdwg.org/dwc/terms/nameAccordingTo"/}
  {/core}
  {extension encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="0" rowType="http://rs.gbif.org/terms/1.0/VernacularName"}
    {files}
      {location}vernacularname.txt{/location}
    {/files}
    {coreid index="0" /}
    {field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/}
    {field index="2" term="http://purl.org/dc/terms/language"/}
  {/extension}
  {extension encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="0" rowType="http://rs.gbif.org/terms/1.0/SpeciesProfile"}
    {files}
      {location}speciesprofile.txt{/location}
    {/files}
    {coreid index="0" /}
    {field index="1" term="http://rs.gbif.org/terms/1.0/livingPeriod"/}
    {field index="2" term="http://rs.tdwg.org/dwc/terms/habitat"/}
    {field index="3" term="http://rs.gbif.org/terms/1.0/isExtinct"/}
    {field index="4" term="http://rs.gbif.org/terms/1.0/isFreshwater"/}
    {field index="5" term="http://rs.gbif.org/terms/1.0/isMarine"/}
    {field index="6" term="http://rs.gbif.org/terms/1.0/isTerrestrial"/}
  {/extension}
{/archive}
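With real angle brackets, a meta.xml like this is enough to locate the isExtinct column without hard-coding it. The sketch below is a minimal, hypothetical reader (extension_columns and extinct_flags are illustrative names, not part of any existing tool), assuming the layout shown: tab-separated files, no header line, and the taxon id in the coreid column. It keeps isExtinct values as raw strings, since the exact values PBDB writes there are not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Darwin Core Archive meta.xml elements live in this default namespace.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def extension_columns(meta_xml: str, row_type: str) -> Dict[str, int]:
    """Map each term's local name (e.g. "isExtinct") to its column index
    for the extension with the given rowType; the coreid column is stored
    under the key "coreid"."""
    root = ET.fromstring(meta_xml)
    for ext in root.findall(f"{DWC_TEXT_NS}extension"):
        if ext.get("rowType") != row_type:
            continue
        columns: Dict[str, int] = {}
        coreid = ext.find(f"{DWC_TEXT_NS}coreid")
        if coreid is not None:
            columns["coreid"] = int(coreid.get("index", "0"))
        for field in ext.findall(f"{DWC_TEXT_NS}field"):
            index = field.get("index")
            if index is None:
                continue  # fields with only a default value have no column
            # Keep only the last path segment of the term URI.
            local_name = field.get("term", "").rstrip("/").rsplit("/", 1)[-1]
            columns[local_name] = int(index)
        return columns
    raise KeyError(f"no extension with rowType {row_type!r}")


def extinct_flags(profile_tsv: str, columns: Dict[str, int]) -> Dict[str, str]:
    """Return {taxon id: raw isExtinct value} from tab-separated
    speciesprofile.txt content with no header line."""
    id_col, flag_col = columns.get("coreid", 0), columns["isExtinct"]
    flags: Dict[str, str] = {}
    for line in profile_tsv.splitlines():
        cells = line.split("\t")
        if len(cells) > max(id_col, flag_col):
            flags[cells[id_col]] = cells[flag_col]
    return flags
```

Against the listing above, extension_columns(meta, "http://rs.gbif.org/terms/1.0/SpeciesProfile") should report isExtinct at index 3 and livingPeriod at index 1.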

jar398 commented 6 years ago

Amazing, thank you! One more reason to implement good DwCA support.

jar398 commented 6 years ago

Of course ‘isExtinct’ changes over time, and the current rate of extinction is quite high, so sources should really record the latest time at which a taxon was known to be extant, or the earliest time at which it was known to be extinct, whichever the source knows.
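One way to picture the record suggested here is a pair of optional dates per source instead of a single boolean. The sketch below is hypothetical (ExtinctionEvidence and status_as_of are invented names, and years are CE for simplicity); it only shows that a yes/no answer can be derived for a given year, with "unknown" between the two dates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtinctionEvidence:
    """Hypothetical per-source record; years are CE and either may be unknown."""
    last_known_extant: Optional[int] = None    # latest year the taxon was known alive
    first_known_extinct: Optional[int] = None  # earliest year it was known extinct

    def __post_init__(self) -> None:
        # A taxon cannot be known alive after it was already known extinct.
        if (self.last_known_extant is not None
                and self.first_known_extinct is not None
                and self.last_known_extant > self.first_known_extinct):
            raise ValueError("last_known_extant is after first_known_extinct")

    def status_as_of(self, year: int) -> str:
        """Return 'extinct', 'extant', or 'unknown' for the given year."""
        if self.first_known_extinct is not None and year >= self.first_known_extinct:
            return "extinct"
        if self.last_known_extant is not None and year <= self.last_known_extant:
            return "extant"
        return "unknown"
```

A plain isExtinct flag is then just status_as_of evaluated at the date the source was compiled, which is why it goes stale.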

(In reply to TonyRees, Jun 26, 2018, 12:20 AM.)

TonyRees commented 6 years ago

IRMNG also uses "isExtinct", but this is actually translated from the IRMNG concept "fossil only", which is used in the sense of "extinct before A.D. 1500", i.e. not seen alive in "modern" times. Taxa that have gone extinct in modern times are classified as "Recent" in IRMNG, which is not an exact fit with "isExtinct" = false, but there you go. In reality another flag or two is needed (I think CoL is going down this path, but I do not have their options to hand).
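The mismatch can be made concrete by comparing the one-flag translation described in this comment with a two-flag alternative. Everything below is a hypothetical sketch (the enum, dataclass, and function names are invented, and this is not IRMNG's or CoL's actual scheme); the point is that "Recent" should leave isExtinct undetermined rather than false.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IrmngStatus(Enum):
    FOSSIL_ONLY = "fossil only"  # extinct before A.D. 1500
    RECENT = "Recent"            # known alive in modern times; may since be extinct


@dataclass(frozen=True)
class ExtinctionFlags:
    extinct_before_1500: bool
    is_extinct: Optional[bool]  # None: the source does not say


def single_flag(status: IrmngStatus) -> bool:
    """One-flag translation: only "fossil only" maps to isExtinct = true,
    so taxa that went extinct in modern times come out as false."""
    return status is IrmngStatus.FOSSIL_ONLY


def two_flags(status: IrmngStatus) -> ExtinctionFlags:
    """Hypothetical two-flag translation that keeps "Recent" from implying extant."""
    if status is IrmngStatus.FOSSIL_ONLY:
        return ExtinctionFlags(extinct_before_1500=True, is_extinct=True)
    return ExtinctionFlags(extinct_before_1500=False, is_extinct=None)
```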