OpenTreeOfLife / opentree

Opentree browsing and curation web site. For overarching or cross-repo concerns, please see the 'germinator' repo.
http://tree.opentreeoflife.org/
BSD 2-Clause "Simplified" License

Example studies with fossil taxa not mapping #32

Closed chrsowen closed 10 years ago

chrsowen commented 11 years ago

Hi Jonathan, Two study ids are below for studies that include fossil taxa. Thanks again for your help, Chris

Study IDs: 1429 and 2550

mtholder commented 11 years ago

Hi Chris, Some of the unmapped taxa are extant. Can you list a couple of the fossil taxa for each of the studies? thanks, Mark

chrsowen commented 11 years ago

Hi Mark, I think this is all of them from the two studies. Thanks, Chris

Study ID 2550: Agriochoerus, Arctocyon, Basilosaurus, Cainotherium, Cebochoerus, Desmostylia, Diacodexis pakist, Diacodexis wasatch, Dissacus navajovious, Dissacus praenuntius, Elomeryx, Entelodontidae, Georgiacetus, Gobiohyus, Hapalodectes hetang, Hapalodectes leptogna, Harpagolestes, Heteromeryx, Homacodon, Hyopsodus, Hyracotherium, Leptictidae, Leptoreodon, Meniscotherium, Merycoidodon, Mesonyx, Mixtotherium, Mongolian Dissacus, Pachyaena gigantea, Pachyaena ossifraga, Palaeoryctoidea, Phenacodus, Plesiadapidae, Poebrotherium, Protoceras, Protocetus, Remingtonocetus, Sinonyx, Synoplotherium, Vulpavus

Study ID 1429: Pachyrhachis problematicus

jar398 commented 11 years ago

Spot checks:

Hmm.... genus Hyracotherium is in GBIF (via IRMNG), but not in OTT. Must get filtered out for some reason. We will need to delve into the details of Stephen's script ( taxomachine/data/process_gbif_taxonomy.py ).

3239467 5479 Hyracotherium Owen, 1840 Hyracotherium genus accepted Hyracotherium Proc. geol. Soc. London, 3 (66), 163 (as Hyotherium p. 240). Interim Register of Marine and Nonmarine Genera Animalia Chordata Mammalia Linnaeus, 1758 Perissodactyla Owen, 1848 Equidae Gray, 1821

Homacodon is similar. It wouldn't surprise me if they were all of this pattern, since these two were a random sample.

Jonathan

mtholder commented 11 years ago

It looks like commit 2aee0ad10cc2c969cba3cc330fcc484e5cf2d04c of taxomachine removed the scripts. Check out commit 010c215e6d88aa79e67291b6e65185fbf98c25dc if you need to see them...

jar398 commented 11 years ago

Ouch! I guess Stephen didn't realize I was counting on the continued presence of that directory...

an opportunity (?) to try out the method given here:

http://st-on-it.blogspot.com/2010/01/how-to-move-folders-between-git.html

That is, I could just copy the files over to opentree/smasher/, but it's better if their histories can be preserved too.

Jonathan


TonyRees commented 11 years ago

Hi all,

Tony Rees here - compiler/maintainer of IRMNG - contains mostly genera & families, some species too...

I just passed the list of no-match names through IRMNG and all the first words matched as genera with the exception of:

Desmostylia, Entelodontidae, Leptictidae, Mongolian Dissacus, Palaeoryctoidea, Plesiadapidae

Desmostylia is an Order, Palaeoryctoidea is a superfamily as suggested by its termination, the others are families (also detectable by their termination) apart from "Mongolian Dissacus" which is clearly an unorthodox name form.
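The point about rank being detectable from a name's termination can be sketched in a few lines of Python. This is only an illustration, not anything IRMNG or OTT is known to run: the function name `guess_rank` and the suffix table are assumptions made for the example. The endings themselves are the standard zoological ones (-oidea for superfamilies, -idae for families, -inae for subfamilies). The check is a heuristic, since orders such as Desmostylia carry no standard ending and some higher taxa also happen to end in -oidea.

```python
# Standard zoological endings for family-group names.
# The table and function name are illustrative assumptions.
SUFFIX_RANKS = [
    ("oidea", "superfamily"),
    ("idae", "family"),
    ("inae", "subfamily"),
]


def guess_rank(name):
    """Return a rank guessed from the name's ending, or None if no rule applies."""
    if len(name.split()) != 1:
        # Multi-word strings (binomials, informal labels) are not
        # bare family-group names, so no guess is made.
        return None
    lowered = name.lower()
    for suffix, rank in SUFFIX_RANKS:
        if lowered.endswith(suffix):
            return rank
    return None


if __name__ == "__main__":
    for candidate in ["Entelodontidae", "Palaeoryctoidea",
                      "Desmostylia", "Mongolian Dissacus"]:
        print(candidate, "->", guess_rank(candidate))
```

A None result only means the ending gave no hint; the name still has to be looked up in a taxonomy to settle its rank.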

Hyopsodus appears to be a homonym (2 entries in IRMNG with different authorities) but on further investigation these may actually turn out to be the same. Protoceras is almost a real homonym (IRMNG entries in fossil Mammalia and extant Hymenoptera) but the latter is listed as a misspelling so does not really count. There is a third Protoceras record in Cephalopoda (Molluscs) which occurs in only one source and looks dubious.

Anyway, so far as I am aware, you have a copy of IRMNG (or two over time) which has been utilized for OTOL, although, as you say, the same names have also been supplied to GBIF, so I would think you should be picking them up from there. Let me know if this helps at all.

Regards - Tony Rees

chrsowen commented 11 years ago

Hi Everyone, here is another study where fossils are not mapping.

Study ID 1804: Aistopoda, Amphibamus, Apateon, Captorhinidae, Doleserpeton, Eryops, Lysorophia, Microbrachis, Nectridea, Osteolepiformes, Procolophonidae, Seymouria, Synapsida, Tersomius, Triadobatrachus

TonyRees commented 11 years ago

Hi all,

I ran these names through IRMNG – you can do the same by pasting the list into the IRMNG online search page at http://www.cmar.csiro.au/datacentre/irmng/ . IRMNG had all of the genera bar one – Terpsomius was stored under a misspelling (Terpsomus), correct version added as of just now. You will also discover that Captorhinidae and Procolophonidae are family names; Aistopoda, Lysorophia, Nectridea and Osteolepiformes are orders; and only Synapsida is currently unresolved there (it is in fact an obsolete subclass name, IRMNG does not store subclasses at present).

All these names, with the exception of Terpsomius and Synapsida, would have been supplied with previous versions of IRMNG, I think; I don't know where those have ended up.

Cheers - Tony

Tony Rees
Manager, Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania 7001, Australia
Ph: 0362 325318 (Int: +61 362 325318) Fax: 0362 325000 (Int: +61 362 325000)
e-mail: Tony.Rees@csiro.au
Manager, OBIS Australia regional node, http://www.obis.org.au/
Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36


jar398 commented 11 years ago

I checked on the status of these in OTT

Aistopoda - not in GBIF
Amphibamus - not in GBIF
Apateon - suppressed by preprocessing script - don't know why
Captorhinidae - suppressed because IRMNG homonym list
Doleserpeton - why suppressed?
Eryops - suppressed because IRMNG homonym list
Lysorophia - not in GBIF
Microbrachis - why suppressed?
Nectridea - not in GBIF
Osteolepiformes - not in GBIF
Procolophonidae - IRMNG homonym list
Seymouria - suppressed because IPNI
Synapsida - not in GBIF
Tersomius - why suppressed? (PDB)
Triadobatrachus - why suppressed? (IRMNG)

Stephen, you must have had a good reason to exclude IPNI and the IRMNG homonym list. What bad things happen if these are added to the taxonomy?

Jonathan


jar398 commented 11 years ago

Tony, thanks for your attention.

Here's the deal with IRMNG. OTT 2.0 gets all of its IRMNG content from GBIF. GBIF unfortunately is using a version of IRMNG from 2011 that shows a large number of invalid taxa as being accepted. These were corrected in later versions of IRMNG (including the one on the GNACLR site, from 2012), but we don't have a script yet to use any such later version.

To compensate for this problem, Stephen added code to the GBIF preprocessing script to suppress all names attributed to the IRMNG homonym list, and all descendants of such names/taxa. For OTT 2.1 I have changed the script to suppress such names only when the taxon has no children, and that helps in many cases, e.g. Amphibamus and many others in the above lists. (The IPNI issue is a red herring; Seymouria the amphibian appears just fine in OTT 2.1.)
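The difference between the two suppression rules can be shown with a minimal Python sketch. It is not the actual preprocessing script: the dictionary layout, the `HOMONYM_SOURCE` label, the function name `suppressed_ids`, and the example taxa are all hypothetical, chosen only to make the two behaviours easy to compare.

```python
from collections import defaultdict

# Hypothetical label for records attributed to the IRMNG homonym list.
HOMONYM_SOURCE = "irmng-homonym-list"


def suppressed_ids(taxa, only_childless):
    """Return the set of taxon ids that would be suppressed.

    taxa maps id -> {"name": str, "parent": id or None, "source": str}.
    only_childless=False: suppress every flagged taxon and all its descendants.
    only_childless=True: suppress a flagged taxon only if it has no children.
    """
    children = defaultdict(list)
    for taxon_id, taxon in taxa.items():
        if taxon["parent"] is not None:
            children[taxon["parent"]].append(taxon_id)

    flagged = [tid for tid, t in taxa.items() if t["source"] == HOMONYM_SOURCE]

    if only_childless:
        return {tid for tid in flagged if not children.get(tid)}

    # Earlier rule: walk down from each flagged taxon and suppress the subtree.
    suppressed = set()
    stack = list(flagged)
    while stack:
        tid = stack.pop()
        if tid not in suppressed:
            suppressed.add(tid)
            stack.extend(children.get(tid, []))
    return suppressed


# Made-up records: one flagged genus with a child species, one without.
example = {
    1: {"name": "Genus_a", "parent": None, "source": HOMONYM_SOURCE},
    2: {"name": "Genus_a species_x", "parent": 1, "source": "other"},
    3: {"name": "Genus_b", "parent": None, "source": HOMONYM_SOURCE},
}

if __name__ == "__main__":
    print(sorted(suppressed_ids(example, only_childless=False)))
    print(sorted(suppressed_ids(example, only_childless=True)))
```

Under the earlier rule all three made-up records are dropped. Under the childless-only rule the flagged genus that has a child species survives, along with that species.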

This change for OTT 2.1 is only a halfway measure since it doesn't help us get newer IRMNG taxa such as Osteolepiformes.

If there is a Darwin Core archive for a post-2011 version of IRMNG, then GBIF should be notified so that they can pick it up and incorporate it into their taxonomy. They just released a new taxonomy last week and it still had the 2011 version of IRMNG.

I haven't decided whether to process IRMNG specially or just continue to rely on GBIF. If GBIF gets a newer version then any work we do to special-case IRMNG in our system will be redundant. Obviously my preference is for GBIF to pick up a newer version. If this IRMNG/GBIF coordination won't be happening any time soon, we will want to get a recent version of IRMNG and write a little script to prepare it for entry into our system. Laura may have a recent version, but I don't, other than the GNACLR one.

TonyRees commented 11 years ago

Thanks for the heads-up, Jonathan. I guess I should therefore supply a new IRMNG dump to GBIF in time for their next NUB rebuild, perhaps within the next 3 months.

Also, I just discovered a bug in my code that was potentially showing some names (maybe 8,000 out of 70,000) as homonyms when they should not be, because they are misspellings (which technically are not homonyms, so I exclude them when generating the homonyms list). This is now fixed, so if you would like to re-crawl my list you should get something more accurate to use for OTT purposes.


TonyRees commented 11 years ago

Hi Tony, Jonathan, an updated IRMNG homonyms file would be greatly appreciated. Tony, just let us know once it's available and we will update our backbone shortly after! Thanks, Markus

PS: meanwhile I will try to apply the same fix for genera from IRMNG without child species - thanks for the input!


jar398 commented 11 years ago

OTT 2.1 seems to fix most of these problems, although I didn't check every name. It excludes Apateon, which is both a synonym. I'm still working out the kinks but I think it's better than it was. Jonathan

jar398 commented 10 years ago

I checked the status of these in OTT 2.5 draft 1, which of course now has IRMNG genera. Most of them are there. In the cases where they're not, it's because they're not known in any of our feeder taxonomies. We will have to add new taxonomy sources, or add to the taxonomy as patches.

Of course the decision was made to hide extinct taxa for the time being. We can re-expose them at any time, at the cost of clutter in the synthetic tree due to the large number of incertae sedis fossils, but a good strategy for dealing with extinct taxa has not been worked out yet. I suggest keeping in touch with Stephen on this issue, since it's up to him what goes into synthesis.

OK if I close this issue?

(4944667=gbif:4835359 Agriochoerus)
(4943853=gbif:3239419 Arctocyon)
(4942612=gbif:3240256 Basilosaurus)
(4944725=gbif:4835695 Cainotherium)
(4944704=gbif:4835498 Cebochoerus)
(5327129=irmng:11066 Desmostylia)
* No taxon found with this name: Diacodexis pakist -- misspelling for Diacodexis pakistanensis ?
* No taxon found with this name: Diacodexis wasatch -- not in any input taxonomy
* No taxon found with this name: Dissacus navajovious -- not in any input taxonomy
(4942385=gbif:4974058. Dissacus praenuntius)
(4944775=gbif:4835391 Elomeryx)
(4944916=gbif:3240378 Entelodontidae)
(4942553=gbif:4832598 Georgiacetus)
(4944543=gbif:4576254 Gobiohyus)
* No taxon found with this name: Hapalodectes hetang -- misspelling for Hapalodectes hetangensis ?
* No taxon found with this name: Hapalodectes leptogna -- misspelling for Hapalodectes leptognathus ?
(4942373=gbif:4832492 Harpagolestes)
(4944618=gbif:4835648 Heteromeryx)
(4944869=gbif:4834785 Homacodon)
(4945069=gbif:5428926 Hyopsodus)
(4942678=gbif:3239467 Hyracotherium)
(4943486=gbif:3239371 Leptictidae)
(4944634=gbif:4835656 Leptoreodon)
(4945579=gbif:4829880 Meniscotherium)
(4944458=gbif:4835269 Merycoidodon)
(4942379=gbif:4832495 Mesonyx)
(4944612=gbif:4835644. Mixtotherium)
* No taxon found with this name: Mongolian Dissacus -- not in any input taxonomy
(4942363=gbif:4974056. Pachyaena gigantea)
(4942364=gbif:4974057. Pachyaena ossifraga)
* No taxon found with this name: Palaeoryctoidea -- not in any input taxonomy
(4945569=gbif:3239508 Phenacodus)
(4941495=gbif:3239538 Plesiadapidae)
(4942203=gbif:4835769 Poebrotherium)
* Ambiguous taxon name: Protoceras
(4942551=gbif:4832593 Protocetus)
(4942504=gbif:4832544 Remingtonocetus)
(4942371=gbif:4832493 Sinonyx)
(4942399=gbif:4832482 Synoplotherium)
(4942025=gbif:4833267 Vulpavus)

(5326025=irmng:12418 Aistopoda)
(4948603=gbif:4815794. Amphibamus)
* Ambiguous taxon name: Apateon
(4948118=gbif:3239057 Captorhinidae)
(4948597=gbif:4815799 Doleserpeton)
* Ambiguous taxon name: Eryops
(5326018=irmng:11819 Lysorophia)
(4948567=gbif:4816269. Microbrachis)
(5326003=irmng:10147 Nectridea)
(5320230=irmng:12292 Osteolepiformes)
(4946157=gbif:3238968 Procolophonidae)
(4948517=gbif:3241196 Seymouria)
* No taxon found with this name: Synapsida -- not in any input taxonomy
(4948606=gbif:4975520 Tersomius)
(4948636=gbif:3241217. Triadobatrachus)
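For anyone who wants to tally results like the ones above, here is a small Python sketch that splits such a report into mapped and unmapped names. It is not part of any OTT tooling: the function name `parse_report` and the returned layout are assumptions, and the regular expressions are inferred only from the entries shown in this comment (an `(ottid=source:id name)` form for hits, and `* No taxon found with this name:` or `* Ambiguous taxon name:` for misses).

```python
import re

# "(4944667=gbif:4835359 Agriochoerus)"; some ids carry a trailing dot.
MAPPED = re.compile(r"\((\d+)=([a-z]+):(\d+)\.?\s+([^)]+)\)")

# "* No taxon found with this name: X -- reason" or "* Ambiguous taxon name: X".
# A backslash is also accepted as the marker, since one entry appears that way.
PROBLEM = re.compile(
    r"[*\\]\s+(No taxon found with this name|Ambiguous taxon name):\s+"
    r"(.+?)(?=\s+--|\s+\(|\s+\*|\s*$)",
    re.MULTILINE,
)


def parse_report(text):
    """Return (mapped, problems) extracted from a mapping report string."""
    mapped = [
        {"ott_id": int(ott), "source": src, "source_id": int(sid), "name": name.strip()}
        for ott, src, sid, name in MAPPED.findall(text)
    ]
    problems = [(kind, name.strip()) for kind, name in PROBLEM.findall(text)]
    return mapped, problems


if __name__ == "__main__":
    sample = (
        "(5326025=irmng:12418 Aistopoda) (4948603=gbif:4815794. Amphibamus) "
        "* Ambiguous taxon name: Apateon (4948118=gbif:3239057 Captorhinidae) "
        "* No taxon found with this name: Synapsida -- not in any input taxonomy"
    )
    mapped, problems = parse_report(sample)
    print(len(mapped), "mapped;", len(problems), "unresolved")
    for kind, name in problems:
        print(" ", name, "->", kind)
```

The sketch works whether the entries sit on one line or one per line, and it keeps the source prefix (gbif or irmng) so the hits can be counted per feeder taxonomy.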

jar398 commented 10 years ago

I asked if I could close the issue and got no reply, so I will close it. Reopen if necessary.