globalbioticinteractions / globalbioticinteractions

Global Biotic Interactions provides access to existing species interaction datasets
https://globalbioticinteractions.org
GNU General Public License v3.0

suspicious interaction: parasitic whale Balaenopteridae in "millerse-US-National-Parasite-Collection" #307

Closed majuss closed 6 years ago

majuss commented 7 years ago

A lot of entries in GloBI from this source are false. See the example image below. The whales are not even mentioned in the original source, so I think some fuzzy text search got it wrong. The source also includes lots of other dodgy records. Another example suggests that all plants are parasites from another genus of plants.

pasted image at 2017_07_01 03_59 pm

jhammock commented 7 years ago

That's odd. I can find this record in the GloBI browser, http://www.globalbioticinteractions.org/?interactionType=parasiteOf&sourceTaxon=BALAENOPTERidae

but not in the published file at https://zenodo.org/record/800550. Could the errors be from an earlier draft of the resource, not successfully removed from the database?

jhpoelen commented 7 years ago

@majuss Thanks for providing the specific example. It does seem a bit odd to characterize whales as parasites... This specific example makes it easy to figure out what is going on. If you have more specific examples that you haven't shared before, please do share.

Two observations -

  1. GloBI is looking at an older version of the dataset - @millerse made some improvements and has not yet released them (see https://github.com/millerse/US-National-Parasite-Collection/issues/3#issuecomment-322814413). In the current May 30 release, GloBI is instructed to grab the data from http://invertebrates.si.edu/pdfs/NationalParasiteCollection_29-May-2014.txt .
  2. At first glance, the root cause of this funny record is that GloBI mapped "BENEDENIA" to "Balaenopteridae" with http://eol.org/7660 . Looking into the reason why this mapping occurred. More on this later...

jhpoelen commented 7 years ago

The source record for this particular interaction seems to be related to specimen with guid E387E263-1EE3-426B-8692-A0FB17D7BA4E on line 94848 of http://invertebrates.si.edu/pdfs/NationalParasiteCollection_29-May-2014.txt .

,,,"{E387E263-1EE3-426B-8692-A0FB17D7BA4E}",,"098139.00",,"BENEDENIA","LALANDI ?","SERIOLA ?","MONOGENEA",,,"VanCleave-3008",,,,,"NORTH AMERICA","USA,IL,Chicago,Shedd Aquarium",,,,,"N",,,,"W","VOUCHERS","SH221:19-88/96","PARIZEK, M","SEP 1936","VAN CLEAVE, H J",,0,0,,5/23/2006 0:00:00,,,"12 slides, vouchers.  VC-3008.5, .6, .8, .12, .13, .15-.19, .23, .24, yellowtail.  Found in Van Cleave Collection.",
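
For anyone who wants to trace a suspicious interaction back to its source line, here is a minimal sketch of looking up a record by specimen GUID. It assumes the file is plain comma-separated text with double-quoted fields, as the record above suggests. The function name `find_by_guid` and the two-line sample are hypothetical, and only the first eleven fields of the record are reproduced. The meaning of each column is not documented in this thread, so the sketch refers to fields by position only.

```python
import csv

# First eleven fields of the record quoted above, preceded by a made-up
# filler line so that the lookup has something to skip.
SAMPLE_LINES = [
    ',,,"{00000000-0000-0000-0000-000000000000}",,"000000.00",,"EXAMPLE"',
    ',,,"{E387E263-1EE3-426B-8692-A0FB17D7BA4E}",,"098139.00",,'
    '"BENEDENIA","LALANDI ?","SERIOLA ?","MONOGENEA"',
]


def find_by_guid(lines, guid):
    """Return (line_number, fields) for the first record containing guid.

    `lines` is any iterable of text lines, e.g. an open file object.
    Returns None when no record matches.
    """
    reader = csv.reader(lines)
    for row in reader:
        if any(guid in field for field in row):
            # line_num counts the physical lines consumed so far.
            return reader.line_num, row
    return None


if __name__ == "__main__":
    hit = find_by_guid(SAMPLE_LINES, "E387E263-1EE3-426B-8692-A0FB17D7BA4E")
    if hit is not None:
        line_number, fields = hit
        # Positions 7 to 10 hold the name-like values in this record.
        print(line_number, fields[7:11])
```

Passing an open file handle instead of `SAMPLE_LINES` should work the same way, although the reported line number depends on how the file was downloaded and split into lines.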

jhpoelen commented 7 years ago

On closer inspection, the query http://eol.org:80/api/search/1.0.xml?q=BENEDENIA&exact=true returns two matches (see XML below): the expected parasite name (http://eol.org/70755) and Balaenopteridae (http://eol.org/7660), probably because Gray used Benedenia (Gray 1864) for a taxon now known as Balaenopteridae. For historical reasons, GloBI maps to the lowest EOL taxon/page ID when multiple exact matches are present, so 7660 wins over 70755.

@jhammock - from intuition, I would not expect exact matches to invalid/outdated taxonomic names. Can you help me understand whether the observed search results are expected?

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
  <title>Encyclopedia of Life search: </title>
  <link href="http://eol.org/api/search/1.0/BENEDENIA"/>
  <updated/>
  <author>
    <name>Encyclopedia of Life</name>
  </author>
  <id>http://eol.org/api/search/1.0/BENEDENIA</id>
  <os:totalResults>2</os:totalResults>
  <os:startIndex>1</os:startIndex>
  <os:itemsPerPage>30</os:itemsPerPage>
  <os:Query role="request" searchTerms="" startPage=""/>
  <link rel="alternate" href="http://eol.org/api/search/1.0/BENEDENIA/" type="application/atom+xml"/>
  <link rel="first" href="http://eol.org/api/search/BENEDENIA.xml?page=1" type="application/atom+xml"/>
  <link rel="self" href="http://eol.org/api/search/BENEDENIA.xml?page=1" type="application/atom+xml"/>
  <link rel="last" href="http://eol.org/api/search/BENEDENIA.xml?page=1" type="application/atom+xml"/>
  <link rel="search" href="http://eol.orghttp://media.eol.org//opensearchdescription.xml" type="application/opensearchdescription+xml"/>
  <entry>
    <title>Benedenia</title>
    <link href="http://eol.org/70755?action=overview&amp;controller=taxa"/>
    <id>70755</id>
    <updated/>
    <content>Benedenia; Benedenia Diesing 1858</content>
  </entry>
  <entry>
    <title>Balaenopteridae</title>
    <link href="http://eol.org/7660?action=overview&amp;controller=taxa"/>
    <id>7660</id>
    <updated/>
    <content>Kyphobalaena Eschricht; Kyphobalaena; Poescopia (Gray 1864); Poescopia; Perqualus Gray, 1846; Perqualus; Cyphobalaena Marschall, 1873; Cyphobalaena; Sibbaldus Flower 1865; Sibbaldus; Benedenia Gray 1864; Benedenia; Mysticetus Wagler 1830; Mysticetus; Physalis Fleming 1822; Physalis; Cuvierius Gray 1866; Cuvierius; Rudolphius (Gray 1866); Rudolphius; Swinhoia Gray 1868; Swinhoia; Stenobalaena Gray 1874; Stenobalaena</content>
  </entry>
</feed>
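
To make the failure mode concrete, here is a minimal sketch of the "lowest page ID wins" rule described above, applied to a trimmed copy of this response. It is not GloBI's actual code: the function names, the trimmed feed, and the title-matching alternative are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Trimmed copy of the search response above; only the elements used
# below are kept, and the synonym list is shortened.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Benedenia</title>
    <id>70755</id>
    <content>Benedenia; Benedenia Diesing 1858</content>
  </entry>
  <entry>
    <title>Balaenopteridae</title>
    <id>7660</id>
    <content>Benedenia Gray 1864; Benedenia; Mysticetus Wagler 1830</content>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return a list of (page_id, title) tuples from an Atom search feed."""
    root = ET.fromstring(feed_xml)
    return [
        (int(entry.findtext(ATOM + "id")), entry.findtext(ATOM + "title"))
        for entry in root.iter(ATOM + "entry")
    ]


def pick_lowest_id(entries):
    """The rule described above: take the match with the lowest page ID."""
    return min(entries)


def pick_title_match(entries, query):
    """Hypothetical alternative: prefer entries whose title equals the
    query (case-insensitive); fall back to the lowest ID otherwise."""
    exact = [e for e in entries if e[1].lower() == query.lower()]
    return min(exact or entries)


if __name__ == "__main__":
    entries = parse_entries(FEED)
    print(pick_lowest_id(entries))
    print(pick_title_match(entries, "BENEDENIA"))
```

With the two entries above, the lowest-ID rule selects page 7660 (Balaenopteridae), while the title-matching variant selects page 70755 (Benedenia). The variant only helps when the intended taxon's title equals the query string, so it should be read as one possible workaround rather than a general fix.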

jhammock commented 7 years ago

I'm afraid this is a known bug. This is a record provided by a non-first-tier classification partner. We have found that when these providers were "removed" from our taxonomic navigation system, their content was concealed in the names tab, but their data is still available to the API and the site search (http://eol.org/search?q=benedenia+Gray&search=Go)

My apologies. The classification providers will be triaged afresh for EOL v3 this Fall. This kind of error should be much rarer at that point.

jhpoelen commented 7 years ago

@jhammock Thanks for info - no need to apologize.

Is there an EOL bug ID related to this? If so, I would find it helpful to watch that specific bug.

I am hoping to come up with a workaround to resolve this issue. Please let me know if you can think of one. Will contact you off thread . . .

majuss commented 7 years ago

@jhpoelen I have another interesting record in GloBI for you: bildschirmfoto 2017-08-17 um 14 13 58

This also needs further investigation.

jhpoelen commented 7 years ago

@majuss thanks for sharing another suspicious interaction. I am opening a separate thread/issue with a specific description to distinguish it from this suspicious parasitic whale issue. Two kind suggestions: please create a separate issue for each suspicious interaction, and please submit JSON as text, so the info is easy to copy/paste. If you have other suggestions on how to flag and comment on suspicious interactions, please let me know: I am sure there are smarter ways of sharing feedback.

jhammock commented 7 years ago

You won't find API bugs well tracked anywhere right now. The APIs are being rebuilt. For covering GloBI taxa, I wonder if relying on a single names provider might be safer?

jhpoelen commented 6 years ago

The suspicious whale interaction no longer occurs in GloBI. The previously offending record is now available through https://www.globalbioticinteractions.org/?interactionType=hasParasite&sourceTaxon=Seriola%20lalandi&targetTaxon=Benedenia , where Benedenia links to a genus in the flatworm phylum (Platyhelminthes): https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=54640

screenshot from 2017-11-14 09-30-44

@majuss @derele Please chime in if this issue has not yet been addressed. Also, if you see additional data errors, please record them as new issues.