OpenTreeOfLife / opentree

Opentree browsing and curation web site. For overarching or cross-repo concerns, please see the 'germinator' repo.
http://tree.opentreeoflife.org/
BSD 2-Clause "Simplified" License

match_names matches "Genus species" and "Not Genus species" (flagged anomalies) #1009

Closed by arlin 8 years ago

arlin commented 8 years ago

See the chat room discussion from 4 Aug 2016 (13:38). Some anomalous names are labeled with "Not", as in "Not Solanum lycopersicum". @jar398 explains:

I had a heuristic in the matching code that prepended the “Not”, I think. I’ve replaced that with a manually curated list and the names are now set to “cluster XX12345” or whatever.

The current behavior of tnrs/match_names is that a search for "Solanum lycopersicum" hits both "Solanum lycopersicum" and "Not Solanum lycopersicum". See:

curl -X POST https://api.opentreeoflife.org/v2/tnrs/match_names -H "content-type:application/json" -d '{"names":["Solanum lycopersicum"]}'

This returns "Not Solanum lycopersicum" as a second match and reports is_synonym = true and is_approximate_match = false for it, which isn't what you want. See:

  "results" : [ {
    "id" : "Solanum lycopersicum",
    "matches" : [ {
      "matched_node_id" : 3437601,
      "synonyms" : [ "Solanum esculentum", "Lycopersicon esculentum var. esculentum", "Lycopersicon cerasiforme", "Lycopersicon pyriforme", "Solanum lycopersicum", "Solanum lycopersicon", "Lycopersicon esculentum", "Lycopersicon lycopersicum", "Lycopersicum esculentum" ],
      "flags" : [ ],
      "ot:ottTaxonName" : "Solanum lycopersicum",
      "search_string" : "solanum lycopersicum",
      "matched_name" : "Solanum lycopersicum",
      "is_synonym" : false,
      "score" : 1.0,
      "tax_sources" : [ "ncbi:4081", "gbif:2930137", "irmng:11222025" ],
      "unique_name" : "Solanum lycopersicum (species in domain Eukaryota)",
      "ot:ottId" : 378964,
      "is_deprecated" : false,
      "nomenclature_code" : "ICN",
      "is_approximate_match" : false,
      "rank" : "species",
      "is_dubious" : false
    }, {
      "matched_node_id" : 4205079,
      "synonyms" : [ "Not Solanum lycopersicum", "Solanum lycopersicum" ],
      "flags" : [ ],
      "ot:ottTaxonName" : "Not Solanum lycopersicum",
      "search_string" : "solanum lycopersicum",
      "matched_name" : "Solanum lycopersicum",
      "is_synonym" : true,
      "score" : 1.0,
      "tax_sources" : [ "ncbi:4081", "silva:BABP01087923" ],
      "unique_name" : "Not Solanum lycopersicum",
      "ot:ottId" : 5254103,
      "is_deprecated" : false,
      "nomenclature_code" : "ICNP",
      "is_approximate_match" : false,
      "rank" : "no rank - terminal",
      "is_dubious" : false
    } ]
  } ]
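
A client that wanted to screen out such entries could filter on the taxon name. The following is only a minimal sketch in Python, assuming the response has already been parsed and that the anomalies can be recognized by a "Not " prefix on "ot:ottTaxonName". The helper name split_matches and the trimmed sample data are illustrative assumptions, not part of the Open Tree API.

```python
import json

# Trimmed copy of the response shown above; only the fields used below are kept.
SAMPLE_RESPONSE = json.loads("""
{"results": [{"id": "Solanum lycopersicum", "matches": [
  {"ot:ottTaxonName": "Solanum lycopersicum",
   "matched_name": "Solanum lycopersicum",
   "is_synonym": false, "is_approximate_match": false, "ot:ottId": 378964},
  {"ot:ottTaxonName": "Not Solanum lycopersicum",
   "matched_name": "Solanum lycopersicum",
   "is_synonym": true, "is_approximate_match": false, "ot:ottId": 5254103}
]}]}
""")


def split_matches(response, prefix="Not "):
    """Hypothetical helper: separate ordinary matches from 'Not ...' taxa.

    Assumption: an anomalous taxon is one whose ot:ottTaxonName starts
    with `prefix`. Returns a (clean, anomalous) pair of lists.
    """
    clean, anomalous = [], []
    for result in response.get("results", []):
        for match in result.get("matches", []):
            name = match.get("ot:ottTaxonName", "")
            if name.startswith(prefix):
                anomalous.append(match)
            else:
                clean.append(match)
    return clean, anomalous


clean, anomalous = split_matches(SAMPLE_RESPONSE)
print("kept:", [m["ot:ottId"] for m in clean])
print("flagged:", [m["ot:ottId"] for m in anomalous])
```

Filtering on the name prefix rather than on is_synonym keeps legitimate synonym matches, since the problem here is the "Not ..." taxon itself and not synonym matching in general.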
jimallman commented 8 years ago

@jar398 says there are no more "Not XXX" names in taxonomy, nor is he expecting more. OK to close this?

arlin commented 8 years ago

Thanks. By chance, I ran the "Not solanum" example Saturday at the Phylotastic team meeting and the bug had disappeared. And the curl call above is now fixed.