OpenTreeOfLife / opentree

Opentree browsing and curation web site. For overarching or cross-repo concerns, please see the 'germinator' repo.
http://tree.opentreeoflife.org/
BSD 2-Clause "Simplified" License

Homonym descriptions far too vague in curator #1019

Closed josephwb closed 8 years ago

josephwb commented 8 years ago

In trying to map names in a bird tree I get a number of homonym matches:

[screenshot from 2016-08-12 11:48:04]

However, I set the search context to Birds. Telling me that a taxon belongs to an even wider context is not useful. The curator should be smart enough to realize that, for example, Hirundo is unambiguous in Aves; why isn't this an exact match?

josephwb commented 8 years ago

This is the study involved.

jar398 commented 8 years ago

My guess would be that this is a TNRS issue, assuming the homonyms (or hemihomonyms, as the case may be) are legit and it's not a taxonomy problem.

josephwb commented 8 years ago

I've accepted all of those "homonyms" in the example study above as I need to use the tree. But behaviour can obviously still be explored by unmapping taxa.

josephwb commented 8 years ago

I would like to stress this as an important improvement. Having the ability to click on the [syn|hom]onym and connect to the taxonomy browser is undoubtedly a vast improvement (and I hope stops people from blindly accepting such matches), but as a curator it is still a :open_hands: (i.e. huge) hassle to click on every single one to make sure your tree is completely mapped. I imagine the required change is quite deep, though.

jimallman commented 8 years ago

I've assumed this is a TNRS issue, but it's possible the web-app UI is mistakenly showing these as homonyms. I'll investigate further, but FYI here's the TNRS query for 'Hirundo' (using the current settings in this study) as a cURL call:

$ curl 'https://api.opentreeoflife.org/v3/tnrs/match_names' \
  -H 'Content-Type: application/json; charset=UTF-8' \
  --data-binary '{"names":["Hirundo"],"include_suppressed":false,"do_approximate_matching":false,"context_name":"Birds"}'

{
  "governing_code" : "ICZN",
  "unambiguous_names" : [ "Hirundo" ],
  "unmatched_names" : [ ],
  "matched_names" : [ "Hirundo" ],
  "context" : "Birds",
  "includes_deprecated_taxa" : false,
  "includes_suppressed_names" : false,
  "includes_approximate_matches" : false,
  "taxonomy" : {
    "weburl" : "https://tree.opentreeoflife.org/about/taxonomy-version/ott2.9",
    "author" : "open tree of life project",
    "name" : "ott",
    "source" : "ott2.9draft12",
    "version" : "2.9"
  },
  "results" : [ {
    "name" : "Hirundo",
    "matches" : [ {
      "is_synonym" : false,
      "score" : 1.0,
      "nomenclature_code" : "ICZN",
      "is_approximate_match" : false,
      "taxon" : {
        "is_suppressed" : false,
        "tax_sources" : [ "ncbi:43149", "worms:205063", "gbif:2489222", "irmng:1464278", "irmng:1028497" ],
        "unique_name" : "Hirundo (genus in Deuterostomia)",
        "synonyms" : [ "Ptionoprogne", "Hirundo", "Ptyoprocne", "Herophilus", "Cercropis" ],
        "name" : "Hirundo",
        "flags" : [ ],
        "ott_id" : 897677,
        "rank" : "genus"
      },
      "search_string" : "hirundo",
      "matched_name" : "Hirundo"
    } ]
  } ]
}

jimallman commented 8 years ago

Ah, it seems the problem is that its unique name doesn't match the matched name. I need a smarter check for homonyms, but I'm not sure whether/how this is indicated in the TNRS response. Suggestions welcome!
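
To make the mismatch concrete, here is a minimal Python sketch. It is not the curation app's actual code, and the function name `looks_like_homonym_naive` is hypothetical. It assumes the UI flags a suggestion as a homonym whenever the taxon's `unique_name` differs from `matched_name`, which is what the Hirundo response above would trigger.

```python
# Trimmed copy of the single match from the TNRS response above.
hirundo_match = {
    "matched_name": "Hirundo",
    "is_synonym": False,
    "score": 1.0,
    "taxon": {
        "name": "Hirundo",
        "unique_name": "Hirundo (genus in Deuterostomia)",
        "ott_id": 897677,
    },
}


def looks_like_homonym_naive(match):
    """Hypothetical check (an assumption, not the real UI code): treat any
    unique_name that differs from the matched name as a sign of a homonym."""
    return match["taxon"]["unique_name"] != match["matched_name"]


# The disambiguating suffix "(genus in Deuterostomia)" makes the strings differ,
# so this lone, full-score match would still be flagged.
print(looks_like_homonym_naive(hirundo_match))  # True
```

Under that assumption, a single unambiguous match with score 1.0 is still presented as a homonym, purely because of the parenthetical qualifier.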

jar398 commented 8 years ago

Please disregard this comment.

The current TNRS behavior is certainly wrong, and I acknowledge the inconvenience to the curator.

But as a matter of triage it seems unlikely that we'll get around to fixing the TNRS.

I think fixing this bug would benefit only a small number of future studies, since most ambiguities are genus-level while most studies are species-level. (I'm talking off the top of my head; this is something we could measure.) Fixing it will be both difficult and painful - it will involve learning how taxomachine & neo4j indexes work. Prospects are therefore gloomy (high cost / small benefit). (I suggest it will fall on me because since Cody left I've been the taxomachine maintainer, but if someone else wants to take a look, that would be great.)

josephwb commented 8 years ago

From what Jim states it seems like the TNRS is working fine, and that things could be addressed in the curator. For instance, if:

  1. Only 1 name is returned
  2. Name matches "unique_name - (whatever)" (i.e. using regex)

Then it would be an exact match. Or is this too simplistic?

I admit that things would get much more difficult if more than 1 name was returned.
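
The two conditions above could be sketched roughly as follows in Python. This is only an illustration of the proposal: the function name `is_exact_by_unique_name` is hypothetical, and the regex assumes that a disambiguated `unique_name` always has the form `Name (qualifier)`, which this thread shows for one taxon but does not confirm in general.

```python
import re

# Trimmed copy of the "matches" list for 'Hirundo' from the TNRS response
# earlier in this thread.
hirundo_matches = [
    {
        "matched_name": "Hirundo",
        "taxon": {
            "unique_name": "Hirundo (genus in Deuterostomia)",
            "ott_id": 897677,
        },
    }
]


def is_exact_by_unique_name(matches):
    """Hypothetical rule: exactly one match, whose unique_name is the matched
    name, optionally followed by a parenthetical qualifier."""
    if len(matches) != 1:
        return False
    match = matches[0]
    # Assumed format: "<name>" or "<name> (<anything>)".
    pattern = re.escape(match["matched_name"]) + r"( \(.+\))?"
    return re.fullmatch(pattern, match["taxon"]["unique_name"]) is not None


print(is_exact_by_unique_name(hirundo_matches))  # True
```

Note that a match made through a synonym (where `matched_name` differs from the taxon's own name) would fail this test, so it would still need manual review.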

jimallman commented 8 years ago

Thanks @josephwb! Even a piecemeal solution would cut down on false positives and the resulting inconvenience for the curator.

jimallman commented 8 years ago

I've gone ahead with a simple revision: If there's just one suggestion from TNRS, we won't treat / present it as a homonym. We'll skip the comparison of name versus unique_name and instead rely on the match score (%) to determine whether it's an exact match; if so, auto-approval will work as expected.
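
For readers following along, the revised rule might look something like this Python sketch. The real web app is not written in Python, and the names `is_exact_match` and `exact_matches_to_accept` are hypothetical. It also assumes that a 100% score corresponds to `"score": 1.0` in the TNRS response; whether the actual code consults other fields such as `is_synonym` is not stated here.

```python
# Trimmed copy of the "results" list from the TNRS response earlier in this
# thread.
results = [
    {
        "name": "Hirundo",
        "matches": [
            {
                "matched_name": "Hirundo",
                "score": 1.0,
                "taxon": {"name": "Hirundo", "ott_id": 897677},
            }
        ],
    }
]


def is_exact_match(matches):
    """Hypothetical rule: a lone suggestion is never shown as a homonym, and
    it counts as exact when its score is 1.0 (i.e. 100%)."""
    return len(matches) == 1 and matches[0]["score"] >= 1.0


def exact_matches_to_accept(results):
    """Collect {submitted name: ott_id} for every name that auto-approval
    would accept under the rule above."""
    accepted = {}
    for result in results:
        if is_exact_match(result["matches"]):
            accepted[result["name"]] = result["matches"][0]["taxon"]["ott_id"]
    return accepted


print(exact_matches_to_accept(results))  # {'Hirundo': 897677}
```

Routing both the "exact" display state and auto-approval through one predicate is one way to keep the two from drifting apart.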

This is available for review on devtree. As a test, try mapping 'Sylvia' in the 'All life' context. The UI should offer three suggestions, one synonym and two homonyms, none of which will be auto-approved. Search again in the 'Birds' context and it will show a single match with 100% score and easy auto-approval.

josephwb commented 8 years ago

Yeah, I guess the regex is unnecessary. I tried this out and it seems to work well. This study, being genus-level (birds), is particularly illustrative of how much better the new method works. Under "All life", there are many homonyms, but after changing to "Birds" almost all of them change to exact matches.

However, even though those matches change to green (exact), they are not added when checking "Accept exact matches". Maybe this is because of the skipped name check, where "exact" is turned on? Should be simple. CC @jimallman

jimallman commented 8 years ago

> However, even though those matches change to green (exact), they are not added when checking "Accept exact matches".

@josephwb, I thought I was using the same logic for display and auto-acceptance. Are some of these mappings accepted, but not all? Any chance some of them are less than 100% score (shows in mouse-over hint)?

josephwb commented 8 years ago

I think none of them (the "new" matches, that is) are being accepted.

jimallman commented 8 years ago

Odd, this works for me, testing 'Sylvia' as an edited label in this study (not that it should matter). What study and OTUs are you testing with? What search context?

josephwb commented 8 years ago

This study. Set context to Aves, map, and then try to accept all exact matches. I just did, and it did not accept any: there are 142 mapped beforehand, and still 142 mapped when trying to accept all. In other words, they do not change from green (exact match) to blue (mapped).

jimallman commented 8 years ago

Indeed, there was a bug in my code. @josephwb, please try again; this is now working for me (using your example) on devtree.

josephwb commented 8 years ago

Yup, that example works as expected now. I'm not sure how else to test it; this study exemplifies the problem. :dart: