josephwb closed this issue 8 years ago.
My guess would be that this is a TNRS issue, assuming the homonyms (or hemihomonyms, as the case may be) are legit and it's not a taxonomy problem.
I've accepted all of those "homonyms" in the example study above as I need to use the tree. But behaviour can obviously still be explored by unmapping taxa.
I would like to stress that this is an important improvement. Being able to click on the synonym or homonym and go to the taxonomy browser is undoubtedly a vast improvement (and will, I hope, stop people from blindly accepting such matches), but as a curator it is still a huge hassle to click on every single one to make sure your tree is completely mapped. I imagine the required change is quite deep, though.
I've assumed this is a TNRS issue, but it's possible the web-app UI is mistakenly showing these as homonyms. I'll investigate further, but FYI here's the TNRS query for 'Hirundo' (using the current settings in this study) as a cURL call:
$ curl 'https://api.opentreeoflife.org/v3/tnrs/match_names' \
-H 'Content-Type: application/json; charset=UTF-8' \
--data-binary '{"names":["Hirundo"],"include_suppressed":false,"do_approximate_matching":false,"context_name":"Birds"}'
{
  "governing_code" : "ICZN",
  "unambiguous_names" : [ "Hirundo" ],
  "unmatched_names" : [ ],
  "matched_names" : [ "Hirundo" ],
  "context" : "Birds",
  "includes_deprecated_taxa" : false,
  "includes_suppressed_names" : false,
  "includes_approximate_matches" : false,
  "taxonomy" : {
    "weburl" : "https://tree.opentreeoflife.org/about/taxonomy-version/ott2.9",
    "author" : "open tree of life project",
    "name" : "ott",
    "source" : "ott2.9draft12",
    "version" : "2.9"
  },
  "results" : [ {
    "name" : "Hirundo",
    "matches" : [ {
      "is_synonym" : false,
      "score" : 1.0,
      "nomenclature_code" : "ICZN",
      "is_approximate_match" : false,
      "taxon" : {
        "is_suppressed" : false,
        "tax_sources" : [ "ncbi:43149", "worms:205063", "gbif:2489222", "irmng:1464278", "irmng:1028497" ],
        "unique_name" : "Hirundo (genus in Deuterostomia)",
        "synonyms" : [ "Ptionoprogne", "Hirundo", "Ptyoprocne", "Herophilus", "Cercropis" ],
        "name" : "Hirundo",
        "flags" : [ ],
        "ott_id" : 897677,
        "rank" : "genus"
      },
      "search_string" : "hirundo",
      "matched_name" : "Hirundo"
    } ]
  } ]
}
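For anyone who wants to reproduce this outside the shell, here is a rough Python equivalent of the cURL call, using only the standard library. The endpoint, header and payload are copied from the call above; the function names are made up for illustration, and whether the v3 endpoint is still served is an assumption. The request is built separately from sending it so it can be inspected without network access.

```python
import json
import urllib.request

# Endpoint and payload copied from the cURL call above.
MATCH_NAMES_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"


def build_match_names_request(names, context_name="Birds"):
    """Build (but do not send) a POST request equivalent to the cURL call."""
    payload = {
        "names": list(names),
        "include_suppressed": False,
        "do_approximate_matching": False,
        "context_name": context_name,
    }
    return urllib.request.Request(
        MATCH_NAMES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


def match_names(names, context_name="Birds"):
    """Send the request and return the decoded JSON response (needs network access)."""
    request = build_match_names_request(names, context_name)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `match_names(["Hirundo"])` should return a structure like the response shown above, assuming the service still accepts this payload.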
Ah, it seems the problem is that the match's unique name ("Hirundo (genus in Deuterostomia)") doesn't match the matched name ("Hirundo"). I need a smarter check for homonyms, but I'm not sure whether or how this is indicated in the TNRS response. Suggestions welcome!
Please disregard this comment.
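To make the false positive concrete, the naive comparison of unique name versus matched name behaves roughly like the sketch below. This is a hypothetical reconstruction for illustration (the function name is made up and the web-app's actual code isn't shown in this thread). It flags the Hirundo result as a homonym even though TNRS returned exactly one candidate, simply because the unique name carries a disambiguating qualifier.

```python
def naive_is_homonym(match: dict) -> bool:
    """Flag a TNRS match as a homonym whenever its unique name differs from the matched name."""
    return match["matched_name"] != match["taxon"]["unique_name"]


# The single match returned for 'Hirundo' in the Birds context (fields abridged).
hirundo_match = {
    "matched_name": "Hirundo",
    "score": 1.0,
    "taxon": {"name": "Hirundo", "unique_name": "Hirundo (genus in Deuterostomia)"},
}

print(naive_is_homonym(hirundo_match))  # True: a false positive
```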
The current TNRS behavior is certainly wrong, and I acknowledge the inconvenience to the curator.
But as a matter of triage it seems unlikely that we'll get around to fixing the TNRS.
I think this bug fix would benefit only a small number of future studies, since most ambiguities are at the genus level while most studies are at the species level. (I'm talking off the top of my head; this is something we could measure.) Fixing it would be both difficult and painful: it would involve learning how the taxomachine and neo4j indexes work. The prospects are therefore gloomy (high cost, small benefit). (I expect the work would fall to me, since I've been the taxomachine maintainer since Cody left, but if someone else wants to take a look, that would be great.)
From what Jim states, it seems like the TNRS is working fine and that things could be addressed in the curator. For instance, if:
Then it would be an exact match. Or is this too simplistic?
I admit that things would get much more difficult if more than one name were returned.
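One hypothetical way to make a curator-side comparison tolerant of disambiguated unique names is to strip the trailing parenthetical before comparing. This is an illustrative sketch only, not necessarily the condition proposed above; it assumes unique names follow the "Name (qualifier)" pattern seen in the Hirundo response, and, per the caveat about multiple names, it only ever treats a single returned candidate as exact.

```python
import re

# Matches "Name (qualifier)", e.g. "Hirundo (genus in Deuterostomia)".
# Assumes the qualifier is a single trailing parenthetical without nested brackets.
_QUALIFIED = re.compile(r"^(?P<base>.+?) \([^()]*\)$")


def strip_qualifier(unique_name: str) -> str:
    """Drop a trailing disambiguating parenthetical, if there is one."""
    match = _QUALIFIED.match(unique_name)
    return match.group("base") if match else unique_name


def is_exact_single_match(matched_name: str, unique_names: list[str]) -> bool:
    """Exact only if TNRS returned one candidate whose base name equals the matched name."""
    if len(unique_names) != 1:
        # More than one candidate is a genuine ambiguity; leave it to the curator.
        return False
    return strip_qualifier(unique_names[0]) == matched_name
```

The multi-candidate case is deliberately left unresolved here, since choosing among real homonyms still needs a curator.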
Thanks @josephwb! Even a piecemeal solution would cut down on false positives and the resulting inconvenience for the curator.
I've gone ahead with a simple revision: if there's just one suggestion from TNRS, we won't treat or present it as a homonym. We'll skip the comparison of name versus unique_name and instead rely on the match score (%) to determine whether it's an exact match; if so, auto-approval will work as expected.
This is available for review on devtree. As a test, try mapping 'Sylvia' in the 'All life' context. The UI should offer three suggestions, one synonym and two homonyms, none of which will be auto-approved. Search again in the 'Birds' context and it will show a single match with 100% score and easy auto-approval.
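The revised rule can be sketched as follows. This is a simplified model, not the curator's code: the `Suggestion` record, the outcome strings and the function names are hypothetical. A single suggestion is never treated as a homonym, so only its score decides exactness; several suggestions are left for the curator to choose from.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """One TNRS candidate for an OTU label (hypothetical, trimmed-down record)."""
    matched_name: str
    unique_name: str
    score: float  # TNRS match score; 1.0 is shown as 100%


def classify(suggestions: list[Suggestion]) -> str:
    """Return 'unmatched', 'multiple', 'exact' or 'inexact' for one OTU's suggestions."""
    if not suggestions:
        return "unmatched"
    if len(suggestions) > 1:
        # Several candidates (synonyms and/or homonyms): the curator must choose.
        return "multiple"
    # A single candidate is not presented as a homonym, so name vs unique_name
    # is not compared; the score alone decides whether it is an exact match.
    return "exact" if suggestions[0].score >= 1.0 else "inexact"


def auto_approvable(suggestions: list[Suggestion]) -> bool:
    """Only exact single matches are eligible for auto-approval."""
    return classify(suggestions) == "exact"
```

Under this rule, the 'Sylvia' test behaves as described: three suggestions in 'All life' classify as "multiple" and are not auto-approved, while a single 100% match in 'Birds' classifies as "exact".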
Yeah, I guess the regex is unnecessary. I tried this out and it seems to work well. This study, being genus-level (birds), is particularly illustrative of how much better the new method works. Under "All life" there are many homonyms, but after changing the context to "Birds" almost all of them become exact matches.
However, even though those matches change to green (exact), they are not added when checking "Accept exact matches". Maybe this is because of the skipped name check, where "exact" is turned on? Should be simple. CC @jimallman
@josephwb, I thought I was using the same logic for display and auto-acceptance. Are some of these mappings accepted, but not all? Any chance some of them are less than 100% score (shows in mouse-over hint)?
I think none of them (the "new" matches, that is) are being accepted.
Odd; this works for me when testing 'Sylvia' as an edited label in this study (not that it should matter). What study and OTUs are you testing with? What search context?
This study. Set context to Aves, map, and then try to accept all exact matches. I just did, and it did not accept any: there are 142 mapped beforehand, and still 142 mapped when trying to accept all. In other words, they do not change from green (exact match) to blue (mapped).
Indeed, there was a bug in my code. @josephwb, please try again; this is working for me (using your example) on devtree.
Yup, that example works as expected now. I'm not sure how else to test it; this study exemplifies the problem. :dart:
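A mismatch like the one found above, where rows showed green (exact) but "Accept exact matches" mapped none of them, can be ruled out structurally by having the row colouring and the accept-all action call one shared predicate. The sketch below is hypothetical: the `Otu` record, function names and colour strings are illustrative, loosely following the green = exact / blue = mapped convention mentioned in this thread, and it reuses the single-suggestion, 100%-score rule described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Otu:
    """Hypothetical OTU record: its label, TNRS scores and mapping state."""
    label: str
    scores: list[float] = field(default_factory=list)  # one score per TNRS suggestion
    mapped: bool = False


def is_exact(otu: Otu) -> bool:
    """The single predicate that both the display and auto-acceptance rely on."""
    return len(otu.scores) == 1 and otu.scores[0] >= 1.0


def display_colour(otu: Otu) -> str:
    """Row colour: blue once mapped, green for an exact match, otherwise needs review."""
    if otu.mapped:
        return "blue"
    return "green" if is_exact(otu) else "review"


def accept_exact_matches(otus: list[Otu]) -> int:
    """Map every unmapped OTU that is_exact() accepts; return how many were accepted."""
    accepted = 0
    for otu in otus:
        if not otu.mapped and is_exact(otu):
            otu.mapped = True
            accepted += 1
    return accepted
```

Because a green row is, by construction, exactly an OTU that `accept_exact_matches` will map, the mapped count cannot stay unchanged while green rows remain.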
When trying to map names in a bird tree, I get a number of homonym matches:
However, I set the search context to Birds, so telling me that a taxon belongs to an even wider context is not useful. The curator should be smart enough to realize that, for example, Hirundo is unambiguous in Aves; why isn't this an exact match?