Closed LunaSare closed 6 months ago
Thanks for reporting this. If Cannis is the second name, it works.
We have a shortcut to avoid expensive fuzzy searches if there is an exact match. It looks like we have the logic wrong when there are multiple names, because we should be passing all the non-exact names through to the fuzzy match when some match exactly (instead of just returning the exact match). I'm busy this week, perhaps @bredelings will have time to take a look.
[edited to correct grammar]
I wonder if this was original behavior that was copied from the java implementation. Regardless, it does seem like it should be fixed / changed.
Hmm.... what is happening here is that we are inferring the context "Mammals" if we have "canis" in the search list, but otherwise we infer "LIFE". We then fail to find "amphibians" inside "Mammals".
@LunaSare @mtholder I think this should work if you specify that the context is Life.
Do you think we should change the default behavior, so that we always assume that the context is Life, when the context is unspecified?
We could, for example, only guess the context from the exact matches when there are at least three of them. But I think that magically changing the behavior when the number of exact matches changes could be very confusing for the users.
Using "Life" across all terms seems the more expected behavior.
I agree! using “Life” as default when no context is specified is my expectation. Having different rules to guess a context might get too complicated in my opinion and would require some thorough additional documentation for users.
I just checked the docs ( https://github.com/OpenTreeOfLife/germinator/wiki/TNRS-API-v3#match_names ) and the service is performing the documented behavior. Though the fact that Ben and I both thought this was a bug is a good indication that we agree with you that this is a very confusing default behavior.
A rotl workaround would be for rotl to add the context "Life" if the use does not specify a context. That would turn off the confusing "inferred contexts" part of this service. I don't mind changing the default behavior in v4, but I'm loathe to change the behavior on v3 given that it is documented and some folks may be relying on that behavior. In this case, I'm having a hard time imagining anyone relying on this weirdness. So, I'm persuadable.
just to be clear: I don't mind changing the default behavior in v4 to using "Life" as the default context.
Ah! I see what you mean @mtholder! I think then that just specifying that the context is "All life" for multiple taxon searches, when no other context is known should give the desired behavior for v3, for now:
my_taxa <- c("amphibians", "canis", "felis", "delphinidae", "avess")
rotl::tnrs_match_names(names = my_taxa, context_name = "All life")
> resolved_names
search_string unique_name approximate_match ott_id is_synonym
1 amphibians Amphibia TRUE 544595 FALSE
2 canis Canis FALSE 372706 FALSE
3 felis Felis FALSE 563165 FALSE
4 delphinidae Delphinidae FALSE 698406 FALSE
5 avess Aves TRUE 81461 FALSE
flags number_matches
1 6
2 2
3 1
4 1
5 1
Good point @mtholder! In the GitHub version of rotl
, I changed the default value of the context for the tnrs_match_names
function to use "All life". @LunaSare if you have a chance to test it, let me know how this works for you.
Sure @fmichonneau! I tried it after installing rotl
from github and it works good!
> devtools::install_github("ropensci/rotl")
> my_taxa <- c("amphibians", "canis", "felis", "delphinidae", "avess")
> resolved_names <- rotl::tnrs_match_names(names = my_taxa
+ )
> resolved_names
search_string unique_name approximate_match ott_id is_synonym flags
1 amphibians Amphibia TRUE 544595 FALSE
2 canis Canis FALSE 372706 FALSE
3 felis Felis FALSE 563165 FALSE
4 delphinidae Delphinidae FALSE 698406 FALSE
5 avess Aves TRUE 81461 FALSE
number_matches
1 6
2 2
3 1
4 1
5 1
>
FYI, this fix is now available in rotl 3.0.12 which was just published on CRAN.
While working with the R wrapper for OpenTree's APIs package
rotl
, I was getting different results when running thetnrs_match_names
with a single taxon name and a multiple taxon names. When running it with a single taxa, it works as expectedBut when running it with a character vector of multiple taxa, it seems to ignore approximate matching:
I reported the issue in the R package GitHub repo https://github.com/ropensci/rotl/issues/134, and they looked into it and realized it comes from the OpenTree API. The following was originally posted by @fmichonneau in https://github.com/ropensci/rotl/issues/134#issuecomment-918948953:
Output
```json { "context": "Mammals", "governing_code": "ICZN", "includes_approximate_matches": true, "includes_deprecated_taxa": false, "includes_suppressed_names": false, "matched_names": [ "Canis" ], "results": [ { "matches": [], "name": "Amphibians" }, { "matches": [ { "is_approximate_match": false, "is_synonym": false, "matched_name": "Canis", "nomenclature_code": "ICZN", "score": 1.0, "search_string": "canis", "taxon": { "flags": [], "is_suppressed": false, "is_suppressed_from_synth": false, "name": "Canis", "ott_id": 372706, "rank": "genus", "source": "ott3.3draft1", "synonyms": [ "Aenocyon", "Alopedon", "Alopsis", "Canix", "Chaon", "Dasycyon", "Dieba", "Dimenia", "Jacalius", "Lupulella", "Lupulus", "Lupus", "Lyciscus", "Mamcanisus", "Neocyon", "Oreocyon", "Oxygonus", "Oxygous", "Sacalius", "Schaeffia", "Simenia", "Thos", "Vulpicanis" ], "tax_sources": [ "ncbi:9611", "gbif:5219142", "irmng:1282727" ], "unique_name": "Canis" } }, { "is_approximate_match": false, "is_synonym": true, "matched_name": "Canis", "nomenclature_code": "ICZN", "score": 1.0, "search_string": "canis", "taxon": { "flags": [], "is_suppressed": false, "is_suppressed_from_synth": false, "name": "Atelocynus", "ott_id": 621180, "rank": "genus", "source": "ott3.3draft1", "synonyms": [ "Canis" ], "tax_sources": [ "ncbi:68721", "gbif:2434453", "irmng:1025378" ], "unique_name": "Atelocynus" } } ], "name": "Canis" } ], "taxonomy": { "author": "open tree of life project", "name": "ott", "source": "ott3.3draft1", "version": "3.3", "weburl": "https://tree.opentreeoflife.org/about/taxonomy-version/ott3.3" }, "unambiguous_names": [ "Canis" ], "unmatched_names": [ "Amphibians" ] } ```