AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Problem disambiguating homonyms with `search_taxa` #168

Closed mjwestgate closed 1 year ago

mjwestgate commented 1 year ago

@fontikar recently found a homonym during a call to `search_taxa`:

> search_taxa("ACANTHOCEPHALA") # Problem: ACANTHOCEPHALA is both a phylum and a class
# A tibble: 1 × 2
  search_term    issues 
  <chr>          <chr>  
1 ACANTHOCEPHALA homonym
Warning message:
Your search returned multiple taxa due to a homonym issue.
ℹ Please provide another rank in your search to clarify taxa.
✖ Homonym issue with "ACANTHOCEPHALA". 

That's fine, but there should be a way to disambiguate this. My proposed solution is to use a data.frame to ensure this gets passed to the searchByClassification API, i.e.:

> search_taxa(data.frame(phylum = "ACANTHOCEPHALA"))
Error:
! Column name `phylum` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.
✖ These names are duplicated:
  * "phylum" at locations 1 and 8.
Run `rlang::last_error()` to see where the error occurred.

The error here suggests that the search itself has worked, but that joining the input and output tibbles has failed because both contain a `phylum` column. There is also a broader question of whether a data.frame is an efficient way to specify multiple ranks for a single search, or whether a filter-like implementation might be more efficient, e.g.:

search_taxa(phylum == "ACANTHOCEPHALA")
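
As an aside on the duplicated-name error above, the failure mode can be reproduced without galah or network access. The sketch below is not galah's internal code; `input` and `result` are mock tibbles made up for illustration, and the only assumption is that the input and the lookup result both carry a `phylum` column and are combined column-wise. It uses `dplyr::bind_cols()` with two of its documented `.name_repair` settings.

```r
library(tibble)
library(dplyr)

# Mock stand-ins (NOT real galah output): the user's input and a lookup
# result that both contain a `phylum` column.
input  <- tibble(phylum = "ACANTHOCEPHALA")
result <- tibble(
  scientific_name = "ACANTHOCEPHALA",
  rank            = "phylum",
  phylum          = "ACANTHOCEPHALA"
)

# Strict name checking refuses the duplicated `phylum` column;
# capture the error message instead of stopping the script.
strict <- tryCatch(
  bind_cols(input, result, .name_repair = "check_unique"),
  error = function(e) conditionMessage(e)
)
is.character(strict)  # a character message means the bind failed

# The default repair ("unique") renames the clashing columns instead of
# failing, and prints a message listing the new names.
repaired <- bind_cols(input, result)
names(repaired)
```

Renaming the input column before binding would be another way to avoid the clash; which approach suits galah is a design decision for the maintainers.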

Finally, the other alternative of specifying an authority for a name (suggested by @daxkellie) also fails:

> search_taxa("ACANTHOCEPHALA Koelreuther, 1771")
# A tibble: 1 × 2
  search_term                      issues 
  <chr>                            <chr>  
1 ACANTHOCEPHALA Koelreuther, 1771 homonym
Warning message:
Your search returned multiple taxa due to a homonym issue.
ℹ Please provide another rank in your search to clarify taxa.
✖ Homonym issue with "ACANTHOCEPHALA Koelreuther, 1771".

Just to prove that this taxon occurs in the atlas, if we look up the identifier we can get the query to run:

> galah_call() %>% 
+     galah_identify("https://biodiversity.org.au/afd/taxa/3cbb537e-ab39-4d85-864e-76cd6b6d6572", search = FALSE) %>% 
+     atlas_counts()
# A tibble: 1 × 1
  count
  <int>
1   395

daxkellie commented 1 year ago

Homonym warnings now suggest using a tibble, and point to the `?search_taxa` help file to learn how to clarify taxa.

> search_taxa("ACANTHOCEPHALA")
No taxon matches were found for "ACANTHOCEPHALA" in the selected atlas (Australia).
# A tibble: 1 × 1
  search_term   
  <chr>         
1 ACANTHOCEPHALA
Warning message:
Your search returned multiple taxa due to a homonym issue.
ℹ Please provide another rank in your search to clarify taxa.
ℹ Use a tibble to clarify taxa, see `?search_taxa`.
✖ Homonym issue with "ACANTHOCEPHALA". 

Clarifying taxa in a data.frame or tibble no longer fails, mainly because of an update to {dplyr} which renames duplicated columns that are merged after bind_rows(). Column names have also been changed to avoid messages from {dplyr}:

> library(tibble)
> search_taxa(tibble(phylum = "ACANTHOCEPHALA"))
# A tibble: 1 × 10
  search_term    scientific_name scientific_name_authorship taxon_concept_id                             rank  match…¹ kingdom phylum verna…² issues
  <chr>          <chr>           <chr>                      <chr>                                        <chr> <chr>   <chr>   <chr>  <chr>   <chr> 
1 ACANTHOCEPHALA ACANTHOCEPHALA  Koelreuther, 1771          https://biodiversity.org.au/afd/taxa/3cbb53… phyl… exactM… Animal… Acant… Thorny… noIss…
# … with abbreviated variable names ¹​match_type, ²​vernacular_name

mjwestgate commented 1 year ago

Closed as complete