AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

search_taxa not handling cases where a taxon is flagged as having a homonym issue #200

Open wcornwell opened 1 year ago

wcornwell commented 1 year ago

Describe the bug

The tibble input is not being parsed properly by search_taxa: it does not return the correct taxa_id when there is a homonym issue with one of the taxa. The help file suggests the tibble input is the right approach for this case, but it's not working for me.

galah version 1.5.2

To Reproduce

search_taxa(tibble(genus="Acanthocladium", class="Equisetopsida"))

Expected behaviour

It should return the taxa_id for "Acanthocladium", which is the current name for a small daisy genus. The homonym issue is with a moss genus that was formerly (no longer) also called "Acanthocladium".

I expected including tibble(genus="Acanthocladium", class="Equisetopsida") would resolve the homonym issue and the correct taxa_id would be returned.

Instead of the daisy genus, search_taxa returns the taxa_id for Equisetopsida, which leads to a large query that then crashes the API.
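
Until the matching is fixed, one way to avoid sending the oversized query is to check the rank of the match before using it. The sketch below is only an illustration: `check_taxa_rank()` is a hypothetical helper, not part of galah, and the data frame is a hand-built stand-in for the mismatched result, using the `scientific_name` and `rank` columns that search_taxa returns.

```r
# Hypothetical helper (not a galah function): stop early if any matched
# taxon has a different rank from the one that was searched for.
check_taxa_rank <- function(taxa, expected_rank) {
  mismatched <- is.na(taxa$rank) | taxa$rank != expected_rank
  if (any(mismatched)) {
    stop(
      "Expected rank '", expected_rank, "' but search returned: ",
      paste0(taxa$scientific_name[mismatched],
             " (", taxa$rank[mismatched], ")",
             collapse = ", "),
      call. = FALSE
    )
  }
  invisible(taxa)
}

# Hand-built stand-in for the mismatched result described above.
# In a real session this would come from:
#   taxa <- search_taxa(tibble(genus = "Acanthocladium", class = "Equisetopsida"))
taxa <- data.frame(
  scientific_name = "Equisetopsida",
  rank = "class"
)

# Capture the error message instead of halting the script
result <- tryCatch(
  check_taxa_rank(taxa, "genus"),
  error = function(e) conditionMessage(e)
)
print(result)
```

In a pipeline, the check would sit between search_taxa() and galah_call(), so a class-level match stops the script before any count or occurrence query is sent.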

Apologies about the crashes, it took me a while to work out what was going on.

Additional context

This is related to #168 and #194

daxkellie commented 1 year ago

Thanks for reaching out. I was able to replicate this error and there does appear to be something wrong with how search_taxa() prioritises higher rank information supplied in a tibble.

At this point, I'm not sure why this is, but I first wanted to offer one solution:

Adding extra search information, such as authorship, can help return the correct result. On the ALA, the authorship of this name is attributed to F.Muell. Adding this to your text search returns the correct result:

library(galah)
library(tibble)

search_taxa("Acanthocladium F.Muell")
#> # A tibble: 1 × 13
#>   search_term      scientific_name scientific_name_auth…¹ taxon_concept_id rank 
#>   <chr>            <chr>           <chr>                  <chr>            <chr>
#> 1 Acanthocladium … Acanthocladium  F.Muell.               https://id.biod… genus
#> # ℹ abbreviated name: ¹​scientific_name_authorship
#> # ℹ 8 more variables: match_type <chr>, kingdom <chr>, phylum <chr>,
#> #   class <chr>, order <chr>, family <chr>, genus <chr>, issues <chr>

And using this result in a query returns a nice, small count too, as expected!

taxa <- search_taxa("Acanthocladium F.Muell")

galah_call() |>
  identify(taxa) |>
  atlas_counts()
#> # A tibble: 1 × 1
#>   count
#>   <int>
#> 1   128

wcornwell commented 1 year ago

Great! Thanks for the workaround!