AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

`search_taxa` sometimes returns incorrect rank when the user provides a `data.frame` #129

Closed mjwestgate closed 2 years ago

mjwestgate commented 2 years ago

By default, we expect the user to pass a string, which typically resolves correctly, or else fails noisily when it doesn't:

search_taxa("Glossodia major") |> as.data.frame()

      search_term scientific_name scientific_name_authorship                               taxon_concept_id    rank match_type
1 Glossodia major Glossodia major                      R.Br. https://id.biodiversity.org.au/name/apni/94961 species exactMatch
  kingdom     phylum         class       order      family     genus         species  issues
1 Plantae Charophyta Equisetopsida Asparagales Orchidaceae Glossodia Glossodia major noIssue

In contrast, providing the same information in a data.frame resolves to a genus, not a species:

data.frame(genus = "Glossodia", species = "major") |>
  search_taxa() |>
  as.data.frame()

      search_term scientific_name scientific_name_authorship                                 taxon_concept_id  rank match_type
1 Glossodia_major       Glossodia                      R.Br. https://id.biodiversity.org.au/node/apni/2921168 genus exactMatch
  kingdom     phylum         class       order      family     genus  issues
1 Plantae Charophyta Equisetopsida Asparagales Orchidaceae Glossodia noIssue

Weirdly, even providing the full binomial species name doesn't work if genus is also provided:

data.frame(genus = "Glossodia", species = "Glossodia major") |>
  search_taxa() |>
  as.data.frame()

                search_term scientific_name scientific_name_authorship                                 taxon_concept_id  rank
1 Glossodia_Glossodia major       Glossodia                      R.Br. https://id.biodiversity.org.au/node/apni/2921168 genus
  match_type kingdom     phylum         class       order      family     genus  issues
1 exactMatch Plantae Charophyta Equisetopsida Asparagales Orchidaceae Glossodia noIssue

The desired behaviour here would be for search_taxa() to always return a result at the most specific rank given by the user.

shandiya commented 2 years ago

search_taxa() now returns a result with the same rank as the most specific rank provided by the user, and examples have been added to demonstrate how species may be identified using "specificEpithet" or "scientificName" in the search parameters.
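As a rough illustration of the two approaches mentioned above, the following sketch builds both kinds of query for the orchid from the original report. The column names "specificEpithet" and "scientificName" come from this comment; the assumption that either can be combined with "genus" in one data.frame is mine, and the calls need the galah package and network access to run. No output is shown because it depends on the live atlas.

```r
# Identify the species via its epithet, alongside the genus.
by_epithet <- data.frame(genus = "Glossodia", specificEpithet = "major")

# Alternatively, give the full binomial under scientificName.
by_name <- data.frame(genus = "Glossodia", scientificName = "Glossodia major")

# Only query the atlas if galah is installed (this step requires network access).
if (requireNamespace("galah", quietly = TRUE)) {
  print(galah::search_taxa(by_epithet))
  print(galah::search_taxa(by_name))
}
```

If the fix behaves as described, both queries should resolve to rank "species" rather than "genus".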

A possible future improvement is to provide a more informative error message when the user tries to search for a species using the wrong parameters. For example:

search_taxa(tibble(species = "Pardalotus punctatus"))
No taxon matches were found for "Pardalotus punctatus".
# A tibble: 1 × 1
  species             
  <chr>               
1 Pardalotus punctatus

Ideally, the error message in this situation would indicate that the wrong term has been used in the search, i.e. "species" instead of "scientificName".
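One way such a check might look is sketched below. This is not galah code: `check_taxa_columns()` is a hypothetical helper, and both the list of accepted terms and the "species" to "scientificName" suggestion are assumptions made for illustration.

```r
# Hypothetical helper, not part of galah: reports column names that are not
# accepted search terms and suggests a likely replacement where one is known.
check_taxa_columns <- function(df) {
  # Assumed set of accepted terms; the real list used by galah may differ.
  accepted <- c("kingdom", "phylum", "class", "order", "family", "genus",
                "specificEpithet", "scientificName")
  # Assumed mapping from a common mistake to the accepted term.
  suggestions <- c(species = "scientificName")

  unknown <- setdiff(names(df), accepted)
  if (length(unknown) == 0) {
    return(character(0))
  }
  ifelse(
    unknown %in% names(suggestions),
    paste0("\"", unknown, "\" is not a recognised search term; did you mean \"",
           suggestions[unknown], "\"?"),
    paste0("\"", unknown, "\" is not a recognised search term.")
  )
}

msgs <- check_taxa_columns(data.frame(species = "Pardalotus punctatus"))
message(paste(msgs, collapse = "\n"))
```

Running the check before the query is sent would let the hint appear instead of, or alongside, the generic "No taxon matches were found" message.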