AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Improve coherence of taxonomic functions #54

Closed matildastevenson closed 9 months ago

matildastevenson commented 3 years ago

Currently there are four possible ways users can download taxonomic data:

  1. select_taxa() gives detailed taxonomic information at any requested level, and optionally includes counts, intermediate rank information and child taxa. Species don't need to have records to be included.
  2. ala_species() gives species-level information, specifically for species that the ALA has records for
  3. ala_counts() gives count information, and this can be at any rank (including intermediate ranks)
  4. ala_occurrences() gives record information, and this can include species information

The functionality available in these functions overlaps significantly, to the point where it is hard to explain why a user would use one function over another. For example, to get the species in the genus Callistemon there are four methods:

  1. select_taxa("Callistemon", children = TRUE) (2.9 secs)
  2. ala_species(taxa = select_taxa("Callistemon")) (0.73 secs)
  3. ala_counts(taxa = select_taxa("Callistemon"), group_by = "species") (0.92 secs)
  4. ala_occurrences(taxa = select_taxa("Callistemon")) (9.8 secs)

As the times above show, these methods are not equal in terms of speed or in the data they return.
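For anyone wanting to reproduce the comparison, a rough sketch follows. It assumes a galah release that still exports the four functions named in this issue, a working network connection, and whatever account configuration ala_occurrences() needs in that release (not shown here). Only base R's system.time() is added, and the `timings` name is just for illustration. Elapsed times will vary with network and server load, so the numbers quoted above should not be expected exactly.

```r
# Sketch only: assumes the galah version discussed in this issue is installed.
library(galah)

# Time each of the four approaches for the genus Callistemon.
# Each call is written exactly as in the issue, so methods 2-4
# include the cost of the nested select_taxa() lookup.
timings <- list(
  select_taxa     = system.time(select_taxa("Callistemon", children = TRUE)),
  ala_species     = system.time(ala_species(taxa = select_taxa("Callistemon"))),
  ala_counts      = system.time(
    ala_counts(taxa = select_taxa("Callistemon"), group_by = "species")
  ),
  ala_occurrences = system.time(ala_occurrences(taxa = select_taxa("Callistemon")))
)

# Extract wall-clock seconds for each method as a named numeric vector.
sapply(timings, function(t) t[["elapsed"]])
```

Comparing the objects returned by each call (for example with str()) would also show how the returned data differ, which is the other half of the inconsistency described above.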

The main features needed from one or more taxonomic functions (or combinations of functions) are:

Suggestions:

daxkellie commented 3 years ago

This issue has not been completely solved, but the addition of search_taxonomy() (#75, #81) has addressed the need for a function that can (1) return higher and lower taxa from any taxonomic rank, and (2) search for intermediate ranks for species.

mjwestgate commented 9 months ago

There is some merit here, but the architecture has largely moved on. Closing this until someone makes a compelling argument that further changes are needed.