AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Are the `S3` classes in `galah` useful? #133

Closed: mjwestgate closed this issue 1 year ago

mjwestgate commented 2 years ago

In v1.4.0, all functions with the `galah_` prefix, and some underlying functions such as `search_taxa()`, add their function name as an extra S3 class on the object they return:

```r
class(galah_filter(year > 2010))
[1] "tbl_df"       "tbl"          "data.frame"   "galah_filter"
```
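The following is a minimal base-R sketch of this pattern. It assumes nothing about galah's actual internals. `tag_with_class()` and `slot_for()` are hypothetical helpers, and a plain `data.frame` stands in for a tibble so the example runs without extra packages. The slot names are borrowed from the `data_request` structure printed further down. The sketch shows how a collector could check the appended class with `inherits()` to decide where a piped object belongs.

```r
# Hypothetical sketch, not galah's source: append the creating function's
# name to the class vector, mirroring the class() output above.
tag_with_class <- function(df, fn_name) {
  class(df) <- c(class(df), fn_name)
  df
}

# Hypothetical collector: choose the request slot a piped object fills
# by testing its extra class with inherits().
slot_for <- function(x) {
  known <- c(galah_identify = "identify", galah_filter = "filter",
             galah_geolocate = "geolocate", galah_group_by = "group_by")
  is_known <- vapply(names(known), function(cl) inherits(x, cl), logical(1))
  if (!any(is_known)) stop("not a galah_ object")
  unname(known[names(known)[is_known][1]])
}

f <- tag_with_class(data.frame(value = 2010), "galah_filter")  # toy data
class(f)
#> [1] "data.frame"   "galah_filter"
slot_for(f)
#> [1] "filter"
```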

`galah_call()` uses these S3 classes to ensure that piped arguments are handled correctly. This differs from the `atlas_` group of functions, which return 'simple' tibbles with additional attributes:

```r
galah_call() |> galah_identify("Litoria") |> atlas_counts() |> str()
tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
 $ count: int 402280
 - attr(*, "data_type")= chr "counts"
 - attr(*, "data_request")=List of 7
  ..$ identify     : tibble[,1] (S3: tbl_df/tbl/data.frame/galah_identify)
  .. ..$ identifier: chr "urn:lsid:biodiversity.org.au:afd.taxon:5282f9c3-12ff-4092-a8ae-ea71f2dde7da"
  ..$ filter       : NULL
  ..$ geolocate    : NULL
  ..$ group_by     : NULL
  ..$ limit        : num 100
  ..$ type         : chr "record"
  ..$ refresh_cache: logi FALSE
  ..- attr(*, "class")= chr "data_request"
```
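For comparison, here is a base-R sketch of the attribute approach. A plain `data.frame` again stands in for a tibble. The values are copied from the output above, but the identifier is replaced with a placeholder and only a few `data_request` fields are reproduced. The metadata is carried in attributes, so the class vector stays that of an ordinary data frame.

```r
# Sketch only: attach metadata as attributes, as in the str() output above,
# without touching the class vector.
counts <- data.frame(count = 402280L)
attr(counts, "data_type") <- "counts"
attr(counts, "data_request") <- structure(
  list(identify = data.frame(identifier = "placeholder-id"),  # placeholder
       filter = NULL,
       limit = 100),
  class = "data_request"
)

class(counts)
#> [1] "data.frame"
attr(counts, "data_type")
#> [1] "counts"
```

Because only attributes change, any code that dispatches on class sees an ordinary data frame.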

We chose attributes for the `atlas_` functions because bespoke S3 classes did not work well in pipes. In fact, this is still true of some underlying functions, for example when filtering:

```r
library(galah)
library(dplyr)  # provides filter() and the %>% pipe

search_taxa("Litoria dentata", "nothing") %>%
  filter(!is.na(taxon_concept_id))
No taxon matches were found for "nothing".
Error: Input must be a vector, not a <tbl_df/tbl/data.frame/ala_id> object.
Run `rlang::last_error()` to see where the error occurred.
```

This use case (removing unmatched taxa with `filter()`) instead requires dropping the additional S3 class first, e.g. with `as_tibble()`:

```r
search_taxa("Litoria dentata", "nothing") %>%
  as_tibble() %>%
  filter(!is.na(taxon_concept_id))  # works
```
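The `as_tibble()` step can also be written as a small helper. `strip_extra_class()` is hypothetical and not part of galah. It keeps only the standard tibble and data frame classes and drops anything else, such as `ala_id`. The toy data below only mimics the shape of `search_taxa()` output, and its `taxon_concept_id` value is a placeholder.

```r
# Hypothetical helper (not in galah): keep only the standard
# tibble/data.frame classes, dropping extras such as "ala_id".
# intersect() preserves the original class order.
strip_extra_class <- function(df, keep = c("tbl_df", "tbl", "data.frame")) {
  class(df) <- intersect(class(df), keep)
  df
}

# Toy stand-in for search_taxa() output; "id-1" is a placeholder.
taxa <- data.frame(search_term = c("Litoria dentata", "nothing"),
                   taxon_concept_id = c("id-1", NA))
class(taxa) <- c(class(taxa), "ala_id")  # mimic the extra S3 class

clean <- strip_extra_class(taxa)
class(clean)
#> [1] "data.frame"
matched <- clean[!is.na(clean$taxon_concept_id), , drop = FALSE]
```

In a real pipe, the helper would presumably take the place of `as_tibble()` in the example above.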

To avoid this inconsistency, we could stop adding S3 classes in `galah_` functions. Instead, they would set a `call` attribute (or similar) recording the name of the function that created the object. `galah_call()` could then still identify piped arguments, while the returned objects would keep working with standard tibble operations such as `filter()` for users who want to manipulate them directly.
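Below is a rough sketch of the proposed design. It assumes an attribute named `call`; that name and both helper functions are hypothetical. The creating function is stored as an attribute, and a `galah_call()`-style collector reads it to pick the request slot. The object's class remains that of a plain data frame, so ordinary subsetting is unaffected.

```r
# Sketch of the proposal: record the creating function in an attribute
# (the name "call" is hypothetical) instead of adding an S3 class.
tag_with_call <- function(df, fn_name) {
  attr(df, "call") <- fn_name
  df
}

# Hypothetical collector: read the attribute rather than the class and map
# e.g. "galah_filter" to the "filter" slot of the request.
slot_from_call <- function(x) {
  fn <- attr(x, "call", exact = TRUE)
  if (is.null(fn) || !startsWith(fn, "galah_")) stop("not a galah_ object")
  sub("^galah_", "", fn)
}

f <- tag_with_call(data.frame(year = 2011:2013), "galah_filter")  # toy data
class(f)
#> [1] "data.frame"
slot_from_call(f)
#> [1] "filter"
nrow(f[f$year > 2011, , drop = FALSE])
#> [1] 2
```

One trade-off to check: many operations do not preserve arbitrary attributes. A collector relying on them would therefore need the attribute to still be present when the object reaches it.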