AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

Resolve inconsistency between `dplyr` and `galah` implementations of `collapse()` and `compute()` #217

Closed mjwestgate closed 4 months ago

mjwestgate commented 7 months ago

As of version 2.0, you can use collapse() to convert a data_request into a query_set. A query_set often contains more than one query: the final query is the one requested by the user, and the earlier queries are needed to check that it is valid. Calling compute() then evaluates the query_set into a single, valid URL:

x <- galah_call() |>
    filter(year == 2010) |>
    count() |>
    collapse()
x
Object of class `query_set` containing 3 queries:
• metadata/fields data: galah:::check_internal_cache()$fields
• metadata/assertions data: galah:::check_internal_cache()$assertions
• data/occurrences-count url: https://biocache-ws.ala.org.au/ws/occurrences/sea...

compute(x)$url
[1] "https://biocache-ws.ala.org.au/ws/occurrences/search?fq=%28year%3A%222010%22%29&disableAllQualityFilters=true&pageSize=0"
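
As an aside, a URL like the one above can be inspected before anything is sent by splitting and decoding its query string. This is a minimal base R sketch, not part of galah; the variable names are illustrative only.

```r
# URL copied from the compute(x)$url output above
url <- "https://biocache-ws.ala.org.au/ws/occurrences/search?fq=%28year%3A%222010%22%29&disableAllQualityFilters=true&pageSize=0"

# Drop everything up to and including the "?", then split on "&"
query_string <- sub("^[^?]*\\?", "", url)
raw_params <- strsplit(query_string, "&", fixed = TRUE)[[1]]

# Decode each parameter separately so an encoded "&" inside a value stays intact
params <- vapply(raw_params, utils::URLdecode, character(1), USE.NAMES = FALSE)
print(params)
```

The first element is the decoded `fq` filter, which shows how `filter(year == 2010)` was translated.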

For occurrence queries, however, compute() is actually doing three things:

This is not ideal, because it means that for occurrence queries, it is impossible to interrogate the URL before sending, or to see it once it has been sent. Fundamentally, this problem occurs because of a conflict between the principle that collapse() shouldn't ping an API, and the need in galah to run checks before building a URL. I see two possible solutions:

My instinct is that the former is neater, though it does change some behaviour from version 2.0. For context, this is what dplyr says about these functions:

compute() stores results in a remote temporary table. collect() retrieves data 
into a local tibble. collapse() is slightly different: it doesn't force computation, 
but instead forces generation of the SQL query. This is sometimes needed to 
work around bugs in dplyr's SQL generation.
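
For comparison, here is a rough sketch of the three dplyr verbs against a database backend. It assumes the dplyr, dbplyr, DBI and RSQLite packages are installed, and uses an in-memory SQLite database with a made-up `records` table as a stand-in for a remote source; none of this is galah code.

```r
library(dplyr)
library(dbplyr)

# In-memory SQLite database standing in for a remote source (assumption)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
records <- copy_to(con, data.frame(year = c(2009, 2010, 2010)), "records")

# A lazy query, analogous in spirit to the galah pipeline above
lazy <- records |>
  filter(year == 2010) |>
  count()

# collapse(): forces generation of the SQL, still a lazy table
collapsed <- collapse(lazy)
show_query(collapsed)

# compute(): runs the query and stores the result in a temporary table
computed <- compute(lazy)

# collect(): retrieves the result into a local tibble
result <- collect(computed)
print(result)

DBI::dbDisconnect(con)
```

In this model the query text is available after collapse(), before any computation is forced, which is the behaviour the issue is weighing galah against.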

mjwestgate commented 5 months ago

Decision on 17-01-2024:

mjwestgate commented 4 months ago

Implemented in v 2.0.1