AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

`galah` times out when downloading a large number of records #192

Closed: shandiya closed this issue 11 months ago

shandiya commented 1 year ago

Describe the bug

When trying to download a large number of records (more than 3 million) using atlas_occurrences(), galah appears to time out and returns this uninformative error message:

Error: need one of url or handle

galah version: galah_1.5.1

To Reproduce

library(galah)
galah_config(email = "your-email@example.com")  # downloads require a registered ALA email

galah_call() |> 
  galah_apply_profile(ALA) |> 
  galah_select(kingdom, 
               phylum, 
               class, 
               order, 
               family,
               genus,
               species,
               scientificName, 
               vernacularName, 
               decimalLatitude,
               decimalLongitude,
               samplingProtocol) |> 
  galah_filter(year >= 2021) |> 
  atlas_occurrences()

Ideally, the download function would not time out at all. If a threshold is necessary, it would help to have a more informative error message and some guidance on how to restart or resume the download.

This behaviour has been previously documented in issue #180.

daxkellie commented 11 months ago

This error should be fixed by the new collapse(), compute() and collect() architecture in galah 2.0.0. Users can now send a query to the API with compute(), which starts processing it server-side, and then download the results later with collect(). Separating these two steps should avoid the time-out error that was being hit sporadically. For example:

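library(galah)
galah_config(email = "your-email@example.com")  # downloads require a registered ALA email

# Create and send query to be calculated server-side
request <- request_data("occurrences") |>
  identify("perameles") |>
  filter(year > 1900) |>
  compute()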

# Download data
request |>
  collect()