AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

atlas_occurrences fails for UK: "Columns don't exist" #202

Closed CrunchyLettuce closed 10 months ago

CrunchyLettuce commented 1 year ago

I'm trying to retrieve NBN occurrences (the UK dataset), but each time I get the same error saying that columns are missing. I can get counts from the UK data fine, and I don't get this error when I try the other datasets. I'm using galah version 1.5.3.

My code

galah_config(atlas = "United Kingdom")

galah_config(email = "my@email.com")

result <- galah_call() %>%
  galah_identify("reptilia") %>%
  galah_filter(year >= 2020) %>%
  atlas_occurrences()

Error

Error in `all_of()`:
! Can't subset columns that don't exist.
✖ Columns `decimalLatitude`, `decimalLongitude`, `scientificName`, `recordID`, and `dataResourceName` don't exist.
Run `rlang::last_trace()` to see where the error occurred.

Any help would be great, thank you.

daxkellie commented 1 year ago

Thanks for reaching out about this issue. The error suggests that atlas_occurrences() is building the query with the default set of fields (i.e. column names) for the Atlas of Living Australia rather than those of the UK's National Biodiversity Network. This should only need a small fix to get working again, which we'll be sure to add to the next version of {galah}. Thanks for letting us know!

In the meantime, one way to avoid this error is to specify which columns you want in your query with galah_select(). Specifying columns prevents atlas_occurrences() from attempting to use any of the defaults.

I managed to find the equivalent fields using search_all(). Feel free to add more or fewer fields to your query as you need! 😄 (Also, note that I changed the year in galah_filter() to reduce the amount of data returned in this example.)

library(galah)
library(magrittr)

galah_config(email = "your-email-here", atlas = "United Kingdom")
#> Atlas selected: National Biodiversity Network (NBN) [United Kingdom]

# search_all(fields, "data resource") # example search

result <- galah_call() %>%
  galah_identify("reptilia") %>%
  galah_filter(year >= 2022) %>%
  galah_select(longitude, latitude, taxon_name, id, data_resource) %>%
  atlas_occurrences()
#> This query will return 6,778 records
#> 
#> Checking queue
#> Current queue size: 1 inqueue  running .

result
#> # A tibble: 6,778 × 5
#>    decimalLongitude decimalLatitude scientificName   recordID      data_resource
#>               <dbl>           <dbl> <chr>            <chr>         <chr>        
#>  1           -3.10             52.9 Anguis fragilis  ecb97e98-655… Records of a…
#>  2           -2.75             52.7 Zootoca vivipara c7f1f243-615… Records of a…
#>  3            0.67             50.9 Vipera berus     c627df55-a7b… Records of a…
#>  4           -0.374            50.9 Anguis fragilis  c3caa230-333… Records of a…
#>  5            0.849            51.8 Zootoca vivipara bc73fc49-59a… Records of a…
#>  6           -4.05             50.4 Zootoca vivipara b5d964d1-c13… Froglife's a…
#>  7           -3.46             50.7 Anguis fragilis  a6924290-e28… Records of a…
#>  8           -3.23             51.6 Natrix helvetica 9f290871-554… SEWBReC Rept…
#>  9           -1.14             50.7 Anguis fragilis  956b44dd-f11… Records of a…
#> 10           -0.315            52.1 Natrix helvetica 91a0420a-a2f… Froglife's a…
#> # ℹ 6,768 more rows

Created on 2023-07-12 with reprex v2.0.2

daxkellie commented 1 year ago

On another note, some but not all of the column names in the returned tibble are changed to match field names in the ALA. The names make sense, but it seems strange to get new column names given that we specified the fields in galah_select(). We might want to update this renaming to be more consistent and clearer to users.
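Until the renaming is made consistent, one possible stopgap is to rename the returned columns back to the names requested in galah_select(). The sketch below is not galah's own behaviour: it uses a small hand-made data frame as a stand-in for the tibble above (the IDs and data resource values are placeholders), and the name mapping is taken only from the output shown in this thread.

```r
# Toy stand-in for the tibble returned by atlas_occurrences() above.
# IDs and data resource values are placeholders, not real NBN records.
result <- data.frame(
  decimalLongitude = c(-3.10, -2.75),
  decimalLatitude  = c(52.9, 52.7),
  scientificName   = c("Anguis fragilis", "Zootoca vivipara"),
  recordID         = c("placeholder-id-1", "placeholder-id-2"),
  data_resource    = c("placeholder resource A", "placeholder resource B")
)

# Returned name -> name requested in galah_select(), as observed in this thread
requested_names <- c(
  decimalLongitude = "longitude",
  decimalLatitude  = "latitude",
  scientificName   = "taxon_name",
  recordID         = "id"
)

# Rename only the columns that appear in the mapping; leave the rest untouched
hits <- names(result) %in% names(requested_names)
names(result)[hits] <- unname(requested_names[names(result)[hits]])

names(result)
```

Columns that already match the requested name, such as data_resource here, pass through unchanged.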

/cc @mjwestgate

CrunchyLettuce commented 1 year ago

Thanks for the quick fix! That's solved my issue.

daxkellie commented 1 year ago

No worries! We might keep this issue open for a bit longer as there are a few things here we still need to do to make sure this is fixed in the next version of galah 😃

CrunchyLettuce commented 1 year ago

Not sure whether I should open this as a new issue, but it might be related to any fixes you're doing.

The code above worked for one request, but now I'm getting this error:

This query will return 6,778 records

Checking queue
Current queue size: 2 inqueue  failed Error: need one of url or handle

I've tried restarting R and doing the config command again, but I'm still getting the same error.

daxkellie commented 1 year ago

That error usually happens when the Atlas you are trying to query does not return anything after ~10 to 15 minutes. My guess is that because it says you were number 2 in the queue, the query ahead of yours might have been very large and held up your download for a while. Alternatively, the NBN might have had another issue that slowed your download long enough for it to time out. This can be a frustrating error, though, because often the solution is to be patient.

We have some fixes coming through in the next version to prevent galah from timing out, but for now this is an error that is most likely resolved on the Atlas's side after a while: eventually whatever is holding up the queue will run and everything will work again. My advice when this crops up is to wait for a little while (maybe 30 mins to an hour) and then rerun your query.
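If waiting and rerunning by hand gets tedious, the wait-and-retry advice can be wrapped in a small helper. This is only a sketch in base R: retry_query() is a hypothetical function, not part of galah, and the defaults (3 tries, 30 minutes between them) are assumptions based on the rough timings mentioned above.

```r
# Hypothetical helper (not part of galah): call `query_fn` and, if it
# throws an error, wait and try again up to `max_tries` times.
retry_query <- function(query_fn, max_tries = 3, wait_seconds = 1800) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a condition object instead of stopping
    result <- tryCatch(query_fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Pause before the next attempt, but not after the last one
    if (attempt < max_tries) {
      Sys.sleep(wait_seconds)
    }
  }
  stop("Query still failing after ", max_tries, " attempts", call. = FALSE)
}

# Possible usage with the query from this thread (not run here):
# result <- retry_query(function() {
#   galah_call() %>%
#     galah_identify("reptilia") %>%
#     galah_filter(year >= 2022) %>%
#     galah_select(longitude, latitude, taxon_name, id, data_resource) %>%
#     atlas_occurrences()
# })
```

Retrying only helps when the hold-up is temporary, as with a busy queue; an error caused by the query itself will simply fail on every attempt.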

The good news is that I just ran a query and it returned a result, so it looks like things are working again!

mjwestgate commented 10 months ago

Looks like this is solved for now - plus version 2.0 has specific tests for occurrence downloads from the UK - so I'll mark this as closed. Happy to reopen again if there is still a problem.