AtlasOfLivingAustralia / galah-R

Query living atlases from R
https://galah.ala.org.au

`galah_filter()` errors when parsing `is.na()` #220

Closed by daxkellie 4 months ago

daxkellie commented 7 months ago

Another error picked up by @fontikar

When using `galah_filter()`, attempting to return records where a field is `NA` OR is <= 1000 metres causes an error:

library(galah)
library(dplyr)
library(janitor)

# Configure galah settings
galah_config(email = Sys.getenv("ALA_EMAIL"),
             atlas = "Australia", 
             download_reason_id = 5) #testing

# Desired query
# I want NAs or anything with 1000 m of uncertainty or less
galah_call() |> 
  galah_identify("Banksia serrata") |> 
  galah_filter(is.na(coordinateUncertaintyInMeters) | coordinateUncertaintyInMeters <= 1000) |> 
  atlas_counts()
#> Error in `check_fields()`:
#> ! Can't use fields that don't exist.
#> ℹ Use `search_all(fields)` to find a valid field ID.
#> ✖ Can't find field(s) in
#>   • `galah_filter()`: *
#> Backtrace:
#>      ▆
#>   1. └─galah::atlas_counts(...)
#>   2.   ├─dplyr::collect(slice_head(count(dr), n = limit))
#>   3.   └─galah:::collect.data_request(slice_head(count(dr), n = limit))
#>   4.     ├─dplyr::collect(compute(collapse(x, ...)), wait = wait, file = file)
#>   5.     ├─dplyr::compute(collapse(x, ...))
#>   6.     └─galah:::compute.query_set(collapse(x, ...))
#>   7.       ├─dplyr::compute(build_checks(x))
#>   8.       └─galah:::compute.query(build_checks(x))
#>   9.         └─galah:::compute_checks(x)
#>  10.           ├─galah:::check_profiles(check_fields(check_reason(check_login(.query))))
#>  11.           └─galah:::check_fields(check_reason(check_login(.query)))
#>  12.             └─rlang::abort(bullets)

Fonti has discovered that `is.na()` still errors when run on its own. The parsed query contains an asterisk (`*`), which `check_fields()` then reports as a field that doesn't exist:

# I suspect the combination of is.na(), OR and <= 1000 is not evaluated correctly
query <- galah_call() |> 
  galah_identify("Banksia serrata") |> 
  galah_filter(is.na(coordinateUncertaintyInMeters) | coordinateUncertaintyInMeters <= 1000)

query$filter
#> # A tibble: 1 × 4
#>   variable                      logical value  query                            
#>   <chr>                         <glue>  <glue> <chr>                            
#> 1 coordinateUncertaintyInMeters ==|<=   |1000  ((*:* AND -coordinateUncertainty…

# Query works with <= 1000 but not with is.na() 
galah_call() |> 
  galah_identify("Banksia serrata") |> 
  galah_filter(coordinateUncertaintyInMeters <= 1000) |> 
  atlas_counts()
#> # A tibble: 1 × 1
#>   count
#>   <int>
#> 1 10994

galah_call() |> 
  galah_identify("Banksia serrata") |> 
  galah_filter(is.na(coordinateUncertaintyInMeters)) |> 
  atlas_counts()
#> Error in `check_fields()`:
#> ! Can't use fields that don't exist.
#> ℹ Use `search_all(fields)` to find a valid field ID.
#> ✖ Can't find field(s) in
#>   • `galah_filter()`: *
#> Backtrace:
#>      ▆
#>   1. └─galah::atlas_counts(...)
#>   2.   ├─dplyr::collect(slice_head(count(dr), n = limit))
#>   3.   └─galah:::collect.data_request(slice_head(count(dr), n = limit))
#>   4.     ├─dplyr::collect(compute(collapse(x, ...)), wait = wait, file = file)
#>   5.     ├─dplyr::compute(collapse(x, ...))
#>   6.     └─galah:::compute.query_set(collapse(x, ...))
#>   7.       ├─dplyr::compute(build_checks(x))
#>   8.       └─galah:::compute.query(build_checks(x))
#>   9.         └─galah:::compute_checks(x)
#>  10.           ├─galah:::check_profiles(check_fields(check_reason(check_login(.query))))
#>  11.           └─galah:::check_fields(check_reason(check_login(.query)))
#>  12.             └─rlang::abort(bullets)

Created on 2023-12-01 with reprex v2.0.2

Some testing might be required to investigate this parsing error.
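As a starting point for that testing, here is a rough base-R sketch of one way an asterisk could end up being reported as a field. It is a guess at the failure mode, not galah's actual code: the field name `exampleField`, the `known_fields` vector and the regex-based extraction are all hypothetical. The query string uses the common Solr idiom for "field is missing" (`*:* AND -field:*`), which matches the start of the `query` column shown above.

```r
# Hypothetical illustration only: this is NOT how galah is known to parse filters.
# `exampleField` is a made-up field name standing in for a real ALA field.
solr_query <- "(*:* AND -exampleField:*)"

# Naive extraction: treat every token that sits directly before a colon
# as a field name. The leading `*` in `*:*` is picked up as well.
candidate_fields <- regmatches(
  solr_query,
  gregexpr("[^\\s():-]+(?=:)", solr_query, perl = TRUE)
)[[1]]
# candidate_fields is c("*", "exampleField")

# Hypothetical list of valid fields to check against
known_fields <- c("exampleField")

# Anything not in the known list would be flagged as a missing field
unknown_fields <- setdiff(candidate_fields, known_fields)
# unknown_fields is "*"
```

If something like this is happening, the fix would be to skip the `*:*` match-all clause (or any bare `*`) before checking field names, but that needs confirming against the real parsing code.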

daxkellie commented 4 months ago

This is related to issue #230