bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0
81 stars 12 forks source link

Forward compatibility with dplyr 2.5.0 #338

Closed hadley closed 4 months ago

hadley commented 5 months ago

This seems like a pretty uncompelling set of changes and I feel like it's likely to substantially negatively impact the usability of your package. Do you agree? If so, I'll see what I can change in dbplyr to make 2.5.0 more appealing.

hadley commented 4 months ago

Hmmmm, maybe I misinterpreted what's going on here because it looks like you are suppling a custom translation via sql_translation.wfsConnection so your functions should already be registered.

So the problem is likely due to https://github.com/tidyverse/dbplyr/pull/1430 — because dbplyr translates all the arguments to a function before passing it to the function. But in this case, you want the arguments to the function to be regular R objects because your function is going to do the translation. Hmmmmmmmmmmmmmmm.

hadley commented 4 months ago

Yeah, so a better fix for (e.g.) WITHIN would be:

bcdc_query_geodata("bc-airports") %>%
  filter(WITHIN(local(local))) %>%
  collect()

And unfortunately !! doesn't work here (possibly it should, but I'm not sure it's worth reasoning out the implications).

So the problem here is not declaring additional translations but declaring that WITHIN and friends expect a local object, not a database column. That unfortunately makes them different to the existing dbplyr translations so I don't have a good model for how this should work.

But I think this means I'm being too aggressive at erroring on unknown types in partial_eval_sym() and the best option will just be to revert to the previous behaviour. If I do attempt to tackle this again in the future, it would probably be better to do type checking in individual translation functions so that we can be more specific about what the expected inputs are (e.g. either a whole number or a variable name).

ateucher commented 4 months ago

Thanks @hadley for taking a look at this. Yeah, those uppercase predicates (WITHIN, INTERSECTS) etc. take a local sf object and translate it to a string to pass as a query parameter in the GET/POST request to the WFS server.

We have precedent for asking users to use local() to force local evaluation of functions (https://bcgov.github.io/bcdata/articles/local-filter.html), so it wouldn't be the end of the world to enforce the same thing for local (non-function) objects inside WITHIN and friends, but it would be a breaking change.

It is a shame that !! doesn't work, as it's a bit neater (but I understand not wanting to go down a new rabbit hole). EDIT: It does work with an atomic vector (see below with a bbox object, which is a double of length 4).

Your idea of being able to register a function as expecting a local object would be really nice.

Putting this here mostly for my edification:

library(bcdata)
library(sf)

x <- bcdc_query_geodata('d1aff64e-dbfe-45a6-af97-582b7f6418b9') |> 
  filter(ADMIN_AREA_NAME == "Cariboo Regional District") |> 
  collect()

airports <- '76b1b7a3-2112-4444-857a-afccf7b20da8'

# original behaviour, no longer works with dev dbplyr 2.5.0
res <- bcdc_query_geodata(airports) |> 
  filter(WITHIN(x)) |> 
  collect()
#> Error:
#> ! Cannot translate a data.frame to SQL.
#> ℹ Do you want to force evaluation in R with (e.g.) `!!x$x` or `local(x$x)`?

# works with dev dbplyr 2.5.0
res <- bcdc_query_geodata(airports) |> 
  filter(!!WITHIN(x)) |> 
  collect()

# works with dev dbplyr 2.5.0
res <- bcdc_query_geodata(airports) |> 
  filter(local(WITHIN(x))) |> 
  collect()

# works with dev dbplyr 2.5.0
res <- bcdc_query_geodata(airports) |> 
  filter(WITHIN(local(x))) |> 
  collect()

# does not work with dev dbplyr 2.5.0
res <- bcdc_query_geodata(airports) |> 
  filter(WITHIN(!!x)) |> 
  collect()
#> Error in `FUN()`:
#> ! Unknown input type: list

# !! does work with an atomic vector:
x_bbox <- st_bbox(x)
res <- bcdc_query_geodata(airports) |> 
  filter(WITHIN(!!x_bbox)) |> 
  collect()

Created on 2024-02-22 with reprex v2.1.0

ateucher commented 4 months ago

Just looking at the testthat failures here, there is something funny going on with how messages are emitted when using !! vs local() - The expected message is printed to the console but not detected by testthat::expect_message():

library(bcdata)
library(testthat)

x <- bcdc_query_geodata('d1aff64e-dbfe-45a6-af97-582b7f6418b9') %>% 
  filter(ADMIN_AREA_NAME == "Cariboo Regional District") %>% 
  collect()

airports <- '76b1b7a3-2112-4444-857a-afccf7b20da8'

# fine
expect_message(
  bcdc_query_geodata(airports) %>% 
    filter(WITHIN(local(x))) %>% 
    collect()
)

# says no message produced
expect_message(
  bcdc_query_geodata(airports) %>% 
    filter(!!WITHIN(x)) %>% 
    collect()
)
#> The object is too large to perform exact spatial operations using bcdata.
#> Object size: 697696 bytes
#> BC Data Threshold: 500000 bytes
#> Exceedance: 197696 bytes
#> See ?bcdc_check_geom_size for more details
#> A bounding box was drawn around the object passed to WITHIN and all features within the box will be returned.
#> Error: ``%>%`(...)` did not produce any messages.

Created on 2024-02-22 with reprex v2.1.0

hadley commented 4 months ago

FWIW I'm going to back this change out of dbplyr so that bcdata doesn't break. (But mostly because it is a large, unanticipated, change to the semantics of dbplyr function translation)

hadley commented 4 months ago

Closing this in favour of https://github.com/tidyverse/dbplyr/pull/1466

Thanks for the feedback!

ateucher commented 4 months ago

Thanks @hadley!