bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0
82 stars 13 forks source link

cql_translate fixes for updated dbplyr #305

Closed ateucher closed 1 year ago

ateucher commented 2 years ago

Partially working but I still think it's more complex than need be and the built-in DBI/dbplyr function mapping is not working quite right...

304

ateucher commented 2 years ago

I think we need to make a somewhat major change in how we parse user input into filter() to make this work smoothly. Right now we have some ugly code that tries to detect parts of a filter call that should be evaluated locally (e.g., a call to an sf function) and then modifying the call by wrapping it in local(). partial_eval() then processes that call and forces that evaluation of the local() bits before turning the call into a CQL string. But this is fragile and can't reliably detect all cases. The recent change to partial_eval has made it stricter and it now throws errors on unevaluated calls that should have been evaluated and vice-versa.

The existing approach in bcdata actually goes against the way dbplyr was designed to be used - basically we should be asking the user to wrap parts of calls that should be evaluated locally in local(), rather than trying to guess and do it for them.

I propose we get rid of the stuff in cql_translate() that tries to figure out what should be locally evaluated, and ask the user to do it specifically. This is likely to break some people's code, so we will need to think about how to communicate the change and help people adjust.

Essentially, instead of something like:

bcdc_query_geodata("a-record") |>
  filter(OBJ_AREA_SQM > st_area(a_local_sf_object)) # OBJ_AREA_SQM is a field in the wfs object

would need to be something like:

bcdc_query_geodata("a-record") |>
  filter(OBJ_AREA_SQM > local(st_area(a_local_sf_object))) # OBJ_AREA_SQM is a field in the wfs object

Thoughts @boshek @stephhazlitt ?

ateucher commented 2 years ago

@boshek @stephhazlitt I've made the change. Everything passes nicely locally with the dev version of dbplyr. Haven't started on the documentation yet. The updates to the test files illustrate the user-facing changes.

stephhazlitt commented 2 years ago

This looks great @ateucher 🙏. I think it is a fair trade off re: requiring users to use local() in return for cleaner, more predictable+robust code. Pretty sure we have some use cases in the vignettes that will need updating with this change. We could think about some active advertising of the change, e.g. a tweet thread? Maybe even a new, short pkgdown article?

ateucher commented 2 years ago

Thanks @stephhazlitt. I think that the vignettes are actually ok... I did a manual check and precompiled them with the dev version of bcdata and they worked. I think both your communication ideas are good - a tweet thread, and either a new vignette or just add a section to the efficiently_query_geospatial_data vignette.

ateucher commented 2 years ago

Ok, I've updated one of the vignettes with a simple example... let me know what you think. Unrelated, but I finally figured out how to clean up those spurious dbQuoteIdentifier and dbQuoteString messages we've been getting (https://github.com/bcgov/bcdata/pull/305/commits/92146dbcc75028602a59ab356ba43bb67b262adf)

ateucher commented 1 year ago

/precompile

ateucher commented 1 year ago

Good catch - and it's awful:

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(bcdata)
#> 
#> Attaching package: 'bcdata'
#> The following object is masked from 'package:stats':
#> 
#>     filter

the_geom <- st_sfc(st_point(c(1164434, 368738)),
                   st_point(c(1203023, 412959)),
                   crs = 3005)

bcdc_query_geodata("local-and-regional-greenspaces") %>%
  filter(DWITHIN(st_buffer(the_geom, 10000, nQuadSegs = 2), 100, "meters"))
#> Authorizing with your stored API key
#> Warning: It is advised to use the permanent id ('6a2fea1b-0cc4-4fc2-8017-eaf755d516da') rather than the name of the record ('local-and-regional-greenspaces') to guard against future name changes.
#> Warning: Named arguments ignored for SQL st_buffer
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "c('sfc_POINT', 'sfc')"

Created on 2022-10-25 with reprex v2.0.2

It's really deep down in the stack and trying to preempt it might be really tricky. I think it might be best to just use a try and create a custom error if the error looks like "no applicable method for 'escape' applied to..."

boshek commented 1 year ago

Yeah that's a great thought. I think that's a good idea and we can see how that works.

ateucher commented 1 year ago

Thanks @stephhazlitt and @boshek!