darwin-eu / CDMConnector

A pipe friendly way to interact with an OMOP Common Data Model
https://darwin-eu.github.io/CDMConnector/
Apache License 2.0
12 stars 10 forks source link

Compatibility with dev dbplyr #6

Closed mgirlich closed 1 year ago

mgirlich commented 1 year ago

dbplyr now warns when the table name contains a ., as in most cases you actually want to specify a table in a schema. Your package came up in the revdepchecks as it also faced this issue. This PR should make it compatible with the new dev version. Note that I'm not entirely sure whether everything is fixed as I got an error Exited with status 137. when running the test suite.

ablack3 commented 1 year ago

Thank you for this @mgirlich. I have a weird issue with collect (see reprex below). Using "schema.table" was a workaround for duckdb. Should I open an issue on duckdb or dbplyr?

library(dplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())

DBI::dbExecute(con, "create schema test")
#> [1] 0

DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)

# works
tbl(con, DBI::Id(schema = "test", table = "cars"))
#> # Source:   SQL [?? x 2]
#> # Database: DuckDB 0.8.2-dev77 [root@Darwin 21.6.0:R 4.2.2/:memory:]
#>    speed  dist
#>    <dbl> <dbl>
#>  1     4     2
#>  2     4    10
#>  3     7     4
#>  4     7    22
#>  5     8    16
#>  6     9    10
#>  7    10    18
#>  8    10    26
#>  9    10    34
#> 10    11    17
#> # ℹ more rows

# does not work
tbl(con, DBI::Id(schema = "test", table = "cars")) %>% 
  collect()
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "Id"

# works
tbl(con, DBI::Id(schema = "test", table = "cars")) %>% 
  mutate(a = 1) %>% 
  collect()
#> # A tibble: 50 × 3
#>    speed  dist     a
#>    <dbl> <dbl> <dbl>
#>  1     4     2     1
#>  2     4    10     1
#>  3     7     4     1
#>  4     7    22     1
#>  5     8    16     1
#>  6     9    10     1
#>  7    10    18     1
#>  8    10    26     1
#>  9    10    34     1
#> 10    11    17     1
#> # ℹ 40 more rows

DBI::dbDisconnect(con, shutdown = T)

Created on 2023-06-27 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.2.2 (2022-10-31) #> os macOS Big Sur ... 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/London #> date 2023-06-27 #> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.0) #> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.0) #> dbplyr 2.3.2.9000 2023-06-12 [1] Github (tidyverse/dbplyr@05c5317) #> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.0) #> dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.2.0) #> duckdb 0.8.1 2023-06-21 [1] https://duckdb.r-universe.dev (R 4.2.3) #> evaluate 0.21 2023-05-05 [1] CRAN (R 4.2.0) #> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.0) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.0) #> fs 1.6.2 2023-04-25 [1] CRAN (R 4.2.0) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0) #> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.2.0) #> knitr 1.43 2023-05-25 [1] CRAN (R 4.2.0) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0) #> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.0) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.0) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0) #> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.0) #> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.2.0) #> rmarkdown 2.22 2023-06-01 [1] CRAN (R 4.2.0) #> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0) #> styler 1.10.1 2023-06-05 [1] CRAN (R 4.2.0) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.2.0) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0) #> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.0) #> vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.2.0) #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0) #> xfun 0.39 2023-04-20 [1] CRAN (R 4.2.0) #> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.0) #> #> [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
ablack3 commented 1 year ago

@mgirlich Also I noticed the bigrquery uses "." to separate catalog, schema, and table components.

https://github.com/r-dbi/bigrquery/blob/9bd1425171ccc9897a8992ac8de56abd64274196/R/dbi-connection.R#L306

Using DBI::Id does not work.

bigrquery::bq_auth(path = Sys.getenv("BIGQUERY_SERVICE_ACCOUNT_JSON_PATH"))

con <- DBI::dbConnect(
  bigrquery::bigquery(),
  project = Sys.getenv("BIGQUERY_PROJECT_ID"),
  dataset = Sys.getenv("BIGQUERY_CDM_SCHEMA")
)

# does not work
DBI::dbWriteTable(con, DBI::Id(schema = "scratch", table = "temp_test"), cars, overwrite = TRUE)
#> Error in `signal_reason()`:
#> ! Not found: Dataset ohdsi-276919:`scratch` [notFound] 
#> Backtrace:
#>      ▆
#>   1. ├─DBI::dbWriteTable(...)
#>   2. └─DBI::dbWriteTable(...)
#>   3.   ├─DBI::dbWriteTable(...)
#>   4.   └─bigrquery::dbWriteTable(...)
#>   5.     └─bigrquery (local) .local(conn, name, value, ...)
#>   6.       └─bigrquery::bq_table_upload(...)
#>   7.         └─bigrquery::bq_perform_upload(x, values, ...)
#>   8.           └─bigrquery:::bq_upload(...)
#>   9.             └─bigrquery:::process_request(req)
#>  10.               └─bigrquery:::bq_check_response(status, type, content)
#>  11.                 └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#>  12.                   └─rlang::abort(message, class = paste0("bigrquery_", reason))

# works
DBI::dbWriteTable(con, "scratch.temp_test", cars, overwrite = TRUE)

# fails
dplyr::tbl(con, DBI::Id(schema = "scratch", table = "temp_test"))
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> Error in `signal_reason()`:
#> ! Invalid dataset ID "`scratch`". Dataset IDs must be alphanumeric (plus underscores and dashes) and must be at most 1024 characters long. [invalid] 
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::tbl(con, DBI::Id(schema = "scratch", table = "temp_test"))
#>   2. └─dplyr:::tbl.DBIConnection(con, DBI::Id(schema = "scratch", table = "temp_test"))
#>   3.   ├─dplyr::tbl(...)
#>   4.   └─dbplyr:::tbl.src_dbi(...)
#>   5.     └─dbplyr::tbl_sql(c(subclass, "dbi"), src = src, from = from, ...)
#>   6.       ├─vars %||% dbplyr_query_fields(src$con, from_sql)
#>   7.       └─dbplyr:::dbplyr_query_fields(src$con, from_sql)
#>   8.         └─dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
#>   9.           ├─rlang::eval_bare(expr((!!fun)(con, ...)))
#>  10.           ├─dplyr::db_query_fields(con, ...)
#>  11.           └─bigrquery:::db_query_fields.BigQueryConnection(con, ...)
#>  12.             └─bigrquery::bq_table_fields(tb)
#>  13.               └─bigrquery::bq_table_meta(x, fields = "schema")
#>  14.                 └─bigrquery:::bq_get(url, query = list(fields = fields))
#>  15.                   └─bigrquery:::process_request(req, raw = raw)
#>  16.                     └─bigrquery:::bq_check_response(status, type, content)
#>  17.                       └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#>  18.                         └─rlang::abort(message, class = paste0("bigrquery_", reason))

# works
dplyr::tbl(con, "scratch.temp_test")
#> # Source:   table<scratch.temp_test> [?? x 2]
#> # Database: BigQueryConnection
#>     dist speed
#>    <int> <int>
#>  1     2     4
#>  2    10     4
#>  3     4     7
#>  4    22     7
#>  5    16     8
#>  6    10     9
#>  7    18    10
#>  8    26    10
#>  9    34    10
#> 10    17    11
#> # ℹ more rows

DBI::dbDisconnect(con)

Created on 2023-06-27 with reprex v2.0.2

ablack3 commented 1 year ago

Ideally I wouldn't need CDMConnector::inSchema at all since it only exists to fix the slight differences between different database systems. It is important for us that the functions we use like dbWriteTable or dplyr::tbl or dbplyr::in_schema work the same on each database we need (postgres, sql server, oracle, redshift, snowflake, bigquery, ... etc)

ablack3 commented 1 year ago

And I'm very interested in helping test dbplyr sql translations across various database systems as well as extending dbplyr translations to new functions such as datediff or dateadd.

mgirlich commented 1 year ago

Ideally I wouldn't need CDMConnector::inSchema at all since it only exists to fix the slight differences between different database systems. It is important for us that the functions we use like dbWriteTable or dplyr::tbl or dbplyr::in_schema work the same on each database we need (postgres, sql server, oracle, redshift, snowflake, bigquery, ... etc)

I'm optimistic we can get rid of it soon 😄

duckdb

This is indeed a bug in duckdb.

A workaround is to wrap the connection with a dbplyr::src_dbi():

library(dplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())

DBI::dbExecute(con, "create schema test")
#> [1] 0
DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)

src <- dbplyr::src_dbi(con, auto_disconnect = FALSE)
tbl(src, from = DBI::Id(schema = "test", table = "cars"))
#> # Source:   table<test.cars> [?? x 2]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.0/:memory:]
#>    speed  dist
#>    <dbl> <dbl>
#>  1     4     2
#>  2     4    10
#>  3     7     4
#>  4     7    22
#>  5     8    16
#>  6     9    10
#>  7    10    18
#>  8    10    26
#>  9    10    34
#> 10    11    17
#> # ℹ more rows

Created on 2023-06-28 with reprex v2.0.2

bigrquery

~It seems that bigrquery doesn't define a method for DBI::dbQuoteIdentifier() for DBI::Id(). The standard escaping doesn't seem to be correct for bigquery, so we also need to open an issue there.~

ablack3 commented 1 year ago

duckdb

This is indeed a bug in duckdb.

A workaround is to wrap the connection with a dbplyr::src_dbi():

library(dplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())

DBI::dbExecute(con, "create schema test")
#> [1] 0
DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)

src <- dbplyr::src_dbi(con, auto_disconnect = FALSE)
tbl(src, from = DBI::Id(schema = "test", table = "cars"))
#> # Source:   table<test.cars> [?? x 2]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.0/:memory:]
#>    speed  dist
#>    <dbl> <dbl>
#>  1     4     2
#>  2     4    10
#>  3     7     4
#>  4     7    22
#>  5     8    16
#>  6     9    10
#>  7    10    18
#>  8    10    26
#>  9    10    34
#> 10    11    17
#> # ℹ more rows

Created on 2023-06-28 with reprex v2.0.2

But if a user runs dplyr::collect it still gives an error.

library(dplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())

DBI::dbExecute(con, "create schema test")
#> [1] 0

DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)

src <- dbplyr::src_dbi(con, auto_disconnect = FALSE)

tbl(src, from = DBI::Id(schema = "test", table = "cars")) %>% 
  collect()
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "Id"

Created on 2023-06-28 with reprex v2.0.2

mgirlich commented 1 year ago

Works for me with the CRAN and the dev version of dbplyr. Maybe some of your packages aren't up to date? See the attached session info:

library(dplyr, warn.conflicts = F)
library(dbplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())

DBI::dbExecute(con, "create schema test")
#> [1] 0

DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)

src <- dbplyr::src_dbi(con, auto_disconnect = FALSE)

tbl(src, from = DBI::Id(schema = "test", table = "cars")) %>% 
  collect()
#> # A tibble: 50 × 2
#>    speed  dist
#>    <dbl> <dbl>
#>  1     4     2
#>  2     4    10
#>  3     7     4
#>  4     7    22
#>  5     8    16
#>  6     9    10
#>  7    10    18
#>  8    10    26
#>  9    10    34
#> 10    11    17
#> # ℹ 40 more rows

Created on 2023-06-28 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.0 (2023-04-21) #> os macOS Ventura 13.4 #> system x86_64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz UTC #> date 2023-06-28 #> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0) #> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.0) #> dbplyr * 2.3.2 2023-03-21 [1] CRAN (R 4.3.0) #> digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0) #> dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0) #> duckdb 0.8.1 2023-06-16 [1] CRAN (R 4.3.0) #> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0) #> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0) #> fs 1.6.2 2023-04-25 [1] CRAN (R 4.3.0) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0) #> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0) #> knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0) #> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0) #> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0) #> rmarkdown 2.22 2023-06-01 [1] CRAN (R 4.3.0) #> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0) #> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0) #> vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.0) #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0) #> xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0) #> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0) #> #> [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
ablack3 commented 1 year ago

Ah I must have had an older version of dbplyr.

I still don't see a method for escape.Id in the current main branch of dbplyr though. Maybe it's not needed.

Here is what I was experimenting with.

escape <- function(x, parens = NA, collapse = " ", con = NULL) {UseMethod("escape")}
escape.Id <- function(x, parens = FALSE, collapse = ".", con = NULL) {
  dbplyr::sql(paste(DBI::dbQuoteIdentifier(con, attr(x, "name")), collapse = collapse))
}
mgirlich commented 1 year ago

The escape method for Id shouldn't be needed. The dev version of dbplyr added a new table identifier class to (at least internally) replace the various classes used so far for table identifiers: , , , , , , . All these classes are then converted to a new class . So, there is no need for dbplyr to provide such a class.

mgirlich commented 1 year ago

Can you merge this and release in the next days so that we can release dbplyr in the next weeks?

ablack3 commented 1 year ago

Hi @mgirlich, We do development in a separate private repo. I've made the change you suggest there and it will be in the next release. You can go ahead and release dbplyr. Thank you very much for the heads up!