Closed mgirlich closed 1 year ago
Thank you for this @mgirlich. I have a weird issue with collect (see reprex below). Using "schema.table" was a workaround for duckdb. Should I open an issue on duckdb or dbplyr?
library(dplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbExecute(con, "create schema test")
#> [1] 0
DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)
# works
tbl(con, DBI::Id(schema = "test", table = "cars"))
#> # Source: SQL [?? x 2]
#> # Database: DuckDB 0.8.2-dev77 [root@Darwin 21.6.0:R 4.2.2/:memory:]
#> speed dist
#> <dbl> <dbl>
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#> 7 10 18
#> 8 10 26
#> 9 10 34
#> 10 11 17
#> # ℹ more rows
# does not work
tbl(con, DBI::Id(schema = "test", table = "cars")) %>%
collect()
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "Id"
# works
tbl(con, DBI::Id(schema = "test", table = "cars")) %>%
mutate(a = 1) %>%
collect()
#> # A tibble: 50 × 3
#> speed dist a
#> <dbl> <dbl> <dbl>
#> 1 4 2 1
#> 2 4 10 1
#> 3 7 4 1
#> 4 7 22 1
#> 5 8 16 1
#> 6 9 10 1
#> 7 10 18 1
#> 8 10 26 1
#> 9 10 34 1
#> 10 11 17 1
#> # ℹ 40 more rows
DBI::dbDisconnect(con, shutdown = T)
Created on 2023-06-27 with reprex v2.0.2
@mgirlich Also I noticed the bigrquery uses "." to separate catalog, schema, and table components.
Using DBI::Id does not work.
bigrquery::bq_auth(path = Sys.getenv("BIGQUERY_SERVICE_ACCOUNT_JSON_PATH"))
con <- DBI::dbConnect(
bigrquery::bigquery(),
project = Sys.getenv("BIGQUERY_PROJECT_ID"),
dataset = Sys.getenv("BIGQUERY_CDM_SCHEMA")
)
# does not work
DBI::dbWriteTable(con, DBI::Id(schema = "scratch", table = "temp_test"), cars, overwrite = TRUE)
#> Error in `signal_reason()`:
#> ! Not found: Dataset ohdsi-276919:`scratch` [notFound]
#> Backtrace:
#> ▆
#> 1. ├─DBI::dbWriteTable(...)
#> 2. └─DBI::dbWriteTable(...)
#> 3. ├─DBI::dbWriteTable(...)
#> 4. └─bigrquery::dbWriteTable(...)
#> 5. └─bigrquery (local) .local(conn, name, value, ...)
#> 6. └─bigrquery::bq_table_upload(...)
#> 7. └─bigrquery::bq_perform_upload(x, values, ...)
#> 8. └─bigrquery:::bq_upload(...)
#> 9. └─bigrquery:::process_request(req)
#> 10. └─bigrquery:::bq_check_response(status, type, content)
#> 11. └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#> 12. └─rlang::abort(message, class = paste0("bigrquery_", reason))
# works
DBI::dbWriteTable(con, "scratch.temp_test", cars, overwrite = TRUE)
# fails
dplyr::tbl(con, DBI::Id(schema = "scratch", table = "temp_test"))
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> Error in `signal_reason()`:
#> ! Invalid dataset ID "`scratch`". Dataset IDs must be alphanumeric (plus underscores and dashes) and must be at most 1024 characters long. [invalid]
#> Backtrace:
#> ▆
#> 1. ├─dplyr::tbl(con, DBI::Id(schema = "scratch", table = "temp_test"))
#> 2. └─dplyr:::tbl.DBIConnection(con, DBI::Id(schema = "scratch", table = "temp_test"))
#> 3. ├─dplyr::tbl(...)
#> 4. └─dbplyr:::tbl.src_dbi(...)
#> 5. └─dbplyr::tbl_sql(c(subclass, "dbi"), src = src, from = from, ...)
#> 6. ├─vars %||% dbplyr_query_fields(src$con, from_sql)
#> 7. └─dbplyr:::dbplyr_query_fields(src$con, from_sql)
#> 8. └─dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
#> 9. ├─rlang::eval_bare(expr((!!fun)(con, ...)))
#> 10. ├─dplyr::db_query_fields(con, ...)
#> 11. └─bigrquery:::db_query_fields.BigQueryConnection(con, ...)
#> 12. └─bigrquery::bq_table_fields(tb)
#> 13. └─bigrquery::bq_table_meta(x, fields = "schema")
#> 14. └─bigrquery:::bq_get(url, query = list(fields = fields))
#> 15. └─bigrquery:::process_request(req, raw = raw)
#> 16. └─bigrquery:::bq_check_response(status, type, content)
#> 17. └─bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
#> 18. └─rlang::abort(message, class = paste0("bigrquery_", reason))
# works
dplyr::tbl(con, "scratch.temp_test")
#> # Source: table<scratch.temp_test> [?? x 2]
#> # Database: BigQueryConnection
#> dist speed
#> <int> <int>
#> 1 2 4
#> 2 10 4
#> 3 4 7
#> 4 22 7
#> 5 16 8
#> 6 10 9
#> 7 18 10
#> 8 26 10
#> 9 34 10
#> 10 17 11
#> # ℹ more rows
DBI::dbDisconnect(con)
Created on 2023-06-27 with reprex v2.0.2
Ideally I wouldn't need CDMConnector::inSchema
at all since it only exists to fix the slight differences between different database systems. It is important for us that the functions we use like dbWriteTable or dplyr::tbl or dbplyr::in_schema work the same on each database we need (postgres, sql server, oracle, redshift, snowflake, bigquery, ... etc)
And I'm very interested in helping test dbplyr sql translations across various database systems as well as extending dbplyr translations to new functions such as datediff or dateadd.
Ideally I wouldn't need
CDMConnector::inSchema
at all since it only exists to fix the slight differences between different database systems. It is important for us that the functions we use like dbWriteTable or dplyr::tbl or dbplyr::in_schema work the same on each database we need (postgres, sql server, oracle, redshift, snowflake, bigquery, ... etc)
I'm optimistic we can get rid of it soon 😄
This is indeed a bug in duckdb.
A workaround is to wrap the connection with a dbplyr::src_dbi()
:
library(dplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbExecute(con, "create schema test")
#> [1] 0
DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)
src <- dbplyr::src_dbi(con, auto_disconnect = FALSE)
tbl(src, from = DBI::Id(schema = "test", table = "cars"))
#> # Source: table<test.cars> [?? x 2]
#> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.0/:memory:]
#> speed dist
#> <dbl> <dbl>
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#> 7 10 18
#> 8 10 26
#> 9 10 34
#> 10 11 17
#> # ℹ more rows
Created on 2023-06-28 with reprex v2.0.2
~It seems that bigrquery doesn't define a method for DBI::dbQuoteIdentifier()
for DBI::Id()
. The standard escaping doesn't seem to be correct for bigquery, so we also need to open an issue there.~
duckdb
This is indeed a bug in duckdb.
A workaround is to wrap the connection with a
dbplyr::src_dbi()
:library(dplyr, warn.conflicts = F) con <- DBI::dbConnect(duckdb::duckdb()) DBI::dbExecute(con, "create schema test") #> [1] 0 DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars) src <- dbplyr::src_dbi(con, auto_disconnect = FALSE) tbl(src, from = DBI::Id(schema = "test", table = "cars")) #> # Source: table<test.cars> [?? x 2] #> # Database: DuckDB 0.8.1 [root@Darwin 22.5.0:R 4.3.0/:memory:] #> speed dist #> <dbl> <dbl> #> 1 4 2 #> 2 4 10 #> 3 7 4 #> 4 7 22 #> 5 8 16 #> 6 9 10 #> 7 10 18 #> 8 10 26 #> 9 10 34 #> 10 11 17 #> # ℹ more rows
Created on 2023-06-28 with reprex v2.0.2
But if a user runs dplyr::collect
it still gives an error.
library(dplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbExecute(con, "create schema test")
#> [1] 0
DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)
src <- dbplyr::src_dbi(con, auto_disconnect = FALSE)
tbl(src, from = DBI::Id(schema = "test", table = "cars")) %>%
collect()
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "Id"
Created on 2023-06-28 with reprex v2.0.2
Works for me with the CRAN and the dev version of dbplyr. Maybe some of your packages aren't up to date? See the attached session info:
library(dplyr, warn.conflicts = F)
library(dbplyr, warn.conflicts = F)
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbExecute(con, "create schema test")
#> [1] 0
DBI::dbWriteTable(con, DBI::Id(schema = "test", table = "cars"), cars)
src <- dbplyr::src_dbi(con, auto_disconnect = FALSE)
tbl(src, from = DBI::Id(schema = "test", table = "cars")) %>%
collect()
#> # A tibble: 50 × 2
#> speed dist
#> <dbl> <dbl>
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#> 7 10 18
#> 8 10 26
#> 9 10 34
#> 10 11 17
#> # ℹ 40 more rows
Created on 2023-06-28 with reprex v2.0.2
Ah I must have had an older version of dbplyr.
I still don't see a method for escape.Id in the current main branch of dbplyr though. Maybe it's not needed.
Here is what I was experimenting with.
escape <- function(x, parens = NA, collapse = " ", con = NULL) {UseMethod("escape")}
escape.Id <- function(x, parens = FALSE, collapse = ".", con = NULL) {
dbplyr::sql(paste(DBI::dbQuoteIdentifier(con, attr(x, "name")), collapse = collapse))
}
The escape method for Id shouldn't be needed. The dev version of dbplyr added a new table identifier class to (at least internally) replace the various classes used so far for table identifiers:
Can you merge this and release in the next days so that we can release dbplyr in the next weeks?
Hi @mgirlich, We do development in a separate private repo. I've made the change you suggest there and it will be in the next release. You can go ahead and release dbplyr. Thank you very much for the heads up!
dbplyr now warns when the table name contains a
.
, as in most cases you actually want to specify a table in a schema. Your package came up in the revdepchecks as it also faced this issue. This PR should make it compatible with the new dev version. Note that I'm not entirely sure whether everything is fixed as I got an errorExited with status 137.
when running the test suite.