ankane / dbx

A fast, easy-to-use database library for R
Other
187 stars 15 forks source link

Separate upload from insert/update/delete procedure #17

Closed krlmlr closed 4 years ago

krlmlr commented 4 years ago

I'm working on adding persistence to {dm}: https://github.com/krlmlr/dm/pull/319. Because this is such a complex problem, only the simplest approach will work if at all. Therefore I require that, before inserting, the data has been already brought into the database.

Internally I'm using dbplyr::copy_to() for importing into temporary tables and hand-crafted SQL for doing the actual INSERT. I wonder if dbx could support inserting/updating/... from arbitrary SQL queries, perhaps if the records argument is of class "SQL"? Also, is the upload part in dbx better/more portable than dbplyr::copy_to() -- perhaps we could add a dbxUpload() that writes to a temporary table and returns the table name?

ankane commented 4 years ago

Hey @krlmlr, trying to understand the idea better. Can you give an example of how you're thinking it would work with R code?

I haven't used dbplyr::copy_to() but have an idea of how it works from the CRAN docs.

krlmlr commented 4 years ago

I think we could check if records is of type "SQL" or "Id", and emit DML that uses the SQL or table name the user passes. For a new dbxUpload(), I would expect that

dbxInsert(conn, table, records)

could be decomposed to

temp_table_name <- dbxUpload(conn, records)
dbxInsert(conn, table, temp_table_name)

Same for dbxUpdate() etc. -- for the latter we need a way to specify key columns.

ankane commented 4 years ago

I think this can be accomplished this DBI directly. Does something like this work for your use case?

library(DBI)

db <- dbConnect(RSQLite::SQLite(), ":memory:")

# existing records
records <- data.frame(id=c(1, 2), browser=c("Chrome", "Firefox"))
dbWriteTable(db, "destination", records)

# new records
new_records <- data.frame(id=c(3, 4), browser=c("IE", "Brave"))
tmp_table = paste0("temp_", sample(10000000, 1))
dbWriteTable(db, tmp_table, new_records, temporary=TRUE)
dbExecute(db, paste("INSERT INTO destination SELECT * FROM", tmp_table))

print(dbReadTable(db, "destination"))
krlmlr commented 4 years ago

Thanks. I have a working implementation. I was wondering if I could base it on dbx instead to make it more bullet-proof and offload dealing with peculiarities of different database backends and SQL dialects.

ankane commented 4 years ago

I'm not sure dbx will make it more bulletproof or compatible here, as the PR just runs INSERT INTO destination SELECT * FROM source, which should work on all SQL databases. Are there things you'd change about dbWriteTable(..., temporary=TRUE)?

krlmlr commented 4 years ago

What about UPDATE, UPSERT, TRUNCATE, DELETE for a query? Do all these work consistently and identically on all databases?

My proposed dbxUpload() would take care of picking a name for the temporary table and forward to dbWriteTable().

ankane commented 4 years ago

Can you explain more about how the other operations fit into the idea?

Re dbxUpload: Since it can be done in two lines of code with DBI and doesn't seem super common, I don't think it makes sense to add.

krlmlr commented 4 years ago

Since there's overlap in scope and functionality with existing dbx I propose to pool resources. I can contribute, eventually this needs to go to CRAN. If you'd rather keep dbx stable I implement myself.

ankane commented 4 years ago

I'm not sure I see the overlap, so I think it's best handled outside of dbx. Regardless, thanks for the suggestion!