ankane / dbx

A fast, easy-to-use database library for R
Other
187 stars 15 forks source link

dbxSQL() #5

Closed petermeissner closed 6 years ago

petermeissner commented 6 years ago

I wrote this for my own usage and find it extremely useful although it is not yet really battle tested but maybe you like the idea and want to include it in the package ...

Problem

Solution Idea

A function that does both, marking as SQL and providing some means for a more lightweight pasting together.

#' dbxSQL
#'
#' @param query_string a query string with possible sprintf markups, see \link{sprintf}
#' @param ... values to put into string, see \link{sprintf}
#'
#' @export
#'
dbxSQL<- function(query_string, ...){
  # do recursion over parameter or not
  if ( length(query_string) > 1 | any(vapply(list(...), length, integer(1)) > 1) ){
    return(
      DBI::SQL(
        mapply(dbxSQL, query_string, ..., USE.NAMES = FALSE)
        )
    )
  }

  # paste together string, mark as SQL and return
  DBI::SQL(sprintf( query_string, ...))
}

example

 dbxSQL("Select * from %s WHERE job_id = %s", "mytable", 1:3)
## <SQL> Select * from mytable WHERE job_id = 1
## <SQL> Select * from mytable WHERE job_id = 2
## <SQL> Select * from mytable WHERE job_id = 3

What do you think?

ankane commented 6 years ago

Hey @petermeissner, thanks for the suggestion 👍 Can explain more about the use case? How would you use the result of dbxSQL?

petermeissner commented 6 years ago

Hey, sure ...

The use case for this is pretty much that no matter what you always will want to 'paste' together some SQL queries. While we can build good frameworks around basic select, insert, update, and delete statements how do you build e.g. complex with statement? Or how do you efficiently use that one new feature of your beloved database? Sure you can wait until somebody implemented a wrapper or higher level function or you simply use paste(). Now the problem is, that the use of paste is straining and error prone - statements pasted together with paste are really hard to read.

In contrast the statement above is really easy to read and in all cases where there are no higher functions yet or where statements are just to quirky and idiosyncratic to be put into a function this little friend will help a lot.

I have now worked intensively with this feature and I find it very robust, intuitive and helpful. It becomes even more helpful with another feature that I requested at DBI's repo that might also be implemented here: https://github.com/r-dbi/DBI/issues/260

petermeissner commented 6 years ago

Let's have another example:

library(RSQLite)
library(DBI)

con <- dbConnect(SQLite(), ":memory:")
sql <- dbxSQL("CREATE TABLE %s (id int,  %s text)", LETTERS, letters)
res <- lapply(sql, function(sql){dbExecute(conn = con, sql)})

... versus ...

library(RSQLite)
library(DBI)

con <- dbConnect(SQLite(), ":memory:")
sql <- paste0("CREATE TABLE ", LETTERS," (id int, ", letters," text)")
res <- lapply(sql, function(sql){dbExecute(conn = con, sql)})

... the second solution does the same as the first BUT printing sql is not as good as for the first solution and reading the statement becomes harder and harder.

ankane commented 6 years ago

Hey @petermeissner, thanks for the explanation. For commands that take raw SQL (just dbxSelect right now, although dbxExecute is probably needed), I think two approaches can be taken:

  1. Use sprintf to construct SQL queries
  2. Add bind parameters to the function calls

In the example above, you can replace dbxSQL with sprintf and it'll construct the queries properly.

For bind parameters, there's a bind_params branch right now where you can do:

params <- list(1)
res <- dbxSelect(db, "SELECT * FROM orders WHERE id > ? ORDER BY id", params=params)

I think both of these together can eliminate the need to have dbx functions specifically for constructing SQL. What do you think?

petermeissner commented 6 years ago

I think it's very good that we are having a discussion about it and part of the answer might be what you think the scope of the package is ...

Currently I am using a lot of with statements over a lot of tables with a lot of different column names. These are statements where reading becomes really hard, there are no wrappers - no dbxWithSelectThanInsertAndReturn() function - and dbxSQL() helps a lot.

But you also have to decide which scope the package should have: Should it 'only' provide solid CRUD with dbxSelect, dbxUpdate, ... or should it in general fill the gaps where DBI is lacking more higher level functionality for SQL code generation/execution. If the former I will take my issues and go elsewhere - probably my own package - if the latter I think this functionality really is missing, a good fit for the package and might be complemented by still another feature proposal (https://github.com/r-dbi/DBI/issues/257).

ankane commented 6 years ago

Yeah, I think it makes sense to go with a separate package for now, as I feel like it's currently disconnected from the rest of the dbx functionality.

ankane commented 6 years ago

fwiw, it looks like the glue package provides similar functionality: https://github.com/tidyverse/glue#glue_sql-makes-constructing-sql-statements-safe-and-easy

petermeissner commented 6 years ago

Ahh, thanks for the hint to glue and a whole lot of thanks for taking the time to talk things through - this is most appreciated.

petermeissner commented 6 years ago

FYI

I checked out glue::glue_sql() ... this is much superior to my solution - has automatic value escaping (can be turned of via SQL()), returns strings marked as SQL, and some other fancy stuff - 👍