Closed r2evans closed 3 years ago
Hey @r2evans, thanks for another suggestion! It makes sense, but for simplicity, don't think I want to support another way to pass parameters right now (dbx currently uses ?
for all adapters).
What happens if the query contains a literal '?'
?
You can use DBI::sqlInterpolate()
and get escaping and named parameters for free. I'll release a fix with r-dbi/DBI#329 when CRAN opens.
Have you considered using dbBind()
(or passing the parameters as params
argument to dbSendQuery()
? If you're rewriting queries anyway, you might as well rewrite the placeholders to the backend-specific syntax, which gives you the advantages of parametrized queries. This is the essence of https://github.com/r-dbi/DBI/issues/52, which has been open for too long and which I now think is out of scope for DBI.
A literal question mark string is not a problem as long as the parsing of the string tokenizes it, seeing '?' instead of ?.
I've been doing this in a side package of mine (private, not in GitHub), and have been using it consistently with sql server, postgres, and sqlite alike.
Is dbx parsing the strings, or looking for ?
?
@ankane, one thought on why I think named parameters are a good thing to support and a better thing for dev and production: for readability and clarity, do you prefer sprintf
or the concept of glue
? Python has similar concepts with its "".format()
, dictionary string expansion "" % (,,)
, and most recently its f-strings, and I think there is great value in shifting the mindset away from positional bindings to named bindings.
string concatenation, ala R's paste
:
DBI::dbGetQuery(con, paste("
SELECT X, Y, Z
FROM SomeTable
WHERE X > ", myA, " AND Y = ", dbQuoteString(myB), " and X < ", myC)
)
It's clear that myA
is associated with the first X
comparison operation, etc, but the query flow is interrupted (subjective, too many quotes), and this is susceptible to SQL injection problems (malicious or inadvertent), plus it defeats attempts at query optimization on the server, if feasible.
positional, no names, ala sprintf
:
DBI::dbGetQuery(con, paste("
SELECT X, Y, Z
FROM SomeTable
WHERE X > ? AND Y = ? and X < ?",
params = list(myA, myB, myC)
)
The query flow is more readable, and this supports query optimization on the server, but my intentional choice to have the X
conditionals not co-located demonstrates that associating each ?
with its respective argument of params
can be difficult. With simpler queries it's trivial, but larger queries with more parameters make this very easy to mis-order the parameters. Especially when rewriting the query, where ?
s can easily be added (might be a syntax error) or moved (might not provide an error or warning, which is easily a problem).
named parameters, ala glue
or python's f-strings
DBI::dbGetQuery(con, paste("
SELECT X, Y, Z
FROM SomeTable
WHERE X > ?myA AND Y = ?myB and X < ?myC",
params = list(myA=myA, myB=myB, myC=myC)
)
This provides unambiguous, direct visual correlation between SQL fields and local parameters. The conversion from named to unnamed in R (since many databases don't support it) is trivial and unambiguous. (If there is ambiguity, it's an error.)
My suggestion is not necessarily to replace ?
altogether, though supporting both requires some extra safeguards. If for example
dbSelect <- function(..., named = FALSE) { ... }
then nothing is changed. I'd think strictly "all-or-nothing" would be best.
This isn't supported in DBI
, as that is as low-level as it needs to be, and changes to the query (which this suggestion does require) should probably be avoided at that level. The next place for this to be an option is in an extension package. I argue that dbX
is the place for this, and is in a good position to encourage good/safe practices.
At the risk of being a little dramatic, I suggest that named-parameters is one of these "good practices" that should be encouraged to new users (well, and experienced users, too). Whenever I onboard somebody into SQL, the first thing I do is discuss the tripping-points of positional-only parameters. I think dbX
is well positioned to enable that encouragement (not required it).
Just my $0.02, thanks.
Hey @r2evans, I understand that it can be useful in some situations, but don't want to support a second way of passing parameters right now. I appreciate the suggestion.
Parameterization is awesome, and downright mandatory in my eyes. However, at least historically I've been frustrated by the inconsistency between DBMSes. I think many are resolving to
?
, but I find it very useful to name the parameters. This is external to the DBMS query, and requires the query to be tweaked before sending it off.I've found that using named bind-parameters can be better in many ways than un-named parameters, in increasing order of importance (to me):
?
; andFor me, the first is a convenience, the second is more than convenience, and the third is a safeguard.
It might enable
The way-forward does require that the
?
mark is immediately followed by an unambiguous label (no spaces).This does require changing the query slightly: find the named params, match each with the named argument, then reassign all named params from
?name
to?
. It's not that hard pattern-wise, and if it's provided as a separate function (or the same function and an additional argument), then the overhead and risk of query-modification is reduced.So effectively this becomes a wrapper that takes named-qmarks and a named-list of values; and converts it into unnamed-qmarks and a possibly-reordered, possibly-repeated, possibly-reduced list of values. (I say possible-reduced because this might allow unused arguments to be ignored. Or one might prefer to enforce perfect matches, that argument can be made as well.)