Open nbenn opened 9 months ago
Rather than try to specify some regex or "bind pattern", I wonder if we just want to push the "bind by name" operation into the driver. For example:
adbc_statement_bind_by_name <- function(statement, stream) {
UseMethod("adbc_statement_bind_by_name")
}
adbc_statement_bind_by_name.default <- function(statement, stream) {
stop("bind by name not supported by this driver") # maybe throw a classed error that you can catch
}
The implementation for SQLite, for example, could use the parameter schema method (probably stealing the implementation from RSQLite).
RPostgres tests only positional:
The challenge for DBItest is that it must generate the queries to test against the driver. How would the new generic help?
If it's DBI test that needs this, maybe what we need is:
adbc_driver_dbitest_tweaks <- function(driver) {
UseMethod("adbc_driver_dbitest_tweaks")
}
adbc_driver_dbitest_tweaks.default <- function(driver) {
stop("dbitest tweaks not supported by this driver")
}
...since there are likely other driver-specific options that would be better set by the driver package.
Do you really want tight coupling between ADBC drivers and DBItest?
It could be specified somewhere else, too (e.g., an adbctest
package that is GitHub only), but I don't mind putting DBItest
in Suggests
and adding the generic. I would eventually like to run DBItest in ADBC's CI (maybe not CRAN's, though)...it makes sense to me that the test configuration lives somewhere in the ADBC repo that is as close to the individual driver as possible.
As things stand currently, DBItest does not need this. Setting up the DBItest context has to be done per back end anyways and it is there, where the placeholders can be defined explicitly.
It is some intermediary such as adbi, that needs this, as it has to be agnostic of the backend specifics yet might still want to add some smarts such as making sure parameters are passed in the right order (as long as this remains necessary) to isolate users from lower-level specifics.
I agree that it would be very confusing if a query containing $bread, $cheese
were passed data.frame(cheese = 1, bread = 2)
and the parameters were bound by position. It sounds like the idiomatic way to handle this in DBI is to bind by name if possible.
Testing aside, a simpler method would also be to add a bind_by_position = NA
argument to the relevant bind method and warn if NA
and error if FALSE
(i.e., make somebody write DBIStatmentBind(stmt, some_data_frame, bind_by_position = TRUE)
if they don't want a warning.
@paleolimbot from where I stand, what you proposed with adbc_statement_bind_by_name()
would make many of my problems go away here.
Trying to solve my problem specifically, we might also re-consider what names the parameter schema should have exactly. Does it make sense that parameter names come with "placeholder indicators"?
If the schema were to change, the only extra information I'd need is whether named parameter are supported by the back-end or not.
For the format/driver side I filed #1398
I think my preference for this specific issue would be:
adbc_statement_bind_by_name()
generic + helper in adbcdrivermanager with implementation in adbcsqlite and duckdb.Alternatively, you could continue to hard-code the string you need in whatever format you need it until the C driver API can provide that information. You would only need it for SQLite, Postgres, Snowflake, and DuckDB.
I like the idea of moving the DBItest tests to the respective drivers. @paleolimbot: what kind of support do you need?
See https://github.com/apache/arrow-adbc/pull/1401 . I still get quite a few failures but I imagine I'm not setting it up quite right!
Currently there is no way to determine what types of placeholders a backend supports for parametrized queries.
SQLite for example is quite flexible, supporting all of "?", "$i", "$name", ":name" (where "i" represents a parameter position and "name" a parameter name), while Postgres (I believe) only supports "?" and "$i" and cannot handle named parameters.
In what way this information could be communicated and whether it is worthwhile to do so is currently unclear to me. I'm hoping that this could be a place to have such a discussion.
Maybe I can start out explaining a bit why this became relevant tom me and then we see where this goes. It might be that we can sort out my specific issues without introducing such infrastructure at all. Of course this might be interesting to others nonetheless.
For context: I'm writing a high level wrapper in R that aligns the DBI interface with functionality exposed by adbcdrivermanager. As example, consider the following query:
We have a "parameter schema" with parameter names
At the same time, I want my users to be able to pass parameter values by referring to more natural names such as "agr" and "edu". Something like
As things stand, it seems that for named parameters, names are not considered and only positions are relevant. This makes the current implementation of parameter binding (at least for SQLite) quite lenient.
If I want to properly support named parameters, I have to sort out ordering on my end, as well as potentially raising errors for name/type mismatches. To do this, I need to match names in the parameter schema with my "natural" names. And this is where having information on what kinds of placeholder patterns are supported by the back-end becomes relevant to me.
Cc @krlmlr @paleolimbot