Open javier-gracia-tabuenca-tuni opened 3 weeks ago
We could have a DatabaseConnectorDbiConnection that wraps bigrquery connections similar to how DatabaseConnector wraps other DBI drivers. I think this would make sense.
Thanks @ablack3, Now that you mention this, I remember than 3 years ago I made hack to use DatabaseConnector to work with bigquery using DBI, bcs JDBI was not allowed in our research enviroment.
https://github.com/FINNGEN/DatabaseConnector
It worked very well, but the code is very dirty,
I could work on making this the right way if you think that PR will be accepted ???
that hack was very simple, just used bigrquery to make the DBI connection following how DatabaseConnector uses SQLite.
The dirty part comes, where I had to pass some extra parameters for bigrquery, I just added these paramenters to all the necessary functions, this should be different.
@ablack3: If bigrquery has superior performance to the JDBC driver, maybe we should swap them out in DatabaseConnector? (after we've completed DatabaseConnector 7.0.0, when this should be simpler)
The dirty part comes, where I had to pass some extra parameters for bigrquery, I just added these paramenters to all the necessary functions, this should be different.
@javier-gracia-tabuenca-tuni I think we could follow the pattern used for odbc that uses createDbiConnectionDetails
and connectUsingDbi
Otherwise your changes look good to me.
connectSparkUsingOdbc <- function(connectionDetails) {
inform("Connecting using Spark ODBC driver")
ensure_installed("odbc")
dbiConnectionDetails <- createDbiConnectionDetails(
dbms = connectionDetails$dbms,
drv = odbc::odbc(),
Driver = "Simba Spark ODBC Driver",
Host = connectionDetails$server(),
uid = connectionDetails$user(),
pwd = connectionDetails$password(),
Port = connectionDetails$port()
# UseNativeQuery=1
)
if (!is.null(connectionDetails$extraSettings)) {
dbiConnectionDetails <- append(
dbiConnectionDetails,
connectionDetails$extraSettings
)
}
connection <- connectUsingDbi(dbiConnectionDetails)
return(connection)
}
thanks @ablack3, that is a good guidance, (that was not there when i made my hack)
I will make a fork, add the changes, work with these on my tools, and if don't find major issues, we can merge these changes after v7.0.0 as suggested by @schuemie
does it sounds like a plan ?
Hi @javier-gracia-tabuenca-tuni, after talking with @schuemie we were thinking that after v7 you should be able to use DatabaseConnector with bigrquery by using createDbiConnectionDetails
.
Ideally DatabaseConnector only uses one driver for each database. Currently that is jdbc for BigQuery but it could be bigrquery in the future if all the tools work well with it. I would say we could give bigrquery a try and compare to the jdbc driver. If bigrquery seems to work better then we can switch to it in a future release. Feel free to make a branch with this change and we can test it out after v7.
ok, sounds good
one more question related to BQ.
There is this forced delay for only bigquery https://github.com/OHDSI/DatabaseConnector/blob/main/R/Sql.R#L307
I coudnt find what is the reason for this. I'd like to know if this is necessary always for BQ, or ir has to do with the driver, and changing the driver will fix this also.
BigQuery has a rate limit, especially for inserts. If you exceed it, you will get an error. To my knowledge this limit still applies.
Loading a table in BigQuery with DatabaseConnector::insertTable is quite slow
I usually have to use package bigrquery
quick dirty demo
If you want I can make insertTable with bulkLoad=TRUE to work also BQ using bigrquery, or do you prefer other method such command line, may be more complex but nor need an extra package ??