Closed by camilogarciabotero 2 months ago
Creating a more unified backend approach is a great idea. Not late at all.
Technically, the only backend that connects with the `open_duckdb` method is DuckDB, and I did that partly for convenience with docstrings to keep moving the package forward.
All the other backends (except SQLite) connect like this:

```julia
using Clickhouse # or MySQL / ODBC / LibPQ
con = Clickhouse.connect(connection_string) # i.e. username, password, etc.
```
So the user actually has to load the specific package they need and set up the connection.
That being said, unifying how the connection is made would be lovely: one function would be used for all, and the user wouldn't have to load the libraries anymore (although DuckDB and SQLite are a bit different because they're local, unlike the others).
Is that along the lines of what you were thinking?
Right, so a couple of cases also came to my mind:

A `connect` method, for instance:

```julia
db = DB.duckdb_connect(mem);
```

Could be something like:

```julia
db = DB.connect(mem; backend::Backend=duckdb, kwargs...);
```
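As a minimal sketch of how that single-entry-point dispatch could look (all names below are illustrative, not the actual TidierDB API):

```julia
# Hypothetical sketch: one generic `connect` function dispatching on a
# backend type. None of these names are the real TidierDB API.
abstract type Backend end
struct DuckDBBackend <: Backend end
struct MySQLBackend  <: Backend end

# Each backend contributes one method to the same generic function;
# the strings stand in for the real connection objects.
connect(::DuckDBBackend; kwargs...) = "duckdb connection (would call DuckDB here)"
connect(::MySQLBackend; host = "localhost", kwargs...) = "mysql connection to $host"

connect(DuckDBBackend())                 # dispatches to the DuckDB method
connect(MySQLBackend(); host = "remote") # dispatches to the MySQL method
```

New backends would then only add a new `Backend` subtype and a method, without touching the public function name.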
A `table` method:

```julia
@chain DB.db_table(db, :mtcars) begin
    ...
end
```

Could simply be:

```julia
@chain DB.table(db, :mtcars) begin
    ...
end
```
It already has `DB` as the import.
What do you think?
I like the example of the connection one; I think it's definitely something we could do, and then people can choose which method (original vs. Julian) they want to use. It wouldn't be a heavy lift to do.
For `db_table`, I'm inclined to leave it as it is to keep it distinct from Tables.jl, since folks might be using TidierDB independently without the `DB.` prefix. That being said, I'm open.
Still on this topic: in R I always use the `pool` package to manage connections: it auto-creates more connections, validates them, and so on. You pass a "factory" that generates the connections and the checks to be made.
Currently with TidierDB I wrote a simple macro that creates a temporary connection and kills it after the collect. It is good enough for me, and fast enough for what I do.
I'm not familiar with that package. I will take a look at it.
Would you mind sharing an example of the macro you created? I am not sure killing a connection after the `collect` is a very common part of workflows, but it sounds interesting.
Here are a few examples of the `connect` function I currently have in a PR I have yet to merge:
```julia
conn = connect(:duckdb)
conn = connect(:mysql; host="localhost", user="root", password="password", db="mydb")
```
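One way such symbol-based methods could route to per-backend code is `Val` dispatch. This is only a sketch under assumed names, not necessarily how the PR does it:

```julia
# Hypothetical sketch of routing connect(:duckdb) / connect(:mysql; ...)
# through Val-based dispatch; the actual PR may be implemented differently.
connect(backend::Symbol; kwargs...) = connect(Val(backend); kwargs...)

# The strings stand in for real connection objects.
connect(::Val{:duckdb}; kwargs...) = "in-memory duckdb connection"
connect(::Val{:mysql}; host, user, password, db) = "mysql connection to $db at $host"

connect(:duckdb)
connect(:mysql; host = "localhost", user = "root", password = "password", db = "mydb")
```

The symbol spelling stays user-facing while each backend still gets its own method for dispatch.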
When using MariaDB without a pool of connections, it is often the case that the connection dies and the `collect` results in an error. So a safe way to use it is to connect, collect, and disconnect. I do this mainly in REST APIs.
Suppose we have the function:

```julia
function connect_mariadb(db = "your_database")
    con = DBInterface.connect(
        MySQL.Connection, ... # credentials elided
```
We can define a macro that binds `con` to a temporary connection that is closed after use (using `quote` + `esc` rather than `@eval`, so the expression runs in the caller's scope):

```julia
macro db_temp(db::AbstractString, expr)
    quote
        @info "Connecting..."
        # esc(:con) deliberately leaks `con` into the caller's scope,
        # so the wrapped expression can refer to it.
        $(esc(:con)) = connect_mariadb($db)
        local x = $(esc(expr))
        @info "Disconnecting..."
        DBInterface.close!($(esc(:con)))
        x
    end
end

macro db_temp(expr)
    esc(:(@db_temp("my_most_common_db", $expr)))
end
```
So if I want to write to MariaDB, something like this works fine:

```julia
@db_temp write_mariadb(my_data_frame, "table_name", con, comando = "REPLACE")
```
or some TidierDB pipeline:
```julia
dados =
    @db_temp @chain begin
        DB.db_table(con, :Resumo)
        DB.@filter(Dia == "2024-04-10")
        DB.@group_by(Operação)
        DB.@summarize(Total = sum(Total))
        DB.@mutate(Média = mean(Total))
        DB.@filter(Média >= 0.1)
        DB.@arrange(desc(Operação))
        @aside DB.@show_query(_)
        DB.@collect
    end
```
I took a look at the `pool` documentation and `dbPool`. I think if you wanted to build something analogous to that, along the lines of what you have been using for MariaDB above, it would be fine to bring it into TidierDB, especially if that is something you and others would find valuable in database workflows.
Then users will have access to the `connect` function as well as this additional method, and they can decide what is best for them.
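For reference, a pool along those lines could be sketched in Julia with a `Channel` of connections plus a factory. This is purely illustrative: `Pool` and `withconn` are made-up names, not an existing API:

```julia
# Hypothetical connection-pool sketch: a Channel holds idle connections,
# a factory creates them, and withconn checks one out and always returns it.
struct Pool{T}
    conns::Channel{T}
end

function Pool(factory, n::Integer)
    ch = Channel{Any}(n)
    for _ in 1:n
        put!(ch, factory())   # pre-create n connections
    end
    Pool{Any}(ch)
end

function withconn(f, pool::Pool)
    con = take!(pool.conns)   # block until a connection is free
    try
        return f(con)
    finally
        put!(pool.conns, con) # return the connection even if f throws
    end
end
```

A real version would also need the validation/reconnect checks the R `pool` package performs, which are omitted here.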
Hi all,
This package has so much potential! I want to open a rather late discussion about how the interface to the different DB backends could be more "Julian" and perhaps friendlier. I appreciate the idea of sticking as closely as possible to the "tidyverse" style, but I believe this could also lead to a better, more refined "tidy" way.
Currently, connections to the different DB backends work like this:
Right?
So what if, instead, there were a higher-level method that dispatches according to a `backend` argument (or keyword argument)? Then we could expand the backend types but consistently keep one single function name with $n$ backend methods.
Does this make sense? What do you think about this alternative?