Closed: mrufsvold closed this issue 2 weeks ago
This would be incredible! I wanted to add it but have never established a connection.
Queries and CTEs are stored in structs, and a string is built through the macros and the `expr_to_sql` parsers. A CTE counter is used to order CTEs.
The functions needed to support a dialect are as follows:

- `get_metadata`: gets the metadata of the table, which is then matched to the `sq.metadata` DataFrame stored in the SQL struct.
- `expr_to_sql_snowflake()`: this is the parsing function that gets switched by `set_sql_mode`, and it's what you would need to adapt. Patterns are generally captured using `@capture(pattern)`. You'll notice some of them benefit from helper functions (MSSQL's `floor_date`, for example, though most of DuckDB's do not). So you would take the `parsing_dialect.jl` file of the most similar dialect, make a copy as your base, and make the changes there.
- You may need to add minor support in `finalize_query`, `collect`, and `copy_to`, and that's basically about it.

Then `set_sql_mode` changes the `expr_to_sql` in all the macros and it's good to go.
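A rough sketch of how those pieces could fit together (the function body, signature, and pattern below are illustrative assumptions, not the actual TidierDB.jl implementation):

```julia
using MacroTools  # provides @capture and postwalk

# Hypothetical dialect parser: in practice you would copy the closest
# existing parsing_dialect.jl and adjust the captured patterns.
function expr_to_sql_snowflake(expr, sq; from_summarize::Bool = false)
    MacroTools.postwalk(expr) do x
        # Each pattern is captured with @capture and rewritten to a SQL string.
        if @capture(x, str_detect(col_, pattern_))
            return "REGEXP_LIKE($col, '$pattern')"  # illustrative rewrite only
        end
        x
    end
end

# Switching the active dialect then points all the macros at this parser:
# set_sql_mode(:snowflake)
```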
Then test it with your connection to make sure it works.
In terms of the dependencies, I'm going to work on making them lighter and optional, so I will get back to you about that soon.
Of course let me know if you have any questions.
Cool. I'll look at Snowflake's Python connector to figure out how they handle `collect` and `copy_to`. The rest I feel pretty good about.
Obligatory OSS "I'm very busy and not sure how long this will take" disclaimer, but I'm on the case!
No worries about timing!
As far as `copy_to` and `collect` go: I have only implemented 3 `copy_to` backends (DuckDB being the most graceful in enabling a variety), but if that's something that's quite important for your workflow, I'd love to add it.
Regarding `collect`: most of the backends support `DBInterface.execute(con, query)`. From what I can tell, it looks like Snowflake is used through the ODBC.jl package, which I think would allow for a relatively straightforward implementation of `collect` for Snowflake, but I will have to double-check that.
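As a rough sketch, assuming the Snowflake ODBC driver is installed and configured (the connection string and table name below are hypothetical placeholders, not a tested setup):

```julia
using ODBC, DBInterface, DataFrames

# Hypothetical connection string; the user supplies their own driver/DSN details.
con = DBInterface.connect(ODBC.Connection,
    "Driver=SnowflakeDSIIDriver;Server=myaccount.snowflakecomputing.com;UID=me;PWD=secret")

# `collect` could then reduce to executing the generated query string
# and materializing the result as a DataFrame.
df = DBInterface.execute(con, "SELECT * FROM my_table LIMIT 10") |> DataFrame
```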
I'm not super familiar with ODBC. If I'm tracking correctly, the user needs to install the driver themselves and then pass the driver path to ODBC.jl when creating a connection.
That's easy enough to do, but it isn't quite as "batteries included" as the other backends. (If my understanding is correct).
One thought I had was to use ODBC by default. But also create a separate package which lightly wraps the Python snowflake-connector, and a Pkg extension for TidierDB.jl which switches out the Snowflake Mode to use snowflake-connector when it is loaded. This would allow users to manage the dependencies automatically within Julia (at the expense of a pretty long install time).
The Python package also has nice features like `fetch_arrow`, which automatically returns Apache Arrow tables, so we wouldn't pay to convert to a Julia DataFrame.
Edit: Reading up on the Snowflake REST API, I think that might be the right path to a dependency-free solution. But creating a package to handle that would be non-trivial, so I think ODBC.jl is the way to go in the short term.
So outside of DuckDB, the user would load the library they need for a connection, i.e.

```julia
import TidierDB as DB
using ClickHouse # or MySQL, ODBC, etc.

conn = conn_function("conn_string")
```

and that's the entirety of what the package is loaded in for by the user. Making a unified connection function is something that definitely might happen down the road, to remove the need for the `using OtherPkg` part.
Regarding ODBC.jl, there is not much out there for Snowflake beyond this (although this could just be someone's one-off experience). The idea of using Python for the connector is definitely interesting, although I'm less familiar with the nuances of leveraging Python for a Julia package, so my inclination is to use ODBC.jl for now as well. Perhaps @kdpsingh can better comment on that.
As far as dependencies go, the plan is to shift all connection dependencies except DuckDB (MySQL, SQLite, ODBC, LibPQ, and ClickHouse) to package extensions, to lighten the dependency load right now.
Thanks @drizk1. I would also say there's no rush to move to package extensions if the current set-up is working.
We can worry about that later. It's nice that TidierDB comes batteries included, and we can explore lightening it as a later priority.
I've used PythonCall.jl before. The experience is very smooth imo, but the install process sets up its own version of python, conda, and then all your python dependencies. This takes a long time.
I think rolling it up in TidierDB would be very costly to the UX, so I think it should wait until you're ready to consider extensions. That will let most people opt out, and for those that use Snowflake, it will feel like a separate snowflake package's fault, not TidierDB 😅
> Regarding ODBC.jl, there is not much out there for snowflake beyond this (although this could just be someones one off experience)
I found this. No updates in 4 years, and no indication of their speed experience, but still a good starting place to try it out.
Yeah I would advocate against us including either PythonCall or RCall as a dependency. If the ODBC.jl package isn't sufficient, someone probably needs to make/update the Snowflake-specific Julia package.
Alright so just a quick update here.
I set up a free account with Snowflake and spent a surprising amount of time figuring out OAuth token retrieval and permissions, but I finally made a simple API that lets me query Snowflake from Julia with just HTTP.jl and JSON3.jl.
I'll hopefully have a branch up in the next day or 2.
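For flavor, a minimal sketch of what querying the Snowflake SQL REST API with just HTTP.jl and JSON3.jl can look like (the account identifier, warehouse name, and token handling are simplified placeholders, not the actual implementation in the branch):

```julia
using HTTP, JSON3

# Hypothetical account identifier and OAuth token.
account = "myorg-myaccount"
token = ENV["SNOWFLAKE_OAUTH_TOKEN"]

url = "https://$account.snowflakecomputing.com/api/v2/statements"
headers = [
    "Authorization" => "Bearer $token",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
]
body = JSON3.write(Dict(
    "statement" => "SELECT * FROM my_table LIMIT 10",
    "warehouse" => "COMPUTE_WH",
))

# Submit the statement; the JSON response carries the rows and metadata.
resp = HTTP.post(url, headers, body)
result = JSON3.read(resp.body)
```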
My big question, in terms of workflow: are OAuth tokens regularly used, or is there a preferred method? (I couldn't get a JWT to work.)
Howdy @mrufsvold
v1.9 is on its way to the registry, which includes Snowflake support. I went full batteries-included mode and wrote an interface for the Snowflake SQL REST API; thank you for pointing me in that direction. (The pain of driver paths was becoming overwhelming.)
I put together some docs on it, but since I am not a Snowflake user, I would love for you to try it out and share your input/thoughts/experienced friction.
I am not sure what a better way around the OAuth token is besides refreshing it in Postman, so I am open to any suggestion/contribution you may have. I think an update-connection function that takes the `con` and a new token, updating the connection directly in a chain, might be a good idea, but I'm not sure if that's something that would actually be valuable in a workflow.
Hope it works smoothly for you. Looking forward to your thoughts!
Work's been crazy, so sorry I didn't make any progress, but great to hear that you have!
Yes, I can give it a spin next week!
No worries at all!! Sorry if I scooped it from you. I just have a nice lull before things get crazy Monday and I wanted to learn something new. No rush at all!
I'm going to close this issue for now since we have support for it, but of course reopen if needed / open new issues for snowflake / anything else.
Hi! I use a lot of Snowflake in my work, and I'd love to contribute a Snowflake mode to TidierDB.jl. Could someone give me a high-level overview of what needs to be implemented to support a new backend? I'll also read the current code, but it would be awesome to have some context before diving in.