TidierOrg / TidierDB.jl

Tidier database analysis in Julia, modeled after the dbplyr R package.
MIT License
42 stars, 3 forks

Using package extensions instead of requiring all backends #45

Closed: giordano closed this issue 1 month ago

giordano commented 1 month ago

This package has loads of dependencies on many database backend packages. Would it be possible to switch to using package extensions instead? You'd probably need to slightly change the API, though, to introduce new types to dispatch on for the different backends.
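To illustrate the "new types to dispatch on" idea, here is a minimal sketch. The names MiniTidierDB, PostgresBackend, SQLiteBackend, connect_backend and MiniPostgresExt are hypothetical, not TidierDB.jl's API. The main module defines only lightweight marker types and a fallback method. A hand-written module stands in for the extension that Julia (1.9 and later) would load automatically from the package's ext/ folder once the weak dependency is imported.

```julia
# Hypothetical sketch: these names are illustrative, not TidierDB.jl's actual API.
module MiniTidierDB

export SQLBackend, PostgresBackend, SQLiteBackend, connect_backend

# Marker types live in the main package, so it needs no backend dependencies.
abstract type SQLBackend end
struct PostgresBackend <: SQLBackend end
struct SQLiteBackend <: SQLBackend end

# Fallback: reached when no extension has added a method for this backend.
function connect_backend(b::SQLBackend, args...)
    error("No connect method for $(nameof(typeof(b))); load its backend package to activate the extension")
end

end # module MiniTidierDB

# Stand-in for a package extension. In a real package this module would sit in
# ext/ and be loaded by Julia when the backend package is imported.
module MiniPostgresExt
import ..MiniTidierDB: connect_backend, PostgresBackend

# A more specific method; dispatch prefers it over the fallback.
connect_backend(::PostgresBackend, conninfo::AbstractString) = "postgres connection to $(conninfo)"
end

using .MiniTidierDB
println(connect_backend(PostgresBackend(), "host=localhost dbname=test"))

# No extension was defined for SQLite, so the fallback error is raised.
try
    connect_backend(SQLiteBackend(), "test.db")
catch err
    println(sprint(showerror, err))
end
```

The marker types are what make this work for connect: before a connection exists there is no driver object to dispatch on, so the caller passes a backend type instead.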

drizk1 commented 1 month ago

I would love to move to package extensions. I've read about them and watched a video, but I haven't yet had the time or the depth of understanding to do it.

I think adjusting the API would be relatively straightforward, with @collect and connect being the two pieces that need to be broken down and portioned off into extensions. All of the get_table_metadata functions are already specific to their backend, so those would be easy to move into an extension.
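Backend-specific functions such as get_table_metadata fit the usual extension pattern. The main package declares the function with no methods, and each extension adds a method that dispatches on the driver's own connection type. The sketch below is a minimal illustration under assumptions: FakeLibPQ is a stand-in for a real driver package, and the (connection, table name) signature and returned column list are hypothetical.

```julia
# Hypothetical sketch: FakeLibPQ stands in for a real driver package, and this
# get_table_metadata signature is illustrative, not TidierDB.jl's.
module MiniTidierDB
# Declare the generic function with no methods; extensions fill it in.
function get_table_metadata end
end

module FakeLibPQ
# Stand-in for the driver's own connection type.
struct Connection
    dsn::String
end
end

# Stand-in for the extension: only this module sees both packages, so the
# main package never takes a hard dependency on the driver.
module MiniTidierDBFakeLibPQExt
import ..MiniTidierDB: get_table_metadata
import ..FakeLibPQ

function get_table_metadata(conn::FakeLibPQ.Connection, table::AbstractString)
    # A real extension would query the database through the driver;
    # this sketch returns a fixed list of column => type pairs.
    return ["id" => "integer", "name" => "text"]
end
end

conn = FakeLibPQ.Connection("host=localhost")
println(MiniTidierDB.get_table_metadata(conn, "users"))

# Without a matching extension, there is no method for other argument types.
println(hasmethod(MiniTidierDB.get_table_metadata, Tuple{String, String}))  # false
```

Because the method's signature mentions a type owned by the driver package, the extension adds methods without type piracy, which is why these functions move over with little change.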

drizk1 commented 1 month ago

Quick update:

I've figured out how extensions work and have been able to make separate ones for Postgres, SQLite, Athena, GBQ, MySQL, MsSQL, and ClickHouse. This has reduced the number of dependencies from 16 to 9.

It seems that using the underlying SQL mode is sufficient to dispatch collect to the right backend.
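One way to read "dispatch on the SQL mode" is to lift a session-wide mode Symbol into a type with Val, so each backend can contribute its own collect method. This is a minimal sketch under assumptions: CURRENT_SQL_MODE, set_mode! and collect_query are hypothetical names, and the returned strings stand in for real query results.

```julia
# Hypothetical sketch: CURRENT_SQL_MODE, set_mode! and collect_query are
# illustrative names, not TidierDB.jl's actual API.
module ModeDispatch

export set_mode!, collect_query

# Session-wide setting recording which SQL dialect is active.
const CURRENT_SQL_MODE = Ref{Symbol}(:duckdb)
set_mode!(mode::Symbol) = (CURRENT_SQL_MODE[] = mode; mode)

# Entry point: turn the runtime Symbol into a type so dispatch can pick a method.
collect_query(conn, sql::AbstractString) = collect_query(Val(CURRENT_SQL_MODE[]), conn, sql)

# Fallback for modes whose extension has not been loaded.
collect_query(::Val{M}, conn, sql::AbstractString) where {M} =
    error("No collect method for SQL mode :$M; load the matching backend package")

# With extensions, methods like these would be defined in per-backend extensions.
collect_query(::Val{:duckdb}, conn, sql::AbstractString) = "duckdb ran: $sql"
collect_query(::Val{:postgres}, conn, sql::AbstractString) = "postgres ran: $sql"

end

using .ModeDispatch
conn = nothing  # placeholder for a real connection object
println(collect_query(conn, "SELECT 1"))
set_mode!(:postgres)
println(collect_query(conn, "SELECT 1"))
```

The trade-off is that the mode is global: after set_mode!(:postgres), every collect goes to the Postgres method regardless of which connection is passed. That is the kind of friction that appears when collecting from two backends in one session. Dispatching on the connection object's own type would avoid the shared state.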

For now, I plan to leave Databricks and Snowflake in the main module, because that simplifies the collecting aspect a bit and keeps a little more flexibility for using multiple backends in one session.

The only friction point I see might be collecting from certain backend combinations in the same session, for example GBQ and then AWS, but that seems unlikely.