duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0

Create a simple plugin system for writing data to external destinations #173

Closed jwills closed 1 year ago

jwills commented 1 year ago

Fixes #143

This implements a write-side plugin system that runs after a relation is materialized (either in DuckDB or externally) and can take some action on the resulting data. I'm starting out by porting the Glue database stuff (which can persist an output parquet file as a relation in the AWS Glue catalog) to guide the implementation, so that these plugins act as a dual to the source-side plugins.

One rub here is that I don't have a good functional test of the Glue stuff, so I'm going to create one in a separate PR that runs against the existing Glue impl, merge it, and then port the impl over to confirm that it works against my new impl here (and likely fix the bugs I find in the process).
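
For readers who want a concrete picture of the write-side hook described above, here is a minimal sketch. It is not the dbt-duckdb API: `TargetConfig`, `WritePlugin`, `RecordingCatalogPlugin`, and `materialize` are hypothetical names made up for illustration, and the in-memory "catalog" stands in for something like Glue. It only shows the shape of the idea, where a plugin's `store` hook runs after the relation has been materialized.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TargetConfig:
    """Hypothetical description of a relation that was just materialized."""
    relation_name: str
    location: str  # e.g. the path of the output parquet file
    config: Dict[str, Any] = field(default_factory=dict)


class WritePlugin:
    """Hypothetical base class: subclasses act on data after it is written."""

    def store(self, target: TargetConfig) -> None:
        raise NotImplementedError


class RecordingCatalogPlugin(WritePlugin):
    """Stand-in for a catalog plugin; it records registrations in memory."""

    def __init__(self) -> None:
        self.registered: Dict[str, str] = {}

    def store(self, target: TargetConfig) -> None:
        # A real plugin would call out to an external system here.
        self.registered[target.relation_name] = target.location


def materialize(target: TargetConfig, plugins: Dict[str, WritePlugin]) -> bool:
    """Pretend to build the relation, then run the requested plugin, if any.

    Returns True when a plugin's store hook was invoked.
    """
    # ... the relation itself would be written to target.location here ...
    plugin_name = target.config.get("plugin")
    if plugin_name is None:
        return False
    plugins[plugin_name].store(target)
    return True


# Usage: one model asks for the (hypothetical) "catalog" plugin, one does not.
catalog = RecordingCatalogPlugin()
plugins = {"catalog": catalog}

ran_first = materialize(
    TargetConfig("orders", "out/orders.parquet", {"plugin": "catalog"}),
    plugins,
)
ran_second = materialize(TargetConfig("scratch", "out/scratch.parquet"), plugins)
```

Keeping the hook as a single method that receives a description of the finished relation is what lets the same mechanism cover both DuckDB-internal and external materializations; the plugin never needs to know how the data was produced.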

jwills commented 1 year ago

Going to merge this and then start in on revamping the source-side plugins to move them into dbt seed

JCZuurmond commented 1 year ago

Awesome work @jwills! Do you have or know of any examples of this functionality?

jwills commented 1 year ago

Hey @JCZuurmond, thanks so much! So I rewrote the existing Glue stuff to use the new plugin framework to do its thing (though it’s backwards compatible with the old way of doing it), but I haven’t created any new ones yet. Very much open to suggestions here!

Maybe sqlalchemy? I still don’t feel like I grok delta-rs well enough to understand how folks actually use it for stuff like this. PyIceberg doesn’t support general writes yet afaict. What do you think?

JCZuurmond commented 1 year ago

sqlalchemy would be nice!

I want to start with a reference use case, maybe as a blog post or talk, to show the Excel capabilities and to inspire other plugin ideas.