duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0

refactoring-external-materialization #332

Open milicevica23 opened 7 months ago

milicevica23 commented 7 months ago

Hi @jwills, I wanted to open a pull request to ask questions and potentially propose the refactoring.

I will try in the next few days to list all the requirements and showcase the implementation of the current external materialization here: https://github.com/l-mds/demo-dbt-duckdb-external-materialization

I would also like to go through that list and implement the missing tests before I start refactoring, to ensure that the API stays the same. I will also have some questions and will need support there.

jwills commented 7 months ago

Ah, these are all good questions; I will inline some comments shortly.

milicevica23 commented 7 months ago

Hi @jwills, thank you. I will try to add my comments and further questions and, in the next few days, put together a first PoC of how I think this should work. Please be critical.

The main goal, for now, is to showcase the code structure; then we can go into the details.

To simplify a bit, I will leave the following out of scope for now:

milicevica23 commented 7 months ago

So, I added the structure and will start with a simple implementation of the native plugin, which will hopefully show whether this makes sense.
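For context, here is a minimal sketch of what I have in mind for the native plugin, going through the existing plugin mechanism; the module path, hook names, and the store() signature are assumptions on my side, not the final dbt-duckdb API:

```python
# Minimal sketch only; class/module names and the store() signature are
# assumptions, not the final dbt-duckdb API.
from dbt.adapters.duckdb.plugins import BasePlugin


class Plugin(BasePlugin):
    def initialize(self, plugin_config: dict) -> None:
        # Plugin-level options, e.g. a default export format.
        self._format = plugin_config.get("format", "parquet")

    def store(self, target_config) -> None:
        # Hypothetical hook: write the materialized relation to its external
        # location (e.g. via a COPY ... TO statement) so downstream models
        # and outside consumers can read it straight from that path.
        location = target_config.location  # assumed attribute
        ...
```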

This is still a work in progress, but if you have any comments or questions, I would be happy to answer them.

I have a general question: is there some concept of a release candidate / nightly build that people can try before they start to use this? :D I am not sure I can get the API fully 1-to-1 in the first run.

jwills commented 7 months ago

> So, I added the structure and will start with a simple implementation of the native plugin, which will hopefully show whether this makes sense.
>
> This is still a work in progress, but if you have any comments or questions, I would be happy to answer them.
>
> I have a general question: is there some concept of a release candidate / nightly build that people can try before they start to use this? :D I am not sure I can get the API fully 1-to-1 in the first run.

There isn't a nightly or anything like that as of yet; if we add this, we will add it as a net-new thing that is disabled by default (which is how we've done a bunch of other things, e.g. the fast CSV loading for dbt seed).

milicevica23 commented 7 months ago

Hi @jwills, can you take a look? I did a first draft of the native functionality and a new test which should cover some of the general cases.
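To give a concrete picture, a test along these lines (simplified here; the model name and config values are illustrative, not necessarily what is in the PR) exercises the external materialization end to end with dbt's standard adapter test utilities:

```python
# Simplified sketch of a functional test for the external materialization,
# using dbt's standard adapter test utilities (assumes the usual dbt adapter
# test fixtures are wired up in conftest.py); names here are illustrative.
import pytest
from dbt.tests.util import run_dbt

# Hypothetical model that materializes to an external Parquet file.
MODEL_EXTERNAL_SQL = """
{{ config(materialized='external', format='parquet') }}
select 1 as id, 'a' as name
"""


class TestNativeExternalMaterialization:
    @pytest.fixture(scope="class")
    def models(self):
        return {"external_model.sql": MODEL_EXTERNAL_SQL}

    def test_run(self, project):
        results = run_dbt(["run"])
        assert len(results) == 1
```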

milicevica23 commented 7 months ago

Hi @jwills, I think I finished the first round of refactoring, so maybe you could take a look and see if this makes sense to you. What I did:

The new code flow provides a nice feature: the exported file can be referenced directly from the target destination. That sounds trivial for the native format, but it is especially important for further implementations of the Iceberg, Delta, (BigQuery, Snowflake?) adapters, which can now be implemented within external materializations. It also provides a very convenient way to register the df in the upstream macro.
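As a rough illustration of what this means in practice (the path and names below are made up, not taken from the PR), the exported file can be read back or registered as a view in DuckDB without any extra copy step:

```python
# Rough illustration only; the location and view name are hypothetical.
import os
import duckdb

con = duckdb.connect()

# Stand-in for the file the external materialization would have exported.
location = "/tmp/external_demo/my_model.parquet"
os.makedirs(os.path.dirname(location), exist_ok=True)
con.execute(f"COPY (SELECT 1 AS id, 'a' AS name) TO '{location}' (FORMAT PARQUET)")

# Registering the exported file as a view is essentially what "registering
# the df in the upstream macro" enables: downstream models query the
# external file in place, straight from the target destination.
con.execute(
    f"CREATE OR REPLACE VIEW my_model AS SELECT * FROM read_parquet('{location}')"
)
print(con.execute("SELECT * FROM my_model").fetchall())
```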

What is still to do:

missing:

has to do:

nice to have:

I hope this gives you a bit of an overview of what I did; I am happy to answer any questions.

I am still fighting with mypy and the formatter in the pre-commit setup; if you have any suggestions on how to go about this, I would appreciate it a lot.

jwills commented 7 months ago

@milicevica23 ah that is very cool, thanks for the update! I have some stuff going on this week but I will dive into this on Thursday!

jwills commented 7 months ago

@milicevica23 when you get a chance, could you redo the formatting according to the pre-commit run --all-files setup? All of the formatting diffs make this hard to review.