Yacobolo opened this issue 1 year ago
Ack, sorry for the lag here @Yacobolo, I was on the road and missed this going by. I would like to have a plugin that supports Delta, akin to the one I have for Iceberg; I'm assuming it would use the deltalake Python package, but I personally don't have access to a Delta Lake instance and tbh don't really care enough about learning how to set up a real one to do it myself "for fun."
However, if you (or anyone else!) do have a Delta Lake instance and know how it should be configured as a dbt-duckdb plugin, I would most definitely be happy to merge it in.
Hi @jwills, I would like to try this integration. This would be my first contribution, so I would appreciate some help and guidance at the beginning.
I did a first draft of the read plugin integration here
and, in parallel, an example project here where I showcase it.
Here is the source configuration, which loads the data as a source with file and projection pruning.
What workflow works best for you to give feedback?
Hey @milicevica23, thanks so much for taking a crack at this!
The code as written makes sense to me, but I have to be honest that I don't have a great sense for how folks actually use the deltalake Python module in the real world. Like, do folks really use Delta tables w/o a catalog? https://delta-io.github.io/delta-rs/python/usage.html#loading-a-delta-table
The nice thing is that you can, but don't have to, use a catalog to know where your table is, and I thought to implement support for both ways. Or at least try to. You can think of it as adding a new file format for external files; not everybody who is on-prem or doing simple projects has a catalog. But I would be happy to hear feedback from others.
Same here: the main use case is not the catalog, but rather the metadata Delta generates, together with ACID transactions and time travel / change history 🔥
Alright, super cool. So @milicevica23, if you could put your change together as a PR, other folks on this thread can weigh in on any additional config options we need to support those use cases. That would be great!
Sure, I will open a draft PR.
The things still to do
Feel free to add new ideas and topics.
I am not used to the PR process on GitHub, so feel free to rewrite things and change whatever is needed to fit the project's needs and best practices.
How would the new Delta Kernel-based DuckDB extension (https://duckdb.org/2024/06/10/delta.html) work here to simplify, and perhaps speed up, access to Delta-based data?
A: Simply add the extension, if your platform is supported (see https://duckdb.org/docs/extensions/delta#supported-duckdb-versions-and-platforms).
Looking forward to Delta support. This would enable us to run a poor man's data lakehouse! Do you need any help? What is the ETA: this year?