Eventual-Inc / Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0
1.88k stars, 119 forks

[Catalogs] [Delta Lake] Add support for updating an existing Delta Lake table (overwrite, append, upsert, delete). #1968

Closed clarkzinzow closed 1 day ago

clarkzinzow commented 4 months ago

In addition to creating new Delta Lake tables, we should add support for updating existing Delta Lake tables. We should support overwrites, appends, upserts, and deletes.

sujiplr commented 6 days ago

Are we planning to support this sometime in the next 3 months?

samster25 commented 1 day ago

@sujiplr Yes, we are, once we finish our integration with delta-kernel-rs.

cc: @kevinzwang

kevinzwang commented 1 day ago

Hi @sujiplr! Glad to hear that there's demand for these features. We currently support overwrite and append through the DataFrame.write_deltalake method. Upserts and deletes are part of our Delta Lake roadmap (#2457), and we hope to support them soon.
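
For anyone landing here, a minimal sketch of what overwrite and append might look like with DataFrame.write_deltalake. The `write_existing_table` helper, the `SUPPORTED_MODES` tuple, and the table path are hypothetical names used only for illustration; the helper deliberately rejects anything other than the two modes mentioned in this comment, even though Daft itself may accept other mode values.

```python
# Hypothetical helper, not part of Daft: restricts writes to the two
# modes this thread says are supported on existing Delta Lake tables.
SUPPORTED_MODES = ("append", "overwrite")


def write_existing_table(df, table_uri, mode="append"):
    """Write `df` to the Delta Lake table at `table_uri` using `mode`.

    `df` is expected to be a Daft DataFrame (anything exposing
    `write_deltalake(table, mode=...)` works).
    """
    if mode not in SUPPORTED_MODES:
        # Upserts and deletes are tracked separately on the roadmap (#2457).
        raise ValueError(
            f"mode must be one of {SUPPORTED_MODES}, got {mode!r}"
        )
    return df.write_deltalake(table_uri, mode=mode)


def demo(table_uri="/tmp/example_delta_table"):
    """Illustrative usage; the path is an assumption, not from this thread."""
    import daft  # imported lazily so the helper above has no hard dependency

    # Replace whatever is in the table with two rows.
    initial = daft.from_pydict({"id": [1, 2], "value": ["a", "b"]})
    write_existing_table(initial, table_uri, mode="overwrite")

    # Add one more row on top of the existing data.
    extra = daft.from_pydict({"id": [3], "value": ["c"]})
    write_existing_table(extra, table_uri, mode="append")
```

Calling `demo()` requires both daft and deltalake to be installed, since Daft relies on the deltalake package for Delta Lake writes.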

kevinzwang commented 1 day ago

In the meantime, if your data fits on one machine, you can call DataFrame.to_arrow to convert your DataFrame into a PyArrow table and then use it with the deltalake Python library.
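
A rough sketch of that workaround, assuming a recent deltalake release that provides `DeltaTable.merge` and `DeltaTable.delete`. The function names, the `source`/`target` aliases, and the key columns are hypothetical and chosen only for illustration; everything runs on a single machine because `to_arrow` materializes the whole DataFrame in memory.

```python
def merge_predicate(key_columns, source_alias="source", target_alias="target"):
    """Build a join predicate such as 'target.id = source.id'."""
    return " AND ".join(
        f"{target_alias}.{col} = {source_alias}.{col}" for col in key_columns
    )


def upsert_via_arrow(df, table_uri, key_columns):
    """Upsert a Daft DataFrame into an existing Delta Lake table.

    Rows whose key columns match an existing row are updated;
    all other rows are inserted.
    """
    from deltalake import DeltaTable  # lazy import: optional dependency

    arrow_table = df.to_arrow()  # pulls all data onto this machine
    return (
        DeltaTable(table_uri)
        .merge(
            source=arrow_table,
            predicate=merge_predicate(key_columns),
            source_alias="source",
            target_alias="target",
        )
        .when_matched_update_all()
        .when_not_matched_insert_all()
        .execute()
    )


def delete_rows(table_uri, predicate):
    """Delete rows matching a SQL-style predicate, e.g. "id = 3"."""
    from deltalake import DeltaTable  # lazy import: optional dependency

    return DeltaTable(table_uri).delete(predicate)
```

For example, `upsert_via_arrow(df, "/tmp/example_delta_table", ["id"])` would merge on the `id` column, where the path and column name are placeholders.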

kevinzwang commented 1 day ago

I think I'll also close this issue, since each of these features is either done or already tracked by another issue.