delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

DataFusion write to delta via Python #2422

Open dhruvils414 opened 7 months ago

dhruvils414 commented 7 months ago

Hello

I would like to write to Delta Lake via DataFusion, just like with Spark.

Requested future support: append, overwrite, and MERGE INTO.

MrPowers commented 7 months ago

Let's make sure to get the docs page updated too when/if this gets completed: https://delta-io.github.io/delta-rs/integrations/delta-lake-datafusion/

ion-elgreco commented 6 months ago

We already use DataFusion in delta-rs, so I'm not sure what you mean.

MrPowers commented 6 months ago

@ion-elgreco, any chance we can document the syntax for writing to Delta tables with DataFusion, so it's easy for me to learn how to do it?

ion-elgreco commented 6 months ago

@MrPowers, which docs are we talking about here? We use DataFusion on the Rust side, but that isn't documented well. All of the writing is simply dispatched to Rust from Python.

dhruvils414 commented 6 months ago

I don't think this is supported directly. Based on Article 1, we need to convert the table to a PyArrow dataset before we can read it from DataFusion.

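```python
from datafusion import SessionContext
from deltalake import DeltaTable

ctx = SessionContext()
# Open the existing Delta table at this path.
table = DeltaTable("G1_1e9_1e2_0_0")
# Register the table's PyArrow dataset so DataFusion can query it by name.
ctx.register_dataset("my_delta_table", table.to_pyarrow_dataset())
```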

Article 1: https://delta-io.github.io/delta-rs/integrations/delta-lake-datafusion/#delta-lake-performance-benefits-for-datafusion-users