Closed djouallah closed 2 weeks ago
I think Delta rust is using Datafusion internally
There are three senses in which we integrate with DataFusion:
It's only the third one that applies to this library.
I could not find any documentation, though, on how to use a Delta table with Python DataFusion
Our integration with the Python DataFusion is similar to DuckDB: create a PyArrow dataset, import that into DataFusion, and query as desired.
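from datafusion import SessionContext
from deltalake import DeltaTable

# Create a DataFusion context
ctx = SessionContext()

# Open the Delta table and register it as a PyArrow dataset.
# register_dataset takes the table name first, then the dataset.
delta_table = DeltaTable("path/to/your/table")
ctx.register_dataset("my_table", delta_table.to_pyarrow_dataset())

# Query the registered table with SQL
df = ctx.sql("SELECT * FROM my_table")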
I see. I think it was wishful thinking on my side: I imagined DataFusion somehow using Delta tables as native storage with full integration. I see that's not the case :(
Yeah, to integrate like that we'd have to bundle the compiled delta-rs code within the datafusion-python wheels, which would make them quite large.
@wjones127 so what you are saying, basically, is that it is up to DataFusion to bundle delta-rs if they are interested?