delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Support generating Substrait plans #830

Open wjones127 opened 1 year ago

wjones127 commented 1 year ago

Description

From @roeap:

Generate query plans for operations requiring current table data as Substrait (https://substrait.io) plans, to integrate with different query backends.

Use Case

Related Issue(s)

wjones127 commented 1 year ago

I'm very interested in pursuing this as well, but I haven't looked into it enough to have a clear idea of what API we should offer.

One idea I had was a function DeltaTable.expand_substrait(read_rel: ReadRelation) -> SubstraitPlan, where you would pass a read relation with the projection and filter, and we would return the expanded plan with all the Delta protocol details embedded and files already pruned. But perhaps there's a more straightforward way.
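
To make the shape of that idea concrete, here is a minimal sketch in plain Python. It is not the delta-rs or Substrait API: every name in it (RangeFilter, ReadRelation, AddFile, ExpandedPlan, expand_read) is a hypothetical stand-in, the filter model is reduced to a single integer range, and file pruning is assumed to use per-column min/max statistics from the Delta log.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# All types and functions here are hypothetical stand-ins for illustration.
# They are NOT part of delta-rs or the Substrait bindings.


@dataclass(frozen=True)
class RangeFilter:
    """Keep rows where low <= column <= high (a deliberately tiny filter model)."""
    column: str
    low: int
    high: int


@dataclass(frozen=True)
class ReadRelation:
    """What the caller asks for: a projection plus an optional filter."""
    projection: tuple
    filter: Optional[RangeFilter] = None


@dataclass(frozen=True)
class AddFile:
    """An active data file from the Delta log with min/max column statistics."""
    path: str
    stats: dict  # column name -> (min, max); may be missing for a column


@dataclass(frozen=True)
class ExpandedPlan:
    """The expanded read: explicit file list, projection, and residual filter."""
    files: tuple
    projection: tuple
    filter: Optional[RangeFilter]


def may_match(file: AddFile, flt: Optional[RangeFilter]) -> bool:
    """Return False only when the statistics prove no row can match."""
    if flt is None or flt.column not in file.stats:
        return True  # no filter or no stats: the file cannot be ruled out
    lo, hi = file.stats[flt.column]
    return hi >= flt.low and lo <= flt.high


def expand_read(active_files: list, read_rel: ReadRelation) -> ExpandedPlan:
    """Replace a table-level read with a read over the pruned file list."""
    kept = tuple(f.path for f in active_files if may_match(f, read_rel.filter))
    # The filter is kept in the plan: pruning is per file, not per row.
    return ExpandedPlan(kept, tuple(read_rel.projection), read_rel.filter)


files = [
    AddFile("part-0.parquet", {"id": (0, 9)}),
    AddFile("part-1.parquet", {"id": (10, 19)}),
    AddFile("part-2.parquet", {}),  # no statistics recorded
]
plan = expand_read(files, ReadRelation(("id",), RangeFilter("id", 12, 15)))
print(plan.files)
```

In this sketch the filter stays in the returned plan because file-level pruning is conservative: a kept file may still contain rows outside the range, so the engine would still need to apply the filter row by row. A real implementation would also have to carry the other protocol details mentioned above, which this sketch leaves out.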

wjones127 commented 1 year ago

Thinking about this a little more, one goal might be to implement Substrait support that will allow us to write an ADBC driver for Delta Lake. That will allow us to support more engines / databases without having to do a bunch of one-off implementations. I'll try to draft a rough design of what we need for ADBC, and then use that to determine what we might need for Substrait integration. I also need to see what the state of these is in the Rust ecosystem.

houqp commented 1 year ago

I think both approaches (read relation & ADBC) are worth pursuing with their own tradeoffs and use-cases.