dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask
https://dask-sql.readthedocs.io/
MIT License
387 stars 71 forks source link

[DF] Use DataFusion-Substrait if configured #500

Open jdye64 opened 2 years ago

jdye64 commented 2 years ago

Is your feature request related to a problem? Please describe. substrait-io is a project that aims to offer cross-language serialization for relational algebra. DataFusion has a producer and consumer library for substrait that would all for dask-sql to use have query plans generated by a configurable backend while presenting dask-sql itself with the same data structures.

Describe the solution you'd like A configuration should be added to dask-sql that allows for either the DataFusion library to be directly used for query parsing and logical planning or the use of DataFusion-Substrait which would use substrait and its configured parser/planner for generating the logical plan.

With either route the response from the parsing logic should be a valid DataFusion LogicalPlan instance that should prevent the remainder of the code base from needing changes.

Describe alternatives you've considered None

Additional context None

jdye64 commented 2 years ago

While the PR itself is ready the upstream substrait bindings for Rust are not, currently only Java exists but plans for the Rust bindings are in the works.