apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Add insert/update/delete/(ctas?) to DataFusion planner #4901

Closed: avantgardnerio closed this issue 1 year ago

avantgardnerio commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Used as a library, parts of DataFusion can be very useful even if they aren't plumbed all the way through to something like the CLI. At my company, we are attempting to build an HTAP database on top of DataFusion and are running into roadblocks, because DataFusion currently cannot convert insert/update/delete/CTAS (CREATE TABLE AS SELECT) queries into LogicalPlans, even though the sqlparser crate can already parse them.

Describe the solution you'd like

Parse and plan, but don't execute, insert/update/delete queries. (It might be a fun follow-on PR to support creating tables with CTAS from the CLI?)
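To make "parse and plan, but don't execute" concrete, here is a minimal Rust sketch. Every type and function in it (`LogicalPlan`, `WriteOp`, `plan_insert`, `plan_delete`, `execute`, `NotImplemented`) is hypothetical and heavily simplified for illustration; none of it is DataFusion's actual API. The point is only that a DML statement can become an ordinary plan node wrapping an input plan, while the executor is free to reject it.

```rust
use std::fmt;

// Hypothetical, heavily simplified plan types for illustration only;
// these are NOT DataFusion's real `LogicalPlan` definitions.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    /// Read every row of a named table.
    TableScan { table: String },
    /// Literal rows, e.g. the source of `INSERT ... VALUES`.
    Values { rows: Vec<Vec<i64>> },
    /// A data-modification node: planned here, executed by nobody.
    Dml { table: String, op: WriteOp, input: Box<LogicalPlan> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum WriteOp {
    Insert,
    Update,
    Delete,
}

#[derive(Debug, PartialEq)]
struct NotImplemented(String);

impl fmt::Display for NotImplemented {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "not implemented: {}", self.0)
    }
}

/// Plan `INSERT INTO <table> VALUES ...` as a Dml node whose input is the literal rows.
fn plan_insert(table: &str, rows: Vec<Vec<i64>>) -> LogicalPlan {
    LogicalPlan::Dml {
        table: table.to_string(),
        op: WriteOp::Insert,
        input: Box::new(LogicalPlan::Values { rows }),
    }
}

/// Plan `DELETE FROM <table>` (no WHERE clause) as a Dml node over a full scan.
fn plan_delete(table: &str) -> LogicalPlan {
    LogicalPlan::Dml {
        table: table.to_string(),
        op: WriteOp::Delete,
        input: Box::new(LogicalPlan::TableScan { table: table.to_string() }),
    }
}

/// Toy executor: it can evaluate `Values`, but rejects scans (there is no storage here)
/// and every Dml node, i.e. "plan, but don't execute".
fn execute(plan: &LogicalPlan) -> Result<usize, NotImplemented> {
    match plan {
        LogicalPlan::Values { rows } => Ok(rows.len()),
        LogicalPlan::TableScan { table } => Err(NotImplemented(format!("scan of table {table}"))),
        LogicalPlan::Dml { table, op, .. } => Err(NotImplemented(format!("{op:?} on table {table}"))),
    }
}

fn main() {
    let insert = plan_insert("t", vec![vec![1, 2], vec![3, 4]]);
    let delete = plan_delete("t");
    println!("{insert:?}");
    println!("{delete:?}");
    // Planning succeeded; execution is intentionally left to the embedding system.
    println!("{}", execute(&insert).unwrap_err());
}
```

Keeping the DML node's input as a regular plan means that whatever produces the rows to write (a `VALUES` list, a scan, or an arbitrary query in the CTAS case) can reuse the existing query planning.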

Describe alternatives you've considered

alamb commented 1 year ago

I think this proposal (to have DDL/DML support in the engine) is very much in the spirit of DataFusion as a library for building other databases.

Specifically, the semantic analysis and basic plan support for these nodes is non-trivial (e.g. resolving references to columns) and is not database-specific. How those commands are actually implemented, however, will almost certainly be system-specific.
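As an illustration of the database-agnostic part, the following Rust sketch resolves the `SET col = value` list of an UPDATE against a table schema, rejecting unknown or repeated columns at planning time. `Schema` and `resolve_assignments` are hypothetical names, values are simplified to `i64` literals, and lookup is an exact name match; a real engine would also handle qualified names and identifier case rules.

```rust
use std::collections::HashSet;

/// Hypothetical table schema: just an ordered list of column names.
struct Schema {
    columns: Vec<String>,
}

impl Schema {
    /// Exact-match lookup; real engines also deal with qualifiers and identifier case.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.columns.iter().position(|c| c == name)
    }
}

/// Resolve the `SET col = value` list of an UPDATE against the target table's schema.
/// Values are simplified to i64 literals. Returns (column index, value) pairs, or an
/// error naming the first unknown or repeated column.
fn resolve_assignments(
    schema: &Schema,
    assignments: &[(&str, i64)],
) -> Result<Vec<(usize, i64)>, String> {
    let mut seen = HashSet::new();
    let mut resolved = Vec::with_capacity(assignments.len());
    for (name, value) in assignments {
        let idx = schema
            .index_of(name)
            .ok_or_else(|| format!("column '{name}' not found"))?;
        // HashSet::insert returns false when the index was already present.
        if !seen.insert(idx) {
            return Err(format!("column '{name}' assigned more than once"));
        }
        resolved.push((idx, *value));
    }
    Ok(resolved)
}

fn main() {
    let schema = Schema {
        columns: vec!["id".to_string(), "name".to_string(), "balance".to_string()],
    };
    // UPDATE accounts SET balance = 100
    println!("{:?}", resolve_assignments(&schema, &[("balance", 100)]));
    // UPDATE accounts SET balanse = 100  (misspelled column: rejected at planning time)
    println!("{:?}", resolve_assignments(&schema, &[("balanse", 100)]));
}
```

Nothing in this check depends on how, or whether, the target system later applies the update, which is why it can reasonably live in a shared planner.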

Ideally, I hope that DataFusion can have a clear separation between

  1. "built-in logical plan nodes that have good implementations" (e.g. the query nodes such as TableSource, Filter, etc.), and
  2. "built-in logical plan nodes that have no or only basic implementations" (e.g. create table, write rel, etc.)
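One way to picture the split between the two categories above is a trait with a default "not supported" body that a downstream system overrides. This is a hypothetical Rust sketch: `TableWriter`, `UnsupportedWriter`, and `InMemoryWriter` are illustrative names, not DataFusion APIs, and the in-memory store merely stands in for whatever storage an embedding database provides.

```rust
use std::collections::HashMap;

/// Hypothetical extension point: the engine plans DML generically, and each
/// embedding system decides how (or whether) to execute it.
trait TableWriter {
    /// Default behaviour for a "planned but not implemented" node.
    fn insert(&mut self, table: &str, _rows: Vec<Vec<i64>>) -> Result<u64, String> {
        Err(format!("INSERT into '{table}' is not supported"))
    }
}

/// Roughly category 2: the engine ships the node but no real implementation.
struct UnsupportedWriter;
impl TableWriter for UnsupportedWriter {}

/// What a downstream system might plug in; here a toy in-memory store.
#[derive(Default)]
struct InMemoryWriter {
    tables: HashMap<String, Vec<Vec<i64>>>,
}

impl TableWriter for InMemoryWriter {
    fn insert(&mut self, table: &str, rows: Vec<Vec<i64>>) -> Result<u64, String> {
        let n = rows.len() as u64;
        // Create the table on first write, then append the new rows.
        self.tables.entry(table.to_string()).or_default().extend(rows);
        Ok(n)
    }
}

fn main() {
    let mut engine_default = UnsupportedWriter;
    println!("{:?}", engine_default.insert("t", vec![vec![1]]));

    let mut custom = InMemoryWriter::default();
    println!("{:?}", custom.insert("t", vec![vec![1], vec![2]]));
    println!("rows in t: {}", custom.tables["t"].len());
}
```

The default method keeps the engine usable out of the box (DML is planned, then cleanly rejected), while a system such as the HTAP database described in the issue only has to override the operations it actually supports.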