Open tempbottle opened 6 months ago
Hi @tempbottle -- no sadly there is no way that I know if to go from logical plan --> SQL
I think it would be possible in theory, though probably a non trivial amount of code
A required step would be to go from Expr --> SQL expression, which might be a good place to start
I looked into this myself recently and also concluded that DataFusion has no way currently to recover the SQL strings that generated a LogicalPlan or parts of it (Exprs). Likewise, I don't think sqlparser-rs has any mechanism to go from AST back to raw SQL. Either case would be non trivial to implement, and raises the question of what SQL dialect you want to support.
I'm not sure what your use case is @tempbottle, but in my case I am looking at implementing TableProviders over top of other engines which accept SQL. For this use case, it may be more fruitful to look at https://substrait.io/. DataFusion does have the capability to convert a LogicalPlan to a Substrait plan. If the engine of interest does not consume Substrait, you are probably stuck writing custom glue code to map a LogicalPlan or Exprs into whatever SQL dialect or DSL that the target engine supports.
I'll also point out that GlareDb has some examples of mapping Exprs to SQL strings for the purposes of implementing a TableProvider. See e.g. https://github.com/GlareDB/glaredb/blob/ee4b440d9a594150926ce0a5fd96db400185c881/crates/datasources/src/postgres/mod.rs#L1242
Likewise, I don't think sqlparser-rs has any mechanism to go from AST back to raw SQL.
I am pretty sure the Display
impl in sqlparser-rs recovers SQL from the parsed AST (as we have had many PRs related to keeping it working)
I agree substrait.io is a good idea for connecting other databases.
@tempbottle @devinjdangelo FYI Ibis has a to_sql method that can be used to generate SQL and DataFusion is a backend for Ibis however I'm not sure if it has all the features required for to_sql
to work (the latest i saw was that DataFusion still had a lot of gaps to fill with the Ibis wrapper). It also may not be the API you were looking for, but thought it could help as you could see how they implement it.
I am pretty sure the
Display
impl in sqlparser-rs recovers SQL from the parsed AST (as we have had many PRs related to keeping it working)
That's very interesting i'll have to look more closely at that. Thanks @alamb !
FYI I filed https://github.com/apache/arrow-datafusion/issues/9495 for the Expr --> String feature. That would likely be a necesary first step for a full LogicalPlan --> SQL conversion
I split out datafusion-federation's SQL Writer code into its own package and added an example for expressions and plans. I intend to mature it over time. It's open to contributions and/or moving to a more canonical place.
For anyone else following along, there is substantial discussion on https://github.com/apache/arrow-datafusion/issues/9495#issuecomment-1985057845 worth reading
Does it make sense for me to lay the same 'groundwork' for the LogicalPlan
serialization as I did for the Expr
in #9517?
The suggestion would be similar: Upstream the datafusion-federation sql-write code as follows:
datafusion/sql/src/unparser
.
plan_to_sql
method similar to expr_to_sql
.Any early thoughts or feedback on the datafusion-federation code sql-write code is appreciated, so I can address it in the process.
Does it make sense for me to lay the same 'groundwork' for the LogicalPlan serialization as I did for the Expr in https://github.com/apache/arrow-datafusion/pull/9517?
I am in support of this. We should be cautious about what we expose via public interfaces at first, since I think it is likely we will require many breaking changes to support all possible logical plans. The top level plan_to_sql
function should be stable, but we will need freedom to rework methods below that.
Upstream the AST builders to sqlparser-rs.
I am a bit surprised there isn't an ergonomic way to build an AST already in sqlparser
. I can see that the DFParser constructs Statements directly without a builder struct.
If the AST builder functionality is not particularly useful in sqlparser
, it is likely best to include that in datafusion only rather than upstream to sqlparser
. Do you think the AST builders would be useful upstream @alamb?
Does it make sense for me to lay the same 'groundwork' for the
LogicalPlan
serialization as I did for theExpr
in #9517?
I agree with @devinjdangelo that I think it does.
If the AST builder functionality is not particularly useful in
sqlparser
, it is likely best to include that in datafusion only rather than upstream tosqlparser
. Do you think the AST builders would be useful upstream @alamb?
I am not sure, to be honest. I think if we upstreamed a AST builder style thing in sqlparser, we would also want to migrate the existing parser to use it (and therefore take advantage of the existing tests). I think as long as it were backwards compatible it would make a nice contribution.
However, I would also say there is no compelling need to upstream it either. Having it start life in DataFusion would also be fine and then maybe upstream it in sqlparser
The initial PR (#9596) has landed. For anyone interested in helping to flesh out the LogicalPlan
serialization to SQL, a good place to start are the remaining not_impl_err!
statements in plan.rs. Add an integration test in roundtrip_statement
in sql::tests
and try to make it pass!
Independently I have spent a fair bit of time on this problem. I think this current design of going to the sqlparser AST is a better approach than trying to go to SQL directly.
One thing that really helped was that I went and copied ~150ish queries from https://www.w3resource.com/sql-exercises/adventureworks/adventureworks-exercises.php to do a roundtrip test against. These are licensed https://creativecommons.org/licenses/by-nc-sa/3.0/deed.en so could be added to the test suite.
As you start to add more queries you start to see a lot of edge cases (particularly relating to how aggregations work) that need to be dealt with.
Part of https://github.com/apache/datafusion/issues/9494
Is your feature request related to a problem or challenge?
No response
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
Sub tasks