apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.79k stars 1.09k forks source link

Generate well-indented SQL from LogicalPlan #11308

Open edmondop opened 1 month ago

edmondop commented 1 month ago

Is your feature request related to a problem or challenge?

DataFusion provides the capability of "unparsing" a logical plan into SQL via the unparser module in the sql crate (see https://github.com/apache/datafusion/blob/main/datafusion/sql/src/unparser/plan.rs). Examples also showcase the Dialect features to provide customizable escaping.

As a part of the work on SpiceAI and datafusion-federation, we have some rewrites on the tpch_q13 and we expect the final rewritten SQL to be the following:

SELECT c_orders.c_count,
       Count(1) AS custdist
FROM   (SELECT c_custkey                       AS c_custkey,
               "count(tpch.orders.o_orderkey)" AS c_count
        FROM   (SELECT TPCH.customer.c_custkey,
                       Count(TPCH.orders.o_orderkey) AS
                       "COUNT(tpch.orders.o_orderkey)"
                FROM   TPCH.customer
                       LEFT JOIN TPCH.orders
                              ON ( ( TPCH.customer.c_custkey =
                                     TPCH.orders.o_custkey )
                                   AND TPCH.orders.o_comment NOT LIKE
                                       '%special%requests%' )
                GROUP  BY TPCH.customer.c_custkey)) AS c_orders
GROUP  BY c_orders.c_count
ORDER  BY custdist DESC NULLS FIRST,
          c_orders.c_count DESC NULLS FIRST 

however the plan_to_sql generates a one-line sql which is much harder to read

SELECT c_orders.c_count, COUNT(1) AS custdist FROM (SELECT c_custkey AS c_custkey, "COUNT(tpch.orders.o_orderkey)" AS c_count FROM (SELECT tpch.customer.c_custkey, COUNT(tpch.orders.o_orderkey) AS "COUNT(tpch.orders.o_orderkey)" FROM tpch.customer LEFT JOIN tpch.orders ON ((tpch.customer.c_custkey = tpch.orders.o_custkey) AND tpch.orders.o_comment NOT LIKE '%special%requests%') GROUP BY tpch.customer.c_custkey)) AS c_orders GROUP BY c_orders.c_count ORDER BY custdist DESC NULLS FIRST, c_orders.c_count DESC NULLS FIRST"

Describe the solution you'd like

I would like to be able to provide an extra parameter to the Unparser, such as pretty_print or indent, which needs to be respected in the plan_to_sql

Describe alternatives you've considered

Use https://github.com/dprint/dprint on the SQL, but unfortunately SQL is not supported

Additional context

I used this https://www.dpriver.com/pp/sqlformat.htm to generate the formatted SQL from the SQL generated from dataufusion

edmondop commented 1 month ago

Waiting https://github.com/sqlparser-rs/sqlparser-rs/issues/1294 to happen as it is likely that sql-parser will be migrated to datafusion and the change will be easier