PabloRMira / sql_formatter

A Python based SQL formatter
https://pablormira.github.io/sql_formatter/
Apache License 2.0
43 stars 10 forks source link

Formatting of CTEs #163

Open sambrilleman opened 3 years ago

sambrilleman commented 3 years ago

Is your feature request related to a problem? Please describe

Thanks for this package! Is it possible to have the start of a CTE definition on a new line?

The current behaviour is that a CTE is a continuation of the same line, as in the following example:

print(
    sql_formatter.core.format_sql("""
    WITH

    first_cte AS (
        SELECT var 
        FROM table1
    ),

    second_cte AS (
        SELECT var 
        FROM table2
    )

    SELECT a.var
    FROM first_cte
    JOIN second_cte USING (var)
    """)
)

which leads to the following output:

with first_cte as (SELECT var
                   FROM   table1), second_cte as (SELECT var
                               FROM   table2)
SELECT a.var
FROM   first_cte join second_cte using (var)

Describe the solution you'd like

The desired output would be:

WITH

first_cte as (
    SELECT var
    FROM   table1
), 

second_cte as (
    SELECT var
    FROM   table2
)

SELECT a.var
FROM   first_cte join second_cte using (var)

That is:

Describe alternatives you've considered

ztane commented 2 years ago
    ...
    CROSS JOIN LATERAL ( WITH bar AS (
            SELECT 
                a,
                b,
            FROM foo_func(baz)),
            barf AS (
                SELECT ...
)
           SELECT ...
    ...

i.e. not only is the WITH hanging there on the right, but also the closing parenthesis of the 2nd CTE placed in column 1, even though contained within other parentheses and thoroughly indented otherwise...

zachmayer commented 2 years ago

I would also like to see this improved! The current version of the package makes my WITH statements unreadable

ryanhaarmann commented 1 year ago

I'd also like to see WITH formatting added

MarcSkovMadsen commented 3 weeks ago

Thanks so much for creating this tool.

I have a CTE that looks like

set variable notationtime_start = '2024-11-04'; set variable notationtime_end = '2024-11-07'; set variable time_start = '2025-01-01 00:00:00'; set variable time_end = '2026-01-01 00:00:00'; set variable variable = 'mean'; set variable key_parent = 'portfolio_calc.all.cashflow_q';
with filter_data AS (
    SELECT ts.model, ts.notationtime, ts.job_id, m.metric, ts.key, date_part('year', time) as year, ts.value, 
    FROM   metrics m
        INNER JOIN timeseries_results_view ts
            ON m.freetext = ts.freetext and
               m.key = ts.key
    WHERE
            variable = getvariable('variable')
            and notationtime >= getvariable('notationtime_start')
            and notationtime <= getvariable('notationtime_end')
            and time >= getvariable('time_start')
            and time < getvariable('time_end')
    ORDER BY ts.freetext, m.metric desc, ts.key limit 10000
),
pivot_table AS (
    PIVOT filter_data ON metric USING sum(value) as value 
)

select * from pivot_table

and after formatting looks like

set variable notationtime_start = '2024-11-04'; set variable notationtime_end = '2024-11-07'; set variable time_start = '2025-01-01 00:00:00'; set variable time_end = '2026-01-01 00:00:00'; set variable variable = 'mean'; set variable key_parent = 'portfolio_calc.all.cashflow_q'; with filter_data as (SELECT ts.model,
                                                                                                                                                                                                                                                                                                                     ts.notationtime,
                                                                                                                                                                                                                                                                                                                     ts.job_id,
                                                                                                                                                                                                                                                                                                                     m.metric,
                                                                                                                                                                                                                                                                                                                     ts.key,
                                                                                                                                                                                                                                                                                                                     date_part('year', time) as year,
                                                                                                                                                                                                                                                                                                                     ts.value
                                                                                                                                                                                                                                                                                                              FROM   metrics m
                                                                                                                                                                                                                                                                                                                  INNER JOIN timeseries_results_view ts
                                                                                                                                                                                                                                                                                                                      ON m.freetext = ts.freetext and
                                                                                                                                                                                                                                                                                                                         m.key = ts.key
                                                                                                                                                                                                                                                                                                              WHERE  variable = getvariable('variable')
                                                                                                                                                                                                                                                                                                                 and notationtime >= getvariable('notationtime_start')
                                                                                                                                                                                                                                                                                                                 and notationtime <= getvariable('notationtime_end')
                                                                                                                                                                                                                                                                                                                 and time >= getvariable('time_start')
                                                                                                                                                                                                                                                                                                                 and time < getvariable('time_end')
                                                                                                                                                                                                                                                                                                              ORDER BY ts.freetext, m.metric desc, ts.key limit 10000), pivot_table as (pivot filter_data
        ON metric using sum(value) as value)
SELECT *
FROM   pivot_table

This makes the CTE highly unreadable.