databrickslabs / remorph

Cross-compiler and Data Reconciler into Databricks Lakehouse
Other
35 stars 21 forks source link

[FEATURE]: Improve `group_by_clause` rule in Snowflake grammar #239

Open vil1 opened 6 months ago

vil1 commented 6 months ago

Is there an existing issue for this?

Problem statement

The way the goup_by_clause is currently expressed is such that some group types, such as ROLLUP in GROUP BY ROLLUP (x) are incorrectly parsed as function calls.

More specifically, the second branch in:

group_by_clause
    : GROUP BY group_by_list having_clause?
    | GROUP BY (CUBE | GROUPING SETS | ROLLUP) LR_BRACKET group_by_list RR_BRACKET
    | GROUP BY ALL
    ;

Seems to never be matched as the ROLLUP (x) part of the input always gets matched by the goup_by_list rule

Proposed Solution

Reordering the branches in group_by_clause seems to fix the issue:

group_by_clause
    : GROUP BY (CUBE | GROUPING SETS | ROLLUP) LR_BRACKET group_by_list RR_BRACKET
    | GROUP BY group_by_list having_clause?
    | GROUP BY ALL
    ;

Additional Context

No response

vil1 commented 6 months ago

@nfx: is it OK for me to do the change I suggested right away?

jimidle commented 3 months ago

@vil1 Given that we have changed this area quite a bit, is this issue still valid? The Snowflake grammar could still be improved quite a bit with function declarations etc. Maybe I should spend some time on Snowflake grammar @nfx ?