databrickslabs / ucx

Your best companion for upgrading to Unity Catalog. UCX will guide you, the Databricks customer, through the process of upgrading your account, groups, workspaces, jobs etc. to Unity Catalog.

[FEATURE]: Replace `sqlglot` with ANTLR parser #2098

Open nfx opened 1 week ago

nfx commented 1 week ago

Is there an existing issue for this?

Problem statement

sqlglot is a hand-written generic parser that does not cover the entirety of the Databricks SQL dialect. See example noise from linter verification. sqlglot does not propagate token information for expressions and has no roadmap to do so, which means that we lose the formatting of SQL queries when migrating code to UC. For views that's not an issue, but for code it is.
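
To illustrate why token positions matter here, below is a minimal sketch in plain Python. It does not use sqlglot or ANTLR, and `Token`, `tokenize` and `rename_table` are hypothetical names made up for this example. The assumption is that if every token carries its start/stop offsets in the original text, a migration can splice the new table name into the source and leave whitespace and comments untouched, instead of regenerating the whole query from a tree.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    start: int  # offset of the first character in the source
    stop: int   # offset one past the last character


# Hypothetical, deliberately tiny tokenizer: it only recognises (dotted)
# identifiers and does not understand string literals or comments.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*(?:\.[A-Za-z_][A-Za-z_0-9]*)*")


def tokenize(sql: str) -> list[Token]:
    return [Token(m.group(), m.start(), m.end()) for m in _IDENT.finditer(sql)]


def rename_table(sql: str, old: str, new: str) -> str:
    """Replace `old` with `new` by splicing at token offsets."""
    out = sql
    # Walk right to left so the offsets of earlier tokens stay valid.
    for tok in reversed(tokenize(sql)):
        if tok.text.lower() == old.lower():
            out = out[:tok.start] + new + out[tok.stop:]
    return out


original = (
    "SELECT  a,\n"
    "        b   -- keep this comment\n"
    "FROM    hive_metastore.sales.orders\n"
    "WHERE   a > 1"
)
migrated = rename_table(original, "hive_metastore.sales.orders", "main.sales.orders")
print(migrated)
```

Only the table reference changes; the column alignment and the inline comment survive because nothing outside the token's span is rewritten. A real implementation would take these offsets from the parser's token stream rather than from a regex.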

Proposed Solution

Additional Context

No response

ericvergnaud commented 1 week ago

As discussed, assuming the DB SQL ANTLR grammar only requires minimal work to be used from Python, 2 PW should be enough to replace sqlglot. That estimate doesn't account for additional features that are not yet implemented because they aren't feasible with sqlglot.