databrickslabs / remorph

Cross-compiler and Data Reconciler into Databricks Lakehouse

Improve TSQL and Snowflake parser and lexer #757

Closed jimidle closed 2 months ago

jimidle commented 2 months ago

Here, we correct a number of problems with the Snowflake lexer, which was needlessly trying to interpret constructs such as escape sequences. Validating those is not the lexer's job.

We also correct the parsing of various Snowflake options, each of which was spelled out as its own lexer token even though they are just simple strings. Treating them as strings simplifies both the lexer and the parser rules.

Snowflake statements can accept ? as a placeholder for an expression that is bound later. The Snowflake lexer and parser now accept this as a PARAM placeholder in any expression. This seems to be most useful with the IDENTIFIER(?) pseudo-function, which protects against SQL injection attacks.
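
As a rough illustration of the idea (not the project's actual grammar), the Python sketch below tokenizes a statement and emits a PARAM token for a bare `?`. The token names other than PARAM, the regular expressions, and the `tokenize` helper are all assumptions made for this example.

```python
import re

# Hypothetical token set for illustration only; PARAM is the one name
# taken from the description above.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_$]*"),
    ("NUMBER", r"\d+"),
    ("PARAM", r"\?"),          # bind placeholder, allowed wherever an expression is
    ("SYMBOL", r"[(),.*=;]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(sql):
    """Return (token_name, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("SELECT * FROM IDENTIFIER(?)"):
        print(token)
```

Because the placeholder is an ordinary token, a parser built on top of this can treat it like any other primary expression, including as the argument of IDENTIFIER(...).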

The TSQL lexer has a few small improvements, mainly so that Snowflake and TSQL use the same name for the lexer's catch-all rule, which fires when an input character matches no other rule in the lexer.
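
The two lexer ideas above, scanning strings without validating their escape sequences and ending the rule list with a catch-all, can be sketched in Python as follows. This is a simplified stand-in rather than the real lexer: the rule name `UNMATCHED`, the other token names, and the patterns are hypothetical.

```python
import re

# Hypothetical rules for illustration. Order matters: the catch-all is last,
# so it only fires when no earlier rule matches.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    # A backslash followed by any character is accepted as-is; whether the
    # escape is meaningful is left to a later phase, not the lexer.
    ("STRING", r"'(?:\\.|''|[^'\\])*'"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_$]*"),
    ("SYMBOL", r"[(),.*=;]"),
    ("UNMATCHED", r"."),  # catch-all for any character no other rule accepts
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(sql):
    """Return (token_name, text) pairs, skipping whitespace."""
    return [
        (match.lastgroup, match.group())
        for match in TOKEN_RE.finditer(sql)
        if match.lastgroup != "WS"
    ]


if __name__ == "__main__":
    for token in tokenize(r"SELECT 'a\qb' # x"):
        print(token)
```

With a catch-all in place the lexer never fails outright: a stray character becomes a token of its own, and the parser can report it with position information. Giving that rule the same name in both dialects lets shared error handling look for one token type.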

Fixes: #741

github-actions[bot] commented 2 months ago

Coverage tests results

401 tests (±0): 109 passed (±0), 0 skipped (±0), 292 failed (±0). 2 suites (±0), 2 files (±0), run time 4s (±0s).

For more details on these failures, see this check.

Results for commit c064f5f3. ± Comparison against base commit fed469ed.

This comment has been updated with latest results.