DataJunction / dj

A metrics platform.
http://datajunction.io
MIT License

Support Spark SQL Hints #1033

Closed shangyian closed 3 weeks ago

shangyian commented 3 weeks ago

Summary

This change adds support for parsing Spark SQL hints using the ANTLR parser and our custom SQL AST.

The following changes were made:

  1. Replaced the Spark ANTLR grammars with the latest versions.
  2. Regenerated the ANTLR parser from the modified grammar by running: antlr4 -Dlanguage=Python3 -visitor SqlBaseLexer.g4 SqlBaseParser.g4 -o generated
  3. Modified the generated lexer to support hints. There were a few issues with hints previously:
    • The isHint() method in the grammar was written entirely in Java, so I ported it to Python by hand.
    • Java's char type can be compared directly against integer character codes, but Python needs an explicit conversion, done here with chr(...).
  4. Modified the AST to store parsed hint statements by adding a Hint tree node type and a list of hints to SelectExpression.
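
To illustrate the chr(...) point in step 3, here is a minimal sketch of what a Python port of isHint() might look like. It assumes the Java predicate only checks whether the next input character is '+'. FakeCharStream is a hypothetical stand-in for the ANTLR character stream so the snippet runs without the antlr4 runtime, and is_hint is written as a free function rather than a lexer method; neither name comes from this PR.

```python
class FakeCharStream:
    """Hypothetical stand-in for an ANTLR character stream.

    LA(k) returns the code point k characters ahead as an int,
    or -1 when that position is past the end of the input.
    """

    EOF = -1

    def __init__(self, text, index=0):
        self.text = text
        self.index = index

    def LA(self, offset):
        pos = self.index + offset - 1
        if pos < 0 or pos >= len(self.text):
            return self.EOF
        return ord(self.text[pos])


def is_hint(stream):
    """Return True if the next character is '+', as in '/*+ ... */'."""
    next_char = stream.LA(1)  # an int, not a one-character string
    if next_char < 0:
        # chr() raises ValueError on negative ints, so handle EOF first.
        return False
    # Java can write `nextChar == '+'` directly; Python cannot compare
    # an int with a str meaningfully, so convert explicitly.
    return chr(next_char) == "+"


# Stream positioned just after the opening "/*" of a bracketed comment.
print(is_hint(FakeCharStream("/*+ BROADCAST(t1) */", index=2)))
print(is_hint(FakeCharStream("/* plain comment */", index=2)))
```

Comparing with `next_char == ord("+")` would work equally well and avoids the EOF guard; the chr(...) form is shown only because it is the one mentioned above.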

Test Plan

Added some tests for various Spark hints pulled from https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html
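
As a rough picture of what such tests check, the sketch below pulls hints out of a few queries in the style of that Spark documentation page. Everything in it is illustrative: the Hint dataclass is a simplified, hypothetical stand-in for the new tree node, and extract_hints uses regular expressions in place of the ANTLR parser, so it is not how this PR parses hints.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hint:
    """Hypothetical hint node: a hint name plus its raw parameters."""

    name: str
    parameters: List[str] = field(default_factory=list)


# A hint block is a bracketed comment that starts with "/*+".
HINT_COMMENT = re.compile(r"/\*\+(.*?)\*/", re.DOTALL)
# Inside the block: NAME or NAME(arg, arg, ...).
HINT_CALL = re.compile(r"(\w+)\s*(?:\(([^)]*)\))?")


def extract_hints(query: str) -> List[Hint]:
    """Regex-based stand-in for the parser; collects hints in order."""
    hints = []
    for body in HINT_COMMENT.findall(query):
        for name, args in HINT_CALL.findall(body):
            params = [a.strip() for a in args.split(",") if a.strip()]
            hints.append(Hint(name.upper(), params))
    return hints


queries = [
    "SELECT /*+ COALESCE(3) */ * FROM t",
    "SELECT /*+ REPARTITION(3, c) */ * FROM t",
    "SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 "
    "INNER JOIN t2 ON t1.key = t2.key",
]
for q in queries:
    print(extract_hints(q))
```

An ordinary comment such as `/* note */` yields no hints here, which mirrors the role of the '+' check in the lexer.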

Deployment Plan

N/A

netlify[bot] commented 3 weeks ago

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
Latest commit 20bb573ba02a375a7bdc67d3cd24e85ab8b1c248
Latest deploy log https://app.netlify.com/sites/thriving-cassata-78ae72/deploys/665e00829100550008509155