DataJunction / dj

A metrics platform.
http://datajunction.io
MIT License

Support Spark SQL Hints #1032

Closed shangyian closed 3 weeks ago

shangyian commented 4 weeks ago

Summary

This change adds support for parsing Spark SQL hints using the ANTLR parser and our custom SQL AST.

The following changes were made:

  1. Replaced the Spark ANTLR grammars with the latest versions:
  2. Regenerated the ANTLR parser based on the modified grammar with antlr4 -Dlanguage=Python3 -visitor SqlBaseLexer.g4 SqlBaseParser.g4 -o generated.
  3. Modified the generated grammar to support hints. There were a few issues with hints previously:

    • The isHint() method was entirely in Java, so I converted it manually to Python-compatible code.
    • Java's char type can be compared directly with integer character codes, but in Python the integer returned by LA(1) has to be converted explicitly with chr(...) before it can be compared with a character.
    • The final isHint() method looks like this:

      def isHint(self) -> bool:
          """
          This method will be called when we see '/*' and try to match it as a bracketed comment.
          If the next character is '+', it should be parsed as a hint later, and we cannot match
          it as a bracketed comment.

          Returns True if the next character is '+'.
          """
          nextChar = self._input.LA(1)
          # LA(1) returns -1 at end of input, and chr(-1) raises ValueError, so guard first.
          return nextChar >= 0 and chr(nextChar) == '+'
  4. Added a Hint tree node to the AST to store parsed hint statements. Also added a list of hints to SelectExpression.
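The lexer predicate from step 3 and the AST addition from step 4 can be sketched without the ANTLR runtime. This is only an illustration under stated assumptions: CodePointStream is a hypothetical stand-in for the lexer's input stream, and the fields on Hint and SelectExpression are guesses at a minimal shape, not the actual DJ classes.

```python
from dataclasses import dataclass, field
from typing import List

EOF = -1  # the ANTLR Python runtime's LA() returns -1 at end of input


class CodePointStream:
    """Hypothetical stand-in for an ANTLR input stream (stores integer code points)."""

    def __init__(self, text: str, index: int = 0) -> None:
        self.data = [ord(c) for c in text]
        self.index = index  # position of the next unread character

    def LA(self, offset: int) -> int:
        pos = self.index + offset - 1
        if pos < 0 or pos >= len(self.data):
            return EOF
        return self.data[pos]


def is_hint(stream: CodePointStream) -> bool:
    """Return True if the character right after '/*' is '+'."""
    next_char = stream.LA(1)
    # Guard against EOF before calling chr(), which rejects negative values.
    return next_char != EOF and chr(next_char) == "+"


@dataclass
class Hint:
    """Assumed minimal shape of a hint node: a name plus raw parameters."""
    name: str
    parameters: List[str] = field(default_factory=list)


@dataclass
class SelectExpression:
    """Assumed minimal shape: only the fields needed to show the hint list."""
    projection: List[str]
    hints: List[Hint] = field(default_factory=list)
```

With the stream positioned just past the opening '/*' (index 2), is_hint returns True for "/*+ BROADCAST(t1) */" and False for an ordinary comment or for input that ends right after '/*'.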

Test Plan

Added some tests for various Spark hints pulled from https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html
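As a rough illustration of what such tests check, here is a small regex-based sketch that pulls hint names and parameters out of queries written in the style of the Spark hints documentation. It is not how the PR parses hints (that goes through the ANTLR parser and the AST); extract_hints and the CASES table are hypothetical helpers for illustration only.

```python
import re
from typing import List, Tuple

# Matches the body of a hint comment: /*+ ... */
HINT_BLOCK = re.compile(r"/\*\+(.*?)\*/", re.DOTALL)
# Matches a hint name with an optional parenthesized parameter list.
HINT_CALL = re.compile(r"([A-Za-z_]+)\s*(?:\(([^)]*)\))?")


def extract_hints(query: str) -> List[Tuple[str, List[str]]]:
    """Return (name, parameters) pairs for every hint found in the query."""
    hints = []
    for block in HINT_BLOCK.findall(query):
        for name, args in HINT_CALL.findall(block):
            params = [a.strip() for a in args.split(",") if a.strip()]
            hints.append((name.upper(), params))
    return hints


# Query / expected-hints pairs, in the style of the Spark documentation examples.
CASES = [
    ("SELECT /*+ COALESCE(3) */ * FROM t", [("COALESCE", ["3"])]),
    ("SELECT /*+ REPARTITION(3, c) */ * FROM t", [("REPARTITION", ["3", "c"])]),
    ("SELECT /*+ REBALANCE */ * FROM t", [("REBALANCE", [])]),
    (
        "SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * "
        "FROM t1 INNER JOIN t2 ON t1.key = t2.key",
        [("BROADCAST", ["t1"]), ("MERGE", ["t1", "t2"])],
    ),
    ("SELECT /* not a hint */ * FROM t", []),
]
```

Each case pairs a query with the hints expected from it; the last case checks that an ordinary bracketed comment yields no hints.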

Deployment Plan

N/A

netlify[bot] commented 4 weeks ago

Deploy Preview for thriving-cassata-78ae72 canceled.

Name               Link
Latest commit      59ad000314d6a9c97e16a4b32e4adee4a1bfb4af
Latest deploy log  https://app.netlify.com/sites/thriving-cassata-78ae72/deploys/665d6fe61c625d000895e5b9