crate / cratedb-sqlparse

Parsing utilities to validate and split SQL statements for CrateDB.
Apache License 2.0
5 stars 1 forks source link

Missing closing bracket leads to crash of js sql-parse #66

Closed proddata closed 3 weeks ago

proddata commented 3 months ago

Example:

CREATE TABLE t01 (
   "x" OBJECT (DYNAMIC),
   "y" OBJECT (DYNAMIC) AS ("z" ARRAY(OBJECT (DYNAMIC))      
   );

Leads to:

file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:85514
      e2.ctx.parser.getTokenStream().getText(
         ^

TypeError: Cannot read properties of null (reading 'ctx')
    at ExceptionCollectorListener.syntaxError (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:85514:10)
    at file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:1647:37
    at Array.map (<anonymous>)
    at Rt.syntaxError (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:1647:22)
    at _SqlBaseParser.notifyErrorListeners (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:3569:39)
    at Ae.reportUnwantedToken (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:3136:10)
    at Ae.sync (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:3109:16)
    at _SqlBaseParser.createStmt (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:67301:30)
    at _SqlBaseParser.statement (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:60659:16)
    at _SqlBaseParser.statements (file:///Users/georg/sqlparsetest/node_modules/@cratedb/cratedb-sqlparse/dist/sqlparse.js:58435:12)
surister commented 3 months ago

Interestingly, in this situation when collecting errors, antrl4 gives us an None e

def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
    error = ParsingException(
        msg=msg,
        offending_token=offendingSymbol,
        e=e, # <--------------------
        query=e.ctx.parser.getTokenStream().getText(e.ctx.start, e.offendingToken.tokenIndex)
    )

    raise error
column = {int} 4
e = {NoneType} None
line = {int} 5
msg = {str} "extraneous input ';' expecting {',', ')'}"
offendingSymbol = {CommonToken} [@41,117:117=';',<301>,5:4]
recognizer = {SqlBaseParser} <cratedb_sqlparse.generated_parser.SqlBaseParser.SqlBaseParser object at 0x7a927b0715b0>
self = {ExceptionErrorListener} <cratedb_sqlparse.parser.ExceptionErrorListener object at 0x7a927b1ed430>

We still have the error message and could just compose the exception ourselves, the only issue is that we currently use the e object to get the query and later match the exception with the original query.

surister commented 1 month ago

Potential fix:


def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        if e:
            query = recognizer.getTokenStream().getText(e.ctx.start, offendingSymbol.tokenIndex)

        else:
            # If antlr4 doesn't give us an error object, we heuristically create a query, or a piece of it
            # so we increase the chances of it being correctly assigned.
            # It means that theoretically if you input two wrong queries that antlr4 manages
            # to distinguish as two different statements (which is hard already), and both are similar
            # the errors could be matched wrongly. Still pretty rare, and it is very hard to come up with
            # an actual query that does it.

            # The newly generated query will be either the offendingToken + one token to the left
            # or offendingToken + two tokens to the left, if the second is possible it takes precedence.
            min_token_to_check = max(1, offendingSymbol.tokenIndex - 2)
            tokens = recognizer.getTokenStream().tokens[
                     min_token_to_check: offendingSymbol.tokenIndex:offendingSymbol.tokenIndex + 1]
            query = "".join(map(lambda x: x.text, tokens))

        error = ParsingException(
            msg=msg,
            offending_token=offendingSymbol,
            e=e if e else type('NotViableInput', (Exception,), {})(),
            query=query
        )
        self.errors.append(error)