antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
17.12k stars 3.28k forks

Python3 runtime bug / missing getCharPositionInLine() function in Antlr4? #2764

Open ChameleonRed opened 4 years ago

ChameleonRed commented 4 years ago

Grammar file: https://github.com/antlr/grammars-v4/blob/master/pgn/PGN.g4

Steps:

  1. Download PyCharm and the ANTLR4 plugin.
  2. Set the output directory to src/pgn.
  3. Set the language to Python3.
  4. Generate the lexer and parser.
  5. Write a boilerplate program (included below) that does nothing except call the unmodified generated code.
  6. Download a chess PGN file.
  7. Run the program on it.

Bug: The generated code contains a reference to a nonexistent function, getCharPositionInLine().

Program:

import os

from antlr4 import FileStream, CommonTokenStream

from pgn.PGNLexer import PGNLexer
from pgn.PGNParser import PGNParser

def main():
    input_stream = FileStream(os.path.join(os.pardir, os.pardir, 'data', 'all.pgn'), encoding='cp1252')
    lexer = PGNLexer(input_stream)
    token_stream = CommonTokenStream(lexer)
    parser = PGNParser(token_stream)
    tree = parser.pgn_game()

if __name__ == '__main__':
    main()

Exception:

C:\root\Python\Python38\python.exe "C:/Users/Cezary Wagner/PycharmProjects/learn_antlr_pgn/src/pgn/__main__.py"
Traceback (most recent call last):
  File "C:/Users/Cezary Wagner/PycharmProjects/learn_antlr_pgn/src/pgn/__main__.py", line 18, in <module>
    main()
  File "C:/Users/Cezary Wagner/PycharmProjects/learn_antlr_pgn/src/pgn/__main__.py", line 14, in main
    tree = parser.pgn_game()
  File "C:\Users\Cezary Wagner\PycharmProjects\learn_antlr_pgn\src\pgn\PGNParser.py", line 266, in pgn_game
    self.enterRule(localctx, 4, self.RULE_pgn_game)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\Parser.py", line 366, in enterRule
    self._ctx.start = self._input.LT(1)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\CommonTokenStream.py", line 61, in LT
    self.lazyInit()
  File "C:\root\Python\Python38\lib\site-packages\antlr4\BufferedTokenStream.py", line 186, in lazyInit
    self.setup()
  File "C:\root\Python\Python38\lib\site-packages\antlr4\BufferedTokenStream.py", line 189, in setup
    self.sync(0)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\BufferedTokenStream.py", line 111, in sync
    fetched = self.fetch(n)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\BufferedTokenStream.py", line 123, in fetch
    t = self.tokenSource.nextToken()
  File "C:\root\Python\Python38\lib\site-packages\antlr4\Lexer.py", line 128, in nextToken
    ttype = self._interp.match(self._input, self._mode)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 97, in match
    return self.matchATN(input)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 118, in matchATN
    s0_closure = self.computeStartState(input, startState)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 306, in computeStartState
    self.closure(input, c, configs, False, False, False)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 356, in closure
    currentAltReachedAcceptState = self.closure(input, c, configs, currentAltReachedAcceptState, speculative, treatEofAsEpsilon)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 354, in closure
    c = self.getEpsilonTarget(input, config, t, configs, speculative, treatEofAsEpsilon)
  File "C:\root\Python\Python38\lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 393, in getEpsilonTarget
    if self.evaluatePredicate(input, t.ruleIndex, t.predIndex, speculative):
  File "C:\root\Python\Python38\lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 455, in evaluatePredicate
    return self.recog.sempred(None, ruleIndex, predIndex)
  File "C:\Users\Cezary Wagner\PycharmProjects\learn_antlr_pgn\src\pgn\PGNLexer.py", line 137, in sempred
    return pred(localctx, predIndex)
  File "C:\Users\Cezary Wagner\PycharmProjects\learn_antlr_pgn\src\pgn\PGNLexer.py", line 143, in ESCAPE_sempred
    return getCharPositionInLine() == 0
NameError: name 'getCharPositionInLine' is not defined

Bug: getCharPositionInLine should be defined in the generated lexer, but it is not.
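The NameError at the bottom of the traceback can be reproduced without ANTLR. The sketch below uses a hypothetical stand-in class, StubLexer, which is not part of the antlr4 runtime; its getCharPositionInLine method is an assumption made only for comparison. It suggests that Python looks up a bare name in the local, enclosing, global and built-in scopes, and never on the instance or its class, so the call fails even when a method of that name exists.

```python
class StubLexer:
    """Hypothetical stand-in for a generated lexer (not the antlr4 runtime)."""

    def getCharPositionInLine(self):
        # Assumed method, defined here only so the lookup can be compared.
        return 0

    def ESCAPE_sempred(self, localctx, predIndex):
        if predIndex == 0:
            # Bare name: searched in local, enclosing, global and built-in
            # scopes, never on self or on the class.
            return getCharPositionInLine() == 0  # noqa: F821


def try_predicate():
    """Call the predicate once and report the outcome as a string."""
    lexer = StubLexer()
    try:
        return str(lexer.ESCAPE_sempred(None, 0))
    except NameError as exc:
        return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # The method exists on the class, yet the bare call still fails.
    print(hasattr(StubLexer, "getCharPositionInLine"))
    print(try_predicate())
```

So defining the function on the lexer class would not by itself make the generated line work; the call would still need to go through the instance.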

ChameleonRed commented 4 years ago

Generated lexer code:

# Generated from C:/Users/Cezary Wagner/PycharmProjects/learn_antlr_pgn/src\PGN.g4 by ANTLR 4.8
from antlr4 import *
from io import StringIO
from typing.io import TextIO
import sys

def serializedATN():
    with StringIO() as buf:
        buf.write("\3\u608b\ua72a\u8133\ub9ed\u417c\u3be7\u7786\u5964\2\27")
        buf.write("\u0097\b\1\4\2\t\2\4\3\t\3\4\4\t\4\4\5\t\5\4\6\t\6\4\7")
        buf.write("\t\7\4\b\t\b\4\t\t\t\4\n\t\n\4\13\t\13\4\f\t\f\4\r\t\r")
        buf.write("\4\16\t\16\4\17\t\17\4\20\t\20\4\21\t\21\4\22\t\22\4\23")
        buf.write("\t\23\4\24\t\24\4\25\t\25\4\26\t\26\3\2\3\2\3\2\3\2\3")
        buf.write("\3\3\3\3\3\3\3\3\4\3\4\3\4\3\4\3\4\3\4\3\4\3\4\3\5\3\5")
        buf.write("\7\5@\n\5\f\5\16\5C\13\5\3\5\3\5\3\6\3\6\7\6I\n\6\f\6")
        buf.write("\16\6L\13\6\3\6\3\6\3\6\3\6\3\7\3\7\3\7\7\7U\n\7\f\7\16")
        buf.write("\7X\13\7\3\7\3\7\3\b\6\b]\n\b\r\b\16\b^\3\b\3\b\3\t\3")
        buf.write("\t\3\t\3\t\3\t\3\t\7\ti\n\t\f\t\16\tl\13\t\3\t\3\t\3\n")
        buf.write("\6\nq\n\n\r\n\16\nr\3\13\3\13\3\f\3\f\3\r\3\r\3\16\3\16")
        buf.write("\3\17\3\17\3\20\3\20\3\21\3\21\3\22\3\22\3\23\3\23\6\23")
        buf.write("\u0087\n\23\r\23\16\23\u0088\3\24\3\24\7\24\u008d\n\24")
        buf.write("\f\24\16\24\u0090\13\24\3\25\3\25\5\25\u0094\n\25\3\26")
        buf.write("\3\26\2\2\27\3\3\5\4\7\5\t\6\13\7\r\b\17\t\21\n\23\13")
        buf.write("\25\f\27\r\31\16\33\17\35\20\37\21!\22#\23%\24\'\25)\26")
        buf.write("+\27\3\2\n\4\2\f\f\17\17\3\2\177\177\5\2\13\f\17\17\"")
        buf.write("\"\4\2$$^^\3\2\62;\5\2\62;C\\c|\n\2%%--//\62<??C\\aac")
        buf.write("|\4\2##AA\2\u00a1\2\3\3\2\2\2\2\5\3\2\2\2\2\7\3\2\2\2")
        buf.write("\2\t\3\2\2\2\2\13\3\2\2\2\2\r\3\2\2\2\2\17\3\2\2\2\2\21")
        buf.write("\3\2\2\2\2\23\3\2\2\2\2\25\3\2\2\2\2\27\3\2\2\2\2\31\3")
        buf.write("\2\2\2\2\33\3\2\2\2\2\35\3\2\2\2\2\37\3\2\2\2\2!\3\2\2")
        buf.write("\2\2#\3\2\2\2\2%\3\2\2\2\2\'\3\2\2\2\2)\3\2\2\2\2+\3\2")
        buf.write("\2\2\3-\3\2\2\2\5\61\3\2\2\2\7\65\3\2\2\2\t=\3\2\2\2\13")
        buf.write("F\3\2\2\2\rQ\3\2\2\2\17\\\3\2\2\2\21b\3\2\2\2\23p\3\2")
        buf.write("\2\2\25t\3\2\2\2\27v\3\2\2\2\31x\3\2\2\2\33z\3\2\2\2\35")
        buf.write("|\3\2\2\2\37~\3\2\2\2!\u0080\3\2\2\2#\u0082\3\2\2\2%\u0084")
        buf.write("\3\2\2\2\'\u008a\3\2\2\2)\u0091\3\2\2\2+\u0095\3\2\2\2")
        buf.write("-.\7\63\2\2./\7/\2\2/\60\7\62\2\2\60\4\3\2\2\2\61\62\7")
        buf.write("\62\2\2\62\63\7/\2\2\63\64\7\63\2\2\64\6\3\2\2\2\65\66")
        buf.write("\7\63\2\2\66\67\7\61\2\2\678\7\64\2\289\7/\2\29:\7\63")
        buf.write("\2\2:;\7\61\2\2;<\7\64\2\2<\b\3\2\2\2=A\7=\2\2>@\n\2\2")
        buf.write("\2?>\3\2\2\2@C\3\2\2\2A?\3\2\2\2AB\3\2\2\2BD\3\2\2\2C")
        buf.write("A\3\2\2\2DE\b\5\2\2E\n\3\2\2\2FJ\7}\2\2GI\n\3\2\2HG\3")
        buf.write("\2\2\2IL\3\2\2\2JH\3\2\2\2JK\3\2\2\2KM\3\2\2\2LJ\3\2\2")
        buf.write("\2MN\7\177\2\2NO\3\2\2\2OP\b\6\2\2P\f\3\2\2\2QR\6\7\2")
        buf.write("\2RV\7\'\2\2SU\n\2\2\2TS\3\2\2\2UX\3\2\2\2VT\3\2\2\2V")
        buf.write("W\3\2\2\2WY\3\2\2\2XV\3\2\2\2YZ\b\7\2\2Z\16\3\2\2\2[]")
        buf.write("\t\4\2\2\\[\3\2\2\2]^\3\2\2\2^\\\3\2\2\2^_\3\2\2\2_`\3")
        buf.write("\2\2\2`a\b\b\2\2a\20\3\2\2\2bj\7$\2\2cd\7^\2\2di\7^\2")
        buf.write("\2ef\7^\2\2fi\7$\2\2gi\n\5\2\2hc\3\2\2\2he\3\2\2\2hg\3")
        buf.write("\2\2\2il\3\2\2\2jh\3\2\2\2jk\3\2\2\2km\3\2\2\2lj\3\2\2")
        buf.write("\2mn\7$\2\2n\22\3\2\2\2oq\t\6\2\2po\3\2\2\2qr\3\2\2\2")
        buf.write("rp\3\2\2\2rs\3\2\2\2s\24\3\2\2\2tu\7\60\2\2u\26\3\2\2")
        buf.write("\2vw\7,\2\2w\30\3\2\2\2xy\7]\2\2y\32\3\2\2\2z{\7_\2\2")
        buf.write("{\34\3\2\2\2|}\7*\2\2}\36\3\2\2\2~\177\7+\2\2\177 \3\2")
        buf.write("\2\2\u0080\u0081\7>\2\2\u0081\"\3\2\2\2\u0082\u0083\7")
        buf.write("@\2\2\u0083$\3\2\2\2\u0084\u0086\7&\2\2\u0085\u0087\t")
        buf.write("\6\2\2\u0086\u0085\3\2\2\2\u0087\u0088\3\2\2\2\u0088\u0086")
        buf.write("\3\2\2\2\u0088\u0089\3\2\2\2\u0089&\3\2\2\2\u008a\u008e")
        buf.write("\t\7\2\2\u008b\u008d\t\b\2\2\u008c\u008b\3\2\2\2\u008d")
        buf.write("\u0090\3\2\2\2\u008e\u008c\3\2\2\2\u008e\u008f\3\2\2\2")
        buf.write("\u008f(\3\2\2\2\u0090\u008e\3\2\2\2\u0091\u0093\t\t\2")
        buf.write("\2\u0092\u0094\t\t\2\2\u0093\u0092\3\2\2\2\u0093\u0094")
        buf.write("\3\2\2\2\u0094*\3\2\2\2\u0095\u0096\13\2\2\2\u0096,\3")
        buf.write("\2\2\2\r\2AJV^hjr\u0088\u008e\u0093\3\b\2\2")
        return buf.getvalue()

class PGNLexer(Lexer):

    atn = ATNDeserializer().deserialize(serializedATN())

    decisionsToDFA = [ DFA(ds, i) for i, ds in enumerate(atn.decisionToState) ]

    WHITE_WINS = 1
    BLACK_WINS = 2
    DRAWN_GAME = 3
    REST_OF_LINE_COMMENT = 4
    BRACE_COMMENT = 5
    ESCAPE = 6
    SPACES = 7
    STRING = 8
    INTEGER = 9
    PERIOD = 10
    ASTERISK = 11
    LEFT_BRACKET = 12
    RIGHT_BRACKET = 13
    LEFT_PARENTHESIS = 14
    RIGHT_PARENTHESIS = 15
    LEFT_ANGLE_BRACKET = 16
    RIGHT_ANGLE_BRACKET = 17
    NUMERIC_ANNOTATION_GLYPH = 18
    SYMBOL = 19
    SUFFIX_ANNOTATION = 20
    UNEXPECTED_CHAR = 21

    channelNames = [ u"DEFAULT_TOKEN_CHANNEL", u"HIDDEN" ]

    modeNames = [ "DEFAULT_MODE" ]

    literalNames = [ "<INVALID>",
            "'1-0'", "'0-1'", "'1/2-1/2'", "'.'", "'*'", "'['", "']'", "'('", 
            "')'", "'<'", "'>'" ]

    symbolicNames = [ "<INVALID>",
            "WHITE_WINS", "BLACK_WINS", "DRAWN_GAME", "REST_OF_LINE_COMMENT", 
            "BRACE_COMMENT", "ESCAPE", "SPACES", "STRING", "INTEGER", "PERIOD", 
            "ASTERISK", "LEFT_BRACKET", "RIGHT_BRACKET", "LEFT_PARENTHESIS", 
            "RIGHT_PARENTHESIS", "LEFT_ANGLE_BRACKET", "RIGHT_ANGLE_BRACKET", 
            "NUMERIC_ANNOTATION_GLYPH", "SYMBOL", "SUFFIX_ANNOTATION", "UNEXPECTED_CHAR" ]

    ruleNames = [ "WHITE_WINS", "BLACK_WINS", "DRAWN_GAME", "REST_OF_LINE_COMMENT", 
                  "BRACE_COMMENT", "ESCAPE", "SPACES", "STRING", "INTEGER", 
                  "PERIOD", "ASTERISK", "LEFT_BRACKET", "RIGHT_BRACKET", 
                  "LEFT_PARENTHESIS", "RIGHT_PARENTHESIS", "LEFT_ANGLE_BRACKET", 
                  "RIGHT_ANGLE_BRACKET", "NUMERIC_ANNOTATION_GLYPH", "SYMBOL", 
                  "SUFFIX_ANNOTATION", "UNEXPECTED_CHAR" ]

    grammarFileName = "PGN.g4"

    def __init__(self, input=None, output:TextIO = sys.stdout):
        super().__init__(input, output)
        self.checkVersion("4.8")
        self._interp = LexerATNSimulator(self, self.atn, self.decisionsToDFA, PredictionContextCache())
        self._actions = None
        self._predicates = None

    def sempred(self, localctx:RuleContext, ruleIndex:int, predIndex:int):
        if self._predicates is None:
            preds = dict()
            preds[5] = self.ESCAPE_sempred
            self._predicates = preds
        pred = self._predicates.get(ruleIndex, None)
        if pred is not None:
            return pred(localctx, predIndex)
        else:
            raise Exception("No registered predicate for:" + str(ruleIndex))

    def ESCAPE_sempred(self, localctx:RuleContext, predIndex:int):
            if predIndex == 0:
                return getCharPositionInLine() == 0

ChameleonRed commented 4 years ago

Here is bug:

    def ESCAPE_sempred(self, localctx:RuleContext, predIndex:int):
            if predIndex == 0:
                return getCharPositionInLine() == 0

ChameleonRed commented 4 years ago

It is probably not a bug in ANTLR but a bug in the grammar ...

ESCAPE
 : {getCharPositionInLine() == 0}? '%' ~[\r\n]* -> skip
 ;
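
For readers unfamiliar with the rule: PGN reserves a '%' in the first column of a line as an escape, and the rest of that line is ignored. The predicate restricts the ESCAPE token to column 0 for that reason. A rough Python model of the intended effect follows; skip_escape_lines is a hypothetical helper written for this illustration, not part of ANTLR or the grammar.

```python
def skip_escape_lines(text):
    """Blank out every line whose first character (column 0) is '%'.

    Rough model of the intent of the ESCAPE rule; the real lexer skips the
    matched characters as a token rather than filtering whole lines.
    """
    kept = []
    for line in text.splitlines(keepends=True):
        if line.startswith("%"):
            # Keep only the line terminator: the rule's ~[\r\n]* stops
            # before the newline, so line numbering is unchanged.
            kept.append(line[len(line.rstrip("\r\n")):])
        else:
            # A '%' anywhere else on a line is not an escape.
            kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    sample = '%ignored by scanners\n[Event "Test"]\n1. e4 e5 % not column zero\n'
    print(skip_escape_lines(sample))
```

Only the column-zero case is dropped; a '%' later in a line is left for the other lexer rules to handle.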

ericvergnaud commented 4 years ago

That non-Python code is part of the grammar, not part of ANTLR4. Please close this and get support from the Google discussion group.

patriksima commented 4 years ago

Same bug in C# with ANTLR 4.8.

TF-Joynic commented 3 years ago

It is probably not bug but bug in grammar ...

ESCAPE
 : {getCharPositionInLine() == 0}? '%' ~[\r\n]* -> skip
 ;

I encountered a similar error! I guess this is because the plugin just copies whatever you write in the "{}" block into the generated file. Python follows the philosophy "Explicit is better than implicit": "this" (usually named "self" in Python) cannot be omitted in Python OOP code. You can modify your .g4 file and add the self. prefix.

...
{self.getCharPositionInLine() == 0}? '%' ~[\r\n]* -> skip
...
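
The verbatim-copy behaviour described above can be sketched with a toy code generator. This is not ANTLR's actual template machinery: PREDICATE_TEMPLATE, ToyLexer and evaluate are hypothetical names, and ToyLexer.getCharPositionInLine is assumed to exist only for the illustration.

```python
# Toy model of a code generator: the action text between {...}? in the
# grammar is pasted into the method body without any translation.
PREDICATE_TEMPLATE = (
    "def ESCAPE_sempred(self, localctx, predIndex):\n"
    "    if predIndex == 0:\n"
    "        return {action}\n"
)


class ToyLexer:
    """Hypothetical stand-in; the real base class is the runtime's Lexer."""

    def __init__(self, column):
        self._column = column

    def getCharPositionInLine(self):
        # Assumed to exist for this illustration only.
        return self._column


def evaluate(action, column):
    """Render the predicate from the action text, then call it once."""
    namespace = {}
    exec(PREDICATE_TEMPLATE.format(action=action), namespace)
    return namespace["ESCAPE_sempred"](ToyLexer(column), None, 0)


if __name__ == "__main__":
    # With the explicit self. prefix the pasted action resolves the method.
    print(evaluate("self.getCharPositionInLine() == 0", column=0))
    # Without it, the pasted text is a bare name and lookup fails.
    try:
        evaluate("getCharPositionInLine() == 0", column=0)
    except NameError as exc:
        print("NameError:", exc)
```

Whether the installed Python 3 runtime actually provides a method named getCharPositionInLine is not confirmed in this thread. If it does not, the prefixed call would fail with AttributeError instead, and an attribute such as self.column may be the equivalent; that is an assumption worth checking against the runtime's Lexer class.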

It seems that the plugin also has a problem generating the indentation of Python blocks. For now I just fix the indentation manually in the generated lexer file.

Hope this helps :)

kaby76 commented 3 years ago

@ChameleonRed This bug is with https://github.com/antlr/grammars-v4/blob/master/pgn/PGN.g4, not with Antlr4. So, this issue should be closed here. We are in the process of testing all grammars against all target languages via the dotnet-antlr tool I wrote, and we have a huge list of grammars that are missing target-specific code, including this grammar. This will take several months to clean up. If you want to be informed when it will be fixed, open a bug over in grammars-v4, and when a PR fixes the bug, we'll inform you of the fixed grammar and support code. --Ken