antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Python parser code incorrect on C.g4 (no nextToken in tokenSource?) #4194

Open Code7R opened 1 year ago

Code7R commented 1 year ago

Looks like there is some inconsistency in the Python backend: there is no nextToken method in the token source, yet the parser seems to be generated like the Java version?!

Target backend: Python 3.11 (Debian Unstable); I saw the same behavior on Windows too (also Python 3.11).

Used grammar: https://github.com/antlr/grammars-v4/blob/master/c/C.g4

Generated with: ~/.local/bin/antlr4 -Dlanguage=Python3 C.g4

Program code: that is basically by the book, see below. Any valid C header as parameter.

Exception stack:

  File "/home/user/IdeaProjects/xxx/cparser/headerparser.py", line 32, in <module>
    main(sys.argv)
  File "/home/user/IdeaProjects/xxx/cparser/headerparser.py", line 26, in main
    tree = parser.primaryExpression()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/IdeaProjects/xxx/cparser/CParser.py", line 813, in primaryExpression
    self.enterRule(localctx, 0, self.RULE_primaryExpression)
  File "/home/user/.local/lib/python3.11/site-packages/antlr4/Parser.py", line 374, in enterRule
    self._ctx.start = self._input.LT(1)
                      ^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/antlr4/CommonTokenStream.py", line 62, in LT
    self.lazyInit()
  File "/home/user/.local/lib/python3.11/site-packages/antlr4/BufferedTokenStream.py", line 187, in lazyInit
    self.setup()
  File "/home/user/.local/lib/python3.11/site-packages/antlr4/BufferedTokenStream.py", line 190, in setup
    self.sync(0)
  File "/home/user/.local/lib/python3.11/site-packages/antlr4/BufferedTokenStream.py", line 112, in sync
    fetched = self.fetch(n)
              ^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/antlr4/BufferedTokenStream.py", line 124, in fetch
    t = self.tokenSource.nextToken()
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'CParser' object has no attribute 'nextToken'
python-BaseException

#!/usr/bin/env python3

import sys
from antlr4 import *
from CParser import CParser
from CListener import CListener

class HeaderAnalyzerListener(CListener):
    def enterKey(self, ctx):
        pass
    def exitKey(self, ctx):
        pass
    def enterValue(self, ctx):
        pass
    def exitValue(self, ctx):
        pass

def main(argv):
    input_stream = FileStream(argv[1])
    lexer = CParser(input_stream)
    stream = CommonTokenStream(lexer)
    parser = CParser(stream)
    parser.buildParseTrees = True
    tree = parser.primaryExpression()
    walker = ParseTreeWalker()
    walker.walk(HeaderAnalyzerListener, tree)

if __name__ == '__main__':
    main(sys.argv)

ericvergnaud commented 1 year ago

This line: lexer = CParser(input_stream) should read: lexer = CLexer(input_stream)

kaby76 commented 1 year ago

Why do people keep writing drivers with parser.buildParseTrees = True when it's the default, even across all targets? https://github.com/antlr/antlr4/blob/8188dc5388dfe9246deb9b6ae507c3693fd55c3f/runtime/Python3/src/antlr4/Parser.py#L75

Code7R commented 1 year ago

Yeah, okay, that happens if you copy-paste in a bit of a rush when adapting the example code.

Maybe you could add a simple type check there? Throw a TypeError when the wrong thing is passed in, after an isinstance check or similar. (Maybe with something funny on top, à la "418 I'm a teapot".)
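A minimal sketch of what such a check could look like, written as a standalone helper rather than as a change to the runtime. The helper name `require_token_source` and the `FakeLexer` / `FakeParser` stand-ins are hypothetical and exist only for illustration. The check is duck-typed on `nextToken`, because that is the method `BufferedTokenStream.fetch` calls in the stack trace above.

```python
# Sketch of the suggested type check. The helper name and the stand-in
# classes are hypothetical; this is not code from the ANTLR runtime.

def require_token_source(candidate):
    """Return candidate unchanged if it can act as a token source, else raise TypeError."""
    # The token stream pulls tokens by calling tokenSource.nextToken(),
    # so a callable nextToken attribute is the one thing that must exist.
    if not callable(getattr(candidate, "nextToken", None)):
        raise TypeError(
            "expected a token source with nextToken() (e.g. a lexer), "
            f"got {type(candidate).__name__}"
        )
    return candidate


class FakeLexer:
    """Stand-in for a generated lexer such as CLexer."""

    def nextToken(self):
        return None


class FakeParser:
    """Stand-in for a generated parser such as CParser (no nextToken)."""


if __name__ == "__main__":
    require_token_source(FakeLexer())        # accepted silently
    try:
        require_token_source(FakeParser())   # rejected up front
    except TypeError as err:
        print(err)
```

In a driver this could wrap the argument, for example `CommonTokenStream(require_token_source(lexer))`, so that passing a parser by mistake fails at construction time with a readable message instead of deep inside `fetch`.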

parser.buildParseTrees = True might not be needed, but that is what happens when you are struggling with a lack of information to understand what is going on.

kaby76 commented 1 year ago

You might consider an application generator for the grammar. The grammars-v4 repo uses the one I've been writing for several years now, trgen. It takes an ANTLR grammar (the .g4 files plus a desc.xml that says which targets are supported) and generates a complete command-line program for any supported target and OS. There are others around, but I don't think they are as good. To use it, install the Trash toolkit, clone the grammars-v4 repo, cd to c/, then type "trgen" and it will create a parser application for all targets.