antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

Python 3.12 Peg Parser Grammar has a small problem #3957

Open hasii2011 opened 8 months ago

hasii2011 commented 8 months ago

PyCharm and Python 3.12 in general accept lines in this format:

class MyMetaBaseWxCommand(BaseWxCommandMeta, ABCMeta):  # type: ignore
    pass

However, the Python parser generated from this grammar chokes on the '# type: ignore' comment and reports a syntax error. I'm not sure why.
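For comparison, CPython's own parser can be checked with the standard-library `ast` module. This sketch parses the reported class header twice: once in default mode, where `# type: ignore` is an ordinary comment, and once with `type_comments=True`, where CPython records the comment in `Module.type_ignores` instead of treating it as part of the class statement. The base classes are never looked up, because `ast.parse` only builds a syntax tree.

```python
import ast

# The class header from the report. ast.parse only builds a syntax tree,
# so the undefined base classes are never looked up.
SOURCE = (
    "class MyMetaBaseWxCommand(BaseWxCommandMeta, ABCMeta):  # type: ignore\n"
    "    pass\n"
)

# Default mode: "# type: ignore" is an ordinary comment and is dropped.
plain = ast.parse(SOURCE)

# type_comments=True: the comment is recognized as a type-ignore marker and
# recorded in Module.type_ignores rather than on the ClassDef node.
typed = ast.parse(SOURCE, type_comments=True)

print(type(plain.body[0]).__name__)                      # ClassDef
print(plain.type_ignores)                                # []
print([(t.lineno, t.tag) for t in typed.type_ignores])   # [(1, '')]
```

Both calls succeed, so CPython accepts this line whether or not type comments are enabled.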

kaby76 commented 8 months ago

The grammar is scraped from the pegen grammar for Python 3.12.1. If it doesn't parse, it's caused by one of the following:

Pegen does not take a description of the lexical structure. The lexical rules were reverse-engineered from an implementation and then placed into the Python documentation, which is really bad practice. We have to read the source code of the Python 3 compiler to find out which it is.

hasii2011 commented 8 months ago

OK, thanks for the quick response. I was hoping it was a quick fix. I'm not familiar with .g4 files; I'm mainly just a consumer.

RobEin commented 8 months ago

The lexical analysis documentation does not mention the generation of the TYPE_COMMENT token. Maybe some other document describes it, but I haven't found one so far.
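One place where CPython's TYPE_COMMENT handling does surface is the standard-library `ast` module: `ast.parse(..., type_comments=True)` enables type-comment tokenization, and the `ast` documentation notes that this mode reports syntax errors for misplaced type comments. The sketch below contrasts a type comment in a position the grammar accepts (after an assignment) with one after a class header, where the pegen grammar has no TYPE_COMMENT slot. The snippets and the helpers `type_comment_of` and `is_misplaced` are hypothetical illustrations, not taken from grammars-v4.

```python
import ast

# PEP 484 type comment after an assignment: a position CPython's grammar accepts.
ASSIGN = "x = []  # type: list[int]\n"

# A non-ignore type comment after a class header, where the grammar has no
# TYPE_COMMENT slot.
MISPLACED = "class C:  # type: int\n    pass\n"


def type_comment_of(src):
    """Parse with type comments enabled; return the first statement's type_comment."""
    return ast.parse(src, type_comments=True).body[0].type_comment


def is_misplaced(src):
    """True if src parses normally but is rejected once type comments are tokenized."""
    ast.parse(src)  # default mode: the comment is ignored, so this must succeed
    try:
        ast.parse(src, type_comments=True)
    except SyntaxError:
        return True
    return False


print(ast.parse(ASSIGN).body[0].type_comment)  # None: ignored by default
print(type_comment_of(ASSIGN))                 # list[int]
print(is_misplaced(MISPLACED))
```

The contrast with the previous example is the point: CPython rejects a misplaced `# type: int`, but it never hands `# type: ignore` to the grammar as a TYPE_COMMENT.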

The solution for the comment in the example (# type: ignore) could be for the lexer to recognize it as a plain COMMENT (or perhaps as a hidden TYPE_COMMENT). In other words, the tokenizer would be statement-sensitive. I'm afraid this cannot be implemented in the lexer, and that's probably why the lexical analysis documentation doesn't mention it. I still have to think about that.
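For reference, here is a minimal sketch of a text-only comment classification. It assumes the rule CPython's C tokenizer appears to apply when type comments are enabled; this rule comes from reading CPython's tokenizer source and is not a documented specification. A comment whose text after `type:` is `ignore`, followed by nothing or by an ASCII non-alphanumeric character such as `[`, becomes TYPE_IGNORE. Any other `# type:` comment becomes TYPE_COMMENT. `classify_comment` is a hypothetical helper and is not part of the grammar or of any ANTLR runtime.

```python
import re

# "#", optional blanks, "type:", optional blanks, then the rest of the comment.
# Modeled on the "# type: " prefix check in CPython's C tokenizer (assumption).
_TYPE_PREFIX = re.compile(r"#[ \t]*type:[ \t]*(.*)", re.DOTALL)


def classify_comment(comment: str) -> str:
    """Classify a comment as COMMENT, TYPE_IGNORE or TYPE_COMMENT from its text alone.

    Hypothetical helper: no statement or parser context is consulted.
    """
    m = _TYPE_PREFIX.match(comment)
    if m is None:
        return "COMMENT"
    rest = m.group(1)
    if rest.startswith("ignore"):
        after = rest[6:7]  # character following "ignore", or "" at the end
        # "ignore" must end the comment or be followed by an ASCII non-alphanumeric.
        if after == "" or (after.isascii() and not after.isalnum()):
            return "TYPE_IGNORE"
    return "TYPE_COMMENT"


for c in ["# type: ignore", "# type: ignore[misc]", "# type: int", "# a note"]:
    print(c, "->", classify_comment(c))
```

Under this assumption, separating `# type: ignore` from other type comments needs only the comment text. The part that would still be statement-sensitive is deciding whether a remaining TYPE_COMMENT sits in a position where the parser allows it.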

Until then, I have temporarily sent the TYPE_COMMENT tokens to the hidden channel in my own repository. This way, type comments are not parsed, but no errors are generated either.
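This is a toy model in plain Python of what sending a token to the hidden channel does. It is not the ANTLR runtime API. The lexer still produces TYPE_COMMENT, but a parser that reads only the default channel never sees it, so the class header no longer contains a token it cannot place. The channel numbers follow ANTLR's conventions (DEFAULT_CHANNEL = 0, HIDDEN_CHANNEL = 1). The token names and the `Tok`/`parser_view` helpers are hypothetical.

```python
from dataclasses import dataclass

# Channel numbers as used by ANTLR runtimes.
DEFAULT_CHANNEL = 0
HIDDEN_CHANNEL = 1


@dataclass(frozen=True)
class Tok:
    type: str
    text: str
    channel: int = DEFAULT_CHANNEL


def parser_view(tokens):
    """Return only the tokens a parser reading the default channel would see."""
    return [t for t in tokens if t.channel == DEFAULT_CHANNEL]


# Hand-written tokens for the reported class header (names are illustrative,
# not copied from the grammar). The TYPE_COMMENT is on the hidden channel.
header = [
    Tok("CLASS", "class"), Tok("NAME", "MyMetaBaseWxCommand"), Tok("LPAR", "("),
    Tok("NAME", "BaseWxCommandMeta"), Tok("COMMA", ","), Tok("NAME", "ABCMeta"),
    Tok("RPAR", ")"), Tok("COLON", ":"),
    Tok("TYPE_COMMENT", "# type: ignore", HIDDEN_CHANNEL),
    Tok("NEWLINE", "\n"),
]

print([t.type for t in parser_view(header)])
# ['CLASS', 'NAME', 'LPAR', 'NAME', 'COMMA', 'NAME', 'RPAR', 'COLON', 'NEWLINE']
```

The trade-off is the one described above: the hidden token still exists in the stream, but the parser can no longer use type comments anywhere, including in the positions where they are legal.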

kaby76 commented 8 months ago

In other words, the tokenizer would be statement-sensitive.

We might be able to define an ANTLR 4 "lexer mode" to work around lexing that depends on parser state, but lexer modes are basically hacky stand-ins for the real thing. It would be best to have a parser-state-aware lexer option in ANTLR 5. @ericvergnaud

hasii2011 commented 8 months ago

So I generated new lexer/parser files this morning from the modified grammar provided by @RobEin and verified them. They worked great and got me past this issue with the mypy-annotated files. This resolves the issue for me for now, since my visitor code doesn't look at that construct.

Thanks very much for this.