amykyta3 / speedy-antlr-tool

Generate an accelerator extension that makes your Antlr parser in Python super-fast!
BSD 3-Clause "New" or "Revised" License

Error in sa_*_translator.cpp(536): error C2039 when using ANTLR labeled rules #10

Closed: m-zakeri closed this issue 2 years ago.

m-zakeri commented 2 years ago

Hi, I get error C2039 because the generated code accesses a member of a subclass (*Context) through a pointer to its superclass (e.g., StatementContext). For example, compiling the following code fails with: 'blockLabel': is not a member of 'JavaParser::StatementContext'. Indeed, the ctx object has no access to blockLabel, which is defined in one of the subclasses of StatementContext:

```cpp
antlrcpp::Any SA_JavaTranslator::visitStatement(JavaParser::StatementContext *ctx){

    speedy_antlr::LabelMap labels[] = {
        {"blockLabel", static_cast<void*>(ctx->blockLabel)},
        {"statementExpression", static_cast<void*>(ctx->statementExpression)},
        {"identifierLabel", static_cast<void*>(ctx->identifierLabel)}
    };
    if(!StatementContext_cls) StatementContext_cls = PyObject_GetAttrString(translator->parser_cls, "StatementContext");
    PyObject *py_ctx = translator->convert_ctx(this, ctx, StatementContext_cls, labels, 3);
    return py_ctx;
}
```

blockLabel is defined in Statement0Context, and statementExpression is defined in another subclass of StatementContext:

```cpp
class  Statement0Context : public StatementContext {
  public:
    Statement0Context(StatementContext *ctx);

    JavaParser::BlockContext *blockLabel = nullptr;
    BlockContext *block();
    virtual void enterRule(antlr4::tree::ParseTreeListener *listener) override;
    virtual void exitRule(antlr4::tree::ParseTreeListener *listener) override;

    virtual antlrcpp::Any accept(antlr4::tree::ParseTreeVisitor *visitor) override;
  };
```

I tried both static_cast and dynamic_cast, but got runtime errors. The following code compiles but causes a runtime parsing error:

```cpp
antlrcpp::Any SA_JavaTranslator::visitStatement(JavaParser::StatementContext *ctx){
    JavaParser::Statement0Context *ctx1 = static_cast<JavaParser::Statement0Context*>(ctx);
    JavaParser::Statement15Context *ctx2 = static_cast<JavaParser::Statement15Context*>(ctx);
    JavaParser::Statement16Context *ctx3 = static_cast<JavaParser::Statement16Context*>(ctx);
    speedy_antlr::LabelMap labels[] = {
        {"blockLabel", static_cast<void*>(ctx1->blockLabel)},
        {"statementExpression", static_cast<void*>(ctx2->statementExpression)},
        {"identifierLabel", static_cast<void*>(ctx3->identifierLabel)}
    };
    if(!StatementContext_cls) StatementContext_cls = PyObject_GetAttrString(translator->parser_cls, "StatementContext");
    PyObject *py_ctx = translator->convert_ctx(this, ctx, StatementContext_cls, labels, 3);
    return py_ctx;
}
```
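The static_cast attempt compiles, but static_cast to a subclass does not check the runtime type. When ctx is actually a Statement15Context, reading it as a Statement0Context is undefined behavior, which is consistent with the runtime errors reported. dynamic_cast does check the type, but it returns nullptr on a mismatch, so dereferencing the result without a null check also fails.

The sketch below shows the checked pattern. It uses hypothetical stand-in classes rather than the real ANTLR-generated ones, and `labelFor` is an illustrative name. It only illustrates the casting issue; it is not the resolution reached later in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the ANTLR-generated classes in JavaParser.h.
// The real classes derive from antlr4::ParserRuleContext and have many more members.
struct BlockContext {};
struct ExpressionContext {};

struct StatementContext {
    virtual ~StatementContext() = default;  // polymorphic base: required for dynamic_cast
};

struct Statement0Context : StatementContext {
    BlockContext *blockLabel = nullptr;
};

struct Statement15Context : StatementContext {
    ExpressionContext *statementExpression = nullptr;
};

// Reports which element label belongs to the runtime alternative of ctx.
std::string labelFor(StatementContext *ctx) {
    assert(ctx != nullptr);
    // dynamic_cast inspects the runtime type and yields nullptr on a mismatch,
    // so each branch only touches members that really exist on the object.
    if (auto *s0 = dynamic_cast<Statement0Context *>(ctx))
        return s0->blockLabel ? "blockLabel" : "blockLabel (unset)";
    if (auto *s15 = dynamic_cast<Statement15Context *>(ctx))
        return s15->statementExpression ? "statementExpression" : "statementExpression (unset)";
    return "no label";  // any other alternative, e.g. #statement14
}
```

With labeled alternatives, ANTLR's generated visitor dispatches each alternative to its own method (for example visitStatement0). Code built on that dispatch does not need to cast at all.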
m-zakeri commented 2 years ago

The ANTLR rules (from the .g4 file) for the above code are as follows:

```antlr
statement
    : blockLabel=block #statement0
    | ASSERT expression (':' expression)? ';' #statement1
    | IF parExpression statement (ELSE statement)? #statement2
    | FOR '(' forControl ')' statement #statement3
    | WHILE parExpression statement #statement4
    | DO statement WHILE parExpression ';' #statement5
    | TRY block (catchClause+ finallyBlock? | finallyBlock) #statement6
    | TRY resourceSpecification block catchClause* finallyBlock? #statement7
    | SWITCH parExpression '{' switchBlockStatementGroup* switchLabel* '}' #statement8
    | SYNCHRONIZED parExpression block #statement9
    | RETURN expression? ';' #statement10
    | THROW expression ';' #statement11
    | BREAK IDENTIFIER? ';' #statement12
    | CONTINUE IDENTIFIER? ';' #statement13
    | SEMI #statement14
    | statementExpression=expression ';' #statement15
    | identifierLabel=IDENTIFIER ':' statement #statement16
    ;
```
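This rule explains the compiler error. ANTLR turns each labeled alternative (such as #statement0) into its own subclass, named by capitalizing the label and appending Context (Statement0Context). An element label such as blockLabel=block becomes a field of that subclass only, not of StatementContext.

The sketch below recovers that label-to-class mapping from the rule text. `labelOwners` is a hypothetical helper built on std::regex. It assumes one alternative per line, as in the rule above, and it is not a real grammar parser: a quoted literal containing '=' or '#' would confuse it.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: maps each labeled alternative's generated context class
// (e.g. "Statement0Context") to the element labels (name=...) it declares.
std::map<std::string, std::vector<std::string>>
labelOwners(const std::vector<std::string> &alternatives) {
    static const std::regex altLabel(R"(#\s*(\w+)\s*$)");  // trailing "#statement0"
    static const std::regex elemLabel(R"((\w+)\s*\+?=)");  // "blockLabel=" or "x+="
    std::map<std::string, std::vector<std::string>> owners;
    for (const std::string &alt : alternatives) {
        std::smatch m;
        if (!std::regex_search(alt, m, altLabel)) continue;  // unlabeled alternative
        std::string cls = m[1].str();
        cls[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(cls[0])));
        cls += "Context";  // ANTLR names the subclass <Label>Context
        auto &fields = owners[cls];
        for (std::sregex_iterator it(alt.begin(), alt.end(), elemLabel), end; it != end; ++it)
            fields.push_back((*it)[1].str());
    }
    return owners;
}
```

For example, passing the #statement0, #statement15, and #statement16 lines shows that blockLabel, statementExpression, and identifierLabel each live on a different subclass. That is why a single visitStatement(StatementContext *) cannot read all three.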
m-zakeri commented 2 years ago

Does anybody have any recommendations regarding this issue?

amykyta3 commented 2 years ago

I tried on a smaller testcase but I couldn't reproduce the issue. Can you confirm the following for me:

amykyta3 commented 2 years ago

I was able to reproduce it on an older version of speedy-antlr-tool (v1.0.0). Please upgrade to v1.2.0; that should fix it.

m-zakeri commented 2 years ago

I am already using speedy-antlr-tool (v1.2.0) with ANTLR 4.9.3. I use Python 3.8 on Windows 10 (x64) with the MSVC compiler from Visual Studio 2019.

I think the problem is that speedy-antlr-tool (v1.2.0) does not properly handle labeled alternatives in ANTLR grammars (e.g., # xxx).

I have attached my grammar files.

grammars.zip

m-zakeri commented 2 years ago

All files generated by speedy-antlr-tool (v1.2.0) for the above grammars: parser.zip

amykyta3 commented 2 years ago

I think I see what is happening. I was able to regenerate the output with the grammar files you sent, and I do not see the issue.

When you invoke Antlr to generate the two versions of the parser, both Python and C++ outputs need to be generated from the exact same g4 files.

It looks like your C++ parser was generated from the JavaParserLabeled.g4 you sent me (after you added labels, but before you renamed it to add the "Labeled" suffix to the grammar's name). The C++ parser output contains context classes that match the g4 files you sent me (it has Statement1Context, Statement2Context, etc.).

I suspect the Python parser was generated from the original JavaParser.g4, before you modified it to add labels. The Python output that Antlr generated is missing the labeled classes (it only contains StatementContext).

speedy-antlr-tool uses the Python output as a reference to generate the C++ translator. If the Python and C++ targets were not generated from identical grammars, the translator will not match the C++ parser.
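One way to spot this kind of mismatch is to compare the context classes declared in the two targets. The sketch below is a hypothetical check, not part of speedy-antlr-tool. It extracts class names ending in Context from the Python parser source and from the C++ parser header, then lists the ones present only in the C++ header. The regex patterns assume ANTLR's usual output layout: `class XContext(Base):` in Python and `class XContext : public Base` in C++.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// Collects the first capture group of every match of pattern in source.
std::set<std::string> contextClasses(const std::string &source, const std::regex &pattern) {
    std::set<std::string> names;
    for (std::sregex_iterator it(source.begin(), source.end(), pattern), end; it != end; ++it)
        names.insert((*it)[1].str());
    return names;
}

// Hypothetical check: context classes declared in the C++ header but missing
// from the Python parser (a non-empty result means the targets came from
// different grammars).
std::set<std::string> missingFromPython(const std::string &pySource, const std::string &cppHeader) {
    static const std::regex pyClass(R"(class\s+(\w+Context)\s*\()");
    static const std::regex cppClass(R"(class\s+(\w+Context)\s*:)");  // skips forward declarations
    const std::set<std::string> py = contextClasses(pySource, pyClass);
    std::set<std::string> missing;
    for (const std::string &name : contextClasses(cppHeader, cppClass))
        if (py.count(name) == 0) missing.insert(name);
    return missing;
}
```

In the situation described here, the check would report Statement0Context through Statement16Context. Those classes exist in the C++ header but not in the stale Python parser.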

Try regenerating the Python parser target with Antlr from your latest grammar file. I suspect it will work.

For reference, here is the script I used:

```bash
#!/bin/bash
cd "$( dirname "${BASH_SOURCE[0]}" )"

antlr4="java -Xmx500M -cp /usr/local/lib/antlr-4.9.2-complete.jar org.antlr.v4.Tool"

# Generate C++ target with visitor
$antlr4 -Dlanguage=Cpp -o cpp_src JavaLexer.g4
$antlr4 -Dlanguage=Cpp -visitor -listener -o cpp_src JavaLabeledParser.g4

# Generate Python target
$antlr4 -Dlanguage=Python3 -o . JavaLexer.g4
$antlr4 -Dlanguage=Python3 -visitor -listener -o . JavaLabeledParser.g4

# Run speedy-antlr-tool to generate parse accelerator
python3 <<EOF
from speedy_antlr_tool import generate

generate(
    py_parser_path="JavaLabeledParser.py",
    cpp_output_dir="cpp_src",
)
EOF
```
m-zakeri commented 2 years ago

Thanks for your response. However, I get the following error with your script:

```
Traceback (most recent call last):
  File "speedya.py", line 3, in <module>
    generate(
  File "D:\Anaconda3\envs\Py38\lib\site-packages\speedy_antlr_tool\main.py", line 81, in generate
    raise ValueError("File does not look like a parser: %s" % py_parser_path)
ValueError: File does not look like a parser: JavaParserLabeled.py
```

How does your script run without error? (I see that you did not rename the grammar.)

amykyta3 commented 2 years ago

My mistake, I copied the wrong text. Yes, I renamed the grammar to JavaLabeledParser for it to work. I have corrected my script above.

m-zakeri commented 2 years ago

Thanks for your effort and time. It now works fine for me. I also had to rename JavaLexer.g4 to JavaLabeledLexer.g4. Is there any reason for such a strict naming convention for grammar names (e.g., requiring the file name to end with *Parser)?

amykyta3 commented 2 years ago

When I originally developed the tool, it only supported grammars where the parser and lexer were in the same file. In that situation, the grammar would simply be named Java.g4, and Antlr would automatically append *Lexer and *Parser to the names of the generated output. It was easier to keep this assumption when adding support for split grammars (#5), which is why the restriction exists. Not a great reason, but that's how it happened. If I'm bored I'll refactor and remove that assumption :smile:
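Here is a minimal sketch of the naming assumption described above. It is a hypothetical helper, not speedy-antlr-tool's actual code, and the real check in main.py may differ; `namesFromParser` and `GrammarNames` are illustrative names, and the code requires C++17 for std::optional. Given a generated parser module name, the helper requires the *Parser suffix and derives the lexer name from the shared prefix. Under this assumption, JavaLabeledParser pairs with JavaLabeledLexer, while JavaParserLabeled (the name in the traceback above) is rejected.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical result: the shared prefix plus the lexer/parser names it implies.
struct GrammarNames {
    std::string prefix;  // e.g. "JavaLabeled"
    std::string lexer;   // e.g. "JavaLabeledLexer"
    std::string parser;  // e.g. "JavaLabeledParser"
};

// Accepts only names ending in "Parser" with a non-empty prefix, mirroring the
// convention ANTLR uses for combined grammars (Java.g4 -> JavaLexer, JavaParser).
std::optional<GrammarNames> namesFromParser(const std::string &parserModule) {
    const std::string suffix = "Parser";
    if (parserModule.size() <= suffix.size() ||
        parserModule.compare(parserModule.size() - suffix.size(), suffix.size(), suffix) != 0)
        return std::nullopt;  // e.g. "JavaParserLabeled" does not end in "Parser"
    std::string prefix = parserModule.substr(0, parserModule.size() - suffix.size());
    return GrammarNames{prefix, prefix + "Lexer", parserModule};
}
```

Keying everything on one shared prefix keeps the combined-grammar and split-grammar cases on the same code path. That convenience is the trade-off described above, and it is why JavaLexer.g4 also had to be renamed to JavaLabeledLexer.g4.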