antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

[C++ Grammar] Segmentation fault with correct data. #3997

Closed: cppenjoy closed this issue 7 months ago

cppenjoy commented 7 months ago

Abstract: Hello, I wrote a minimal CLI for the lexer and parser generated from the C++ grammar, and created my own visitor, which does essentially nothing at the point where the program crashes.

NOTE: I merged the C++ grammar into a single file and renamed it to CtcLang. I don't go into this further in the issue, because it doesn't affect the behavior of the program in any way. I generate the parser using:

antlr4 -long-messages -Dlanguage=Cpp -no-listener -visitor CtcLang.g4 -o ../compiler

ANTLR4: version 4.13.1
GCC: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

The segfault does not always occur, only for certain inputs. After a few dozen minutes of testing, I identified the most common cases:

int main()
{
    a; // Segmentation fault
}

namespace someNamespace
{
    void someFunction() {} // Segmentation fault
}

and so on.

I spent some time debugging the program, and the debugger reports this at the point of the segfault: <error reading variable: Cannot access memory at address 0x10>

In all other cases, the program behaves correctly, so the reason cannot be:

I admit that the mistake may be mine. Maybe I'm initializing or launching the visitor incorrectly.

The visitor is in these files: visitor.h, visitor.cpp

The code that initializes and runs the visitor is in: driver.cpp (line 41)

Link to project: Click Me

For convenience (probably), I'm attaching the full log below:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555628dae in __gnu_cxx::__normal_iterator<antlr4::tree::ParseTree* const*, std::vector<antlr4::tree::ParseTree*, std::allocator<antlr4::tree::ParseTree*> > >::__normal_iterator (this=0x7fffffffcf10, __i=<error reading variable: Cannot access memory at address 0x10>) at /usr/include/c++/11/bits/stl_iterator.h:1028
1028          : _M_current(__i) { }

cppenjoy commented 7 months ago

Update: After studying this issue in more detail, I noticed that putting ";" at the end of any code block triggers the segfault.

So this input also segfaults:

; // Segfault

kaby76 commented 7 months ago

First, assuming your grammar was correctly combined from the original split cpp grammar from this repo, the grammar parses all the inputs in your comments fine with Visual Studio 2022 C++ on Windows 11.

Second, this repo does not validate Antlr visitors and listeners. If it's crashing there, it's likely because you have a bug in your code, or a bug in the Antlr runtime. But, this repo has nothing to do with listeners and visitors. We only validate a parse, and a parse tree.

Third, you don't give a complete trace back of the crash, all the way back to main(). I cannot tell whether the parse is crashing or whether your listener or visitor is crashing. This is important to know. You should use a third-party tool to debug memory overwrites, etc. (Valgrind?).

cppenjoy commented 7 months ago

@kaby76. Thank you for pointing out my mistakes, and thanks for the feedback.

> First, assuming your grammar was correctly combined from the original split cpp grammar from this repo, the grammar parses all the inputs in your comments fine with Visual Studio 2022 C++ on Windows 11.

Yes, but it's not about parsing.

> Second, this repo does not validate Antlr visitors and listeners

Thank you for your reply, and sorry for my off-topic issue.

> Third, you don't give a complete trace back of the crash, all the way back to main()

I'll fix it now

cppenjoy commented 7 months ago

With input:

; // Segfault

Result:

#0  0x0000555555628dae in __gnu_cxx::__normal_iterator<antlr4::tree::ParseTree* const*, std::vector<antlr4::tree::ParseTree*, std::allocator<antlr4::tree::ParseTree*> > >::__normal_iterator (
    this=0x7fffffffcef0, __i=<error reading variable: Cannot access memory at address 0x10>) at /usr/include/c++/11/bits/stl_iterator.h:1028
#1  0x0000555555625286 in std::vector<antlr4::tree::ParseTree*, std::allocator<antlr4::tree::ParseTree*> >::begin (this=0x10) at /usr/include/c++/11/bits/stl_vector.h:821
#2  0x000055555561afc0 in antlr4::ParserRuleContext::getRuleContexts<CtcLangParser::InitDeclaratorContext> (this=0x0) at /home/lofi/ctc-compiler/compiler-runtime/runtime/src/ParserRuleContext.h:121
#3  0x00005555555df388 in CtcLangParser::InitDeclaratorListContext::initDeclarator (this=0x0) at /home/lofi/ctc-compiler/compiler/CtcLangParser.cpp:12054
#4  0x00005555556571cc in ctc::semantic::SemaAnalyzer::visitSimpleDeclaration (this=0x7fffffffd970, ctx=0x55555589a8f0) at /home/lofi/ctc-compiler/compiler/SemanticAnalyzer/SemanticAnalyzer.cpp:296
#5  0x00005555555cea5e in CtcLangParser::SimpleDeclarationContext::accept (this=0x55555589a8f0, visitor=0x7fffffffd970) at /home/lofi/ctc-compiler/compiler/CtcLangParser.cpp:7720
#6  0x000055555564c267 in antlr4::tree::AbstractParseTreeVisitor::visitChildren (this=0x7fffffffd970, node=0x5555558ebc00)
    at /home/lofi/ctc-compiler/compiler-runtime/runtime/src/tree/AbstractParseTreeVisitor.h:50
#7  0x000055555564ddf4 in CtcLangBaseVisitor::visitBlockDeclaration (this=0x7fffffffd970, ctx=0x5555558ebc00) at /home/lofi/ctc-compiler/compiler/CtcLangBaseVisitor.h:267
#8  0x00005555555cd9be in CtcLangParser::BlockDeclarationContext::accept (this=0x5555558ebc00, visitor=0x7fffffffd970) at /home/lofi/ctc-compiler/compiler/CtcLangParser.cpp:7446
#9  0x000055555564c267 in antlr4::tree::AbstractParseTreeVisitor::visitChildren (this=0x7fffffffd970, node=0x5555558965f0)
    at /home/lofi/ctc-compiler/compiler-runtime/runtime/src/tree/AbstractParseTreeVisitor.h:50
#10 0x000055555564dd90 in CtcLangBaseVisitor::visitDeclaration (this=0x7fffffffd970, ctx=0x5555558965f0) at /home/lofi/ctc-compiler/compiler/CtcLangBaseVisitor.h:263
#11 0x00005555555cd1fe in CtcLangParser::DeclarationContext::accept (this=0x5555558965f0, visitor=0x7fffffffd970) at /home/lofi/ctc-compiler/compiler/CtcLangParser.cpp:7292
#12 0x000055555564c267 in antlr4::tree::AbstractParseTreeVisitor::visitChildren (this=0x7fffffffd970, node=0x5555558a00b0)
    at /home/lofi/ctc-compiler/compiler-runtime/runtime/src/tree/AbstractParseTreeVisitor.h:50
#13 0x0000555555658622 in ctc::semantic::SemaAnalyzer::visitDeclarationseq (this=0x7fffffffd970, ctx=0x5555558a00b0) at /home/lofi/ctc-compiler/compiler/SemanticAnalyzer/SemanticAnalyzer.h:123
#14 0x00005555555ccbfe in CtcLangParser::DeclarationseqContext::accept (this=0x5555558a00b0, visitor=0x7fffffffd970) at /home/lofi/ctc-compiler/compiler/CtcLangParser.cpp:7196
#15 0x000055555564c267 in antlr4::tree::AbstractParseTreeVisitor::visitChildren (this=0x7fffffffd970, node=0x555555877220)
    at /home/lofi/ctc-compiler/compiler-runtime/runtime/src/tree/AbstractParseTreeVisitor.h:50
#16 0x000055555564c5bc in CtcLangBaseVisitor::visitTranslationUnit (this=0x7fffffffd970, ctx=0x555555877220) at /home/lofi/ctc-compiler/compiler/CtcLangBaseVisitor.h:19
#17 0x000055555564b9ea in ctc::driver::driver_instance::run (this=0x7fffffffdbe0) at /home/lofi/ctc-compiler/compiler/Driver/CompilerDriver.cpp:42
#18 0x0000555555648461 in handle_argv (argc=2, argv=0x7fffffffdff8) at /home/lofi/ctc-compiler/compiler/main.cpp:163
#19 0x0000555555647ccf in main (argc=2, argv=0x7fffffffdff8) at /home/lofi/ctc-compiler/compiler/main.cpp:63

kaby76 commented 7 months ago

Looks like you should check whether this line works: https://github.com/cppenjoy/ctc-compiler/blob/59d5efbbc2cf55a0cb79e547496d6107e8e8ee2a/compiler/SemanticAnalyzer/SemanticAnalyzer.cpp#L291
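
Frames #1 to #3 of the backtrace above show initDeclarator running with this=0x0 and the vector it touches at this=0x10, which would be consistent with a member being read at a small fixed offset from a null object pointer. The sketch below illustrates only that offset arithmetic, using a hypothetical Node struct and childrenAddress helper; the field names and layout are assumptions and are not the real layout of antlr4::ParserRuleContext.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: two pointer-sized fields, then the member of interest.
// This is NOT the real antlr4::ParserRuleContext; it only mimics the idea that
// a member lives at a fixed offset from the start of the object.
struct Node {
    void* firstField;   // offset 0
    void* secondField;  // offset sizeof(void*)
    int*  children;     // offset 2 * sizeof(void*)
};

// Address at which `children` would be looked up if the object pointer were
// `base`. Computed arithmetically, so no invalid pointer is dereferenced.
std::uintptr_t childrenAddress(std::uintptr_t base) {
    return base + offsetof(Node, children);
}
```

On a platform with 8-byte pointers, childrenAddress(0) is 16, that is 0x10, the same shape of address that the debugger could not read in frame #1.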

For input ; // Segfault, the tree is:

$ trparse in3.txt | trtree
CSharp 0 in3.txt success 0.0442283

( translationUnit
  ( declarationseq
    ( declaration
      ( blockDeclaration
        ( simpleDeclaration
          ( Semi
            (  text:';' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
  ) ) ) ) ) )
  ( Attribute Before Value ' // Segfault'
  )
  ( EOF
    (  text:'' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
) ) )

That's the same tree as for the Cpp target. Your visitor supposedly tests initDeclaratorList() for null, but the check doesn't seem to work. I'd remove all your binaries and rebuild, just to make sure there's no version skew. I would also rerun the antlr4 tool to regenerate the parser: https://github.com/cppenjoy/ctc-compiler/blob/59d5efbbc2cf55a0cb79e547496d6107e8e8ee2a/compiler-front/build_frontend.sh . Also, you really should not be running the Antlr tool manually. You should use a CMake build that invokes the Antlr tool automatically.
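
The null test on initDeclaratorList() mentioned above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the linked repository: InitDeclarator, InitDeclaratorList, SimpleDeclaration and countDeclarators are hypothetical stand-ins for the generated context classes, assuming that the accessor for an optional child returns a null pointer when that child is absent, as frame #3 of the backtrace (this=0x0) suggests for the input ";".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the generated context classes; the real ones
// live in the generated CtcLangParser files and are not reproduced here.
struct InitDeclarator {};

struct InitDeclaratorList {
    std::vector<InitDeclarator*> declarators;
    std::vector<InitDeclarator*> initDeclarator() const { return declarators; }
};

struct SimpleDeclaration {
    // Assumed to be null for an empty declaration such as ";".
    InitDeclaratorList* list = nullptr;
    InitDeclaratorList* initDeclaratorList() const { return list; }
};

// Counts declarators, treating an absent optional child as zero declarators.
std::size_t countDeclarators(const SimpleDeclaration& ctx) {
    InitDeclaratorList* list = ctx.initDeclaratorList();
    if (list == nullptr) {
        return 0;  // nothing to visit; never call a member on a null child
    }
    return list->initDeclarator().size();
}
```

The point of the design is that the optional child is fetched once, tested, and only then dereferenced, so a declaration that consists of a lone ";" is handled without touching a null pointer.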

cppenjoy commented 7 months ago

Thank you so much!