antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Segmentation fault on C++ runtime in SingletonPredictionContext #4543

Closed. Sustrak closed this issue 9 months ago.

Sustrak commented 9 months ago

With this grammar:

grammar MemMap;

memmap: region+ EOF;
region: (WORD | '_')+  WORD*;

WORD: [A-Za-z]+;

I get a SEGFAULT in antlr4::atn::PredictionContext::getContextType (PredictionContext.h:170). The stack trace is as follows:

antlr4::atn::PredictionContext::getContextType PredictionContext.h:170
antlr4::atn::SingletonPredictionContext::equals SingletonPredictionContext.cpp:60
antlr4::atn::operator== PredictionContext.h:206
antlr4::atn::PredictionContextCache::PredictionContextComparer::operator() PredictionContextCache.cpp:55
std::__detail::_Equal_helper::_S_equals hashtable_policy.h:1460
std::__detail::_Hashtable_base::_M_equals hashtable_policy.h:1844
std::_Hashtable::_M_find_before_node hashtable.h:1562
std::_Hashtable::_M_find_node hashtable.h:649
std::_Hashtable::find hashtable.h:1452
std::unordered_set::find unordered_set.h:654
antlr4::atn::PredictionContextCache::get PredictionContextCache.cpp:40
getCachedContextImpl PredictionContext.cpp:53
antlr4::atn::PredictionContext::getCachedContext PredictionContext.cpp:520
antlr4::atn::ATNSimulator::getCachedContext ATNSimulator.cpp:32
antlr4::atn::ATNConfigSet::optimizeConfigs ATNConfigSet.cpp:138
antlr4::atn::ParserATNSimulator::addDFAState ParserATNSimulator.cpp:1335
antlr4::atn::ParserATNSimulator::addDFAEdge ParserATNSimulator.cpp:1287
antlr4::atn::ParserATNSimulator::computeTargetState ParserATNSimulator.cpp:324
antlr4::atn::ParserATNSimulator::execATN ParserATNSimulator.cpp:188
antlr4::atn::ParserATNSimulator::adaptivePredict ParserATNSimulator.cpp:163
antlr_memmap::MemMapParser::region() 0x000000000048b2f2
antlr_memmap::MemMapParser::memmap() 0x000000000048acdc

Looking at the stack trace, it seems we are getting a pointer from a std::unordered_set whose memory has already been deallocated.
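The trace runs through PredictionContextCache::get, then unordered_set::find, then a comparer that calls SingletonPredictionContext::equals. In other words, the crash happens while the set compares a lookup key against elements it already holds, and that comparison reads through the stored pointers. The sketch below is a heavily simplified, hypothetical model of such a hash-consing cache (the names Context, ContextHasher, ContextComparer and ContextCache are made up, and this is not the ANTLR runtime's code). It only shows why a bad element, or one whose layout is read differently from how it was built, would surface inside find rather than at the point of the original bug.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_set>

// Hypothetical stand-in for a prediction context: a return state plus a
// pointer to the parent context, forming a linked stack.
struct Context {
  std::size_t returnState;
  std::shared_ptr<const Context> parent;
};

// Structural hash: folds in the return state of every node up the chain.
struct ContextHasher {
  std::size_t operator()(const std::shared_ptr<const Context>& c) const {
    std::size_t h = 0;
    for (const Context* p = c.get(); p != nullptr; p = p->parent.get())
      h = h * 31 + std::hash<std::size_t>{}(p->returnState);
    return h;
  }
};

// Structural equality: dereferences both pointers and walks both parent
// chains. A dangling element in the set would be read right here.
struct ContextComparer {
  bool operator()(const std::shared_ptr<const Context>& a,
                  const std::shared_ptr<const Context>& b) const {
    const Context* x = a.get();
    const Context* y = b.get();
    while (x != nullptr && y != nullptr) {
      if (x == y) return true;  // same node, so the rest is shared
      if (x->returnState != y->returnState) return false;
      x = x->parent.get();
      y = y->parent.get();
    }
    return x == y;  // equal only if both chains end together
  }
};

class ContextCache {
 public:
  // Returns the canonical cached instance equal to ctx, inserting ctx
  // first if no equal context is cached yet.
  std::shared_ptr<const Context> get(const std::shared_ptr<const Context>& ctx) {
    auto it = cache_.find(ctx);  // runs ContextHasher, then ContextComparer
    if (it != cache_.end()) return *it;
    cache_.insert(ctx);
    return ctx;
  }
  std::size_t size() const { return cache_.size(); }

 private:
  std::unordered_set<std::shared_ptr<const Context>, ContextHasher,
                     ContextComparer> cache_;
};
```

Two structurally equal contexts collapse to the first one inserted. The real runtime performs the comparison through PredictionContextComparer and SingletonPredictionContext::equals, as the trace shows. The sketch only mirrors that general shape.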

You may use the following code to trigger the SEGFAULT:

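#include "antlr4-runtime.h"
#include "MemMapLexer.h"   // generated from MemMap.g4
#include "MemMapParser.h"  // generated from MemMap.g4

using namespace antlr4;
using namespace antlr_memmap;  // package of the generated code, per the stack trace

int main() {
    ANTLRInputStream input("test_test     , 0x0 , 0x1 , 4K      , TEST , TEST , :R:W:X:8:16:32:64:");
    MemMapLexer lexer(&input);

    CommonTokenStream tokens(&lexer);

    MemMapParser parser(&tokens);

    auto* p = parser.memmap();  // <-- This call produces a SEGFAULT
}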
jimidle commented 9 months ago

While it would be better if you got a message about your grammar, the problem is with your grammar. Your actual grammar should just be

(WORD | '_')+ EOF

In many cases there is no way to tell where a region ends: should a new WORD re-enter the + loop or go on to WORD*?

If regions are delimited somehow, make your region rule reflect that. If they are not, then your grammar is just WORDs and '_', and there is no need for ANTLR.
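To make the ambiguity concrete, here is a brute-force sketch that counts how many distinct parse trees `memmap: region+ EOF; region: (WORD | '_')+ WORD*;` admits for a given token sequence. This is only an illustration under simple assumptions: tokens are spelled "WORD" or "_", EOF is left implicit, and the function name countParses is made up. ANTLR does not enumerate trees like this. It commits to a single reading during prediction.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts the parse trees the grammar
//   memmap: region+ EOF;
//   region: (WORD | '_')+ WORD*;
// admits for a token sequence (EOF implicit). Tokens are "WORD" or "_".
unsigned long long countParses(const std::vector<std::string>& tokens) {
  const std::size_t n = tokens.size();

  // Ways tokens[i, j) forms exactly one region: count split points k
  // (i < k <= j) where tokens[i, k) all match (WORD | '_') and
  // tokens[k, j) are all WORD.
  auto regionWays = [&tokens](std::size_t i, std::size_t j) {
    unsigned long long ways = 0;
    for (std::size_t k = i + 1; k <= j; ++k) {
      bool ok = true;
      for (std::size_t t = i; t < k && ok; ++t)
        ok = tokens[t] == "WORD" || tokens[t] == "_";
      for (std::size_t t = k; t < j && ok; ++t)
        ok = tokens[t] == "WORD";
      if (ok) ++ways;
    }
    return ways;
  };

  // total[j]: ways tokens[0, j) splits into one or more regions;
  // total[0] = 1 is the empty prefix, used only as the base case.
  std::vector<unsigned long long> total(n + 1, 0);
  total[0] = 1;
  for (std::size_t j = 1; j <= n; ++j)
    for (std::size_t i = 0; i < j; ++i)
      total[j] += total[i] * regionWays(i, j);
  return n == 0 ? 0 : total[n];  // region+ needs at least one region
}
```

For one to four consecutive WORDs this gives 1, 3, 8 and 21 trees. The parser has to pick one of these readings, and nothing in the input says which one is intended.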

Also, you are not handling whitespace (for example, by skipping those characters), and you are not handling any other characters either. Something like:

WORD: [A-Z][a-z]+ ; // Assuming only the first char is CAPS
WS: [ \t] -> skip ;
BADCHAR: . ;
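The three suggested rules interact through ANTLR's lexer conventions: the longest match wins, and on a tie the rule listed first wins. The catch-all BADCHAR gives every otherwise unmatched character its own token, so bad input reaches the parser as a token instead of being dropped by the lexer's error recovery. The following is a hand-written emulation of just those three rules. It is not ANTLR-generated code, and the function name tokenize is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hand-written emulation (not generated code) of
//   WORD: [A-Z][a-z]+ ;
//   WS: [ \t] -> skip ;
//   BADCHAR: . ;
// Longest match wins; on a tie, the earlier rule wins.
std::vector<std::pair<std::string, std::string>> tokenize(const std::string& s) {
  std::vector<std::pair<std::string, std::string>> out;  // (type, text)
  std::size_t i = 0;
  while (i < s.size()) {
    const char c = s[i];
    const bool upper = c >= 'A' && c <= 'Z';
    const bool nextLower = i + 1 < s.size() && s[i + 1] >= 'a' && s[i + 1] <= 'z';
    if (upper && nextLower) {
      // WORD: one uppercase letter, then one or more lowercase letters.
      // Its match is at least 2 chars long, so it beats BADCHAR.
      std::size_t j = i + 1;
      while (j < s.size() && s[j] >= 'a' && s[j] <= 'z') ++j;
      out.emplace_back("WORD", s.substr(i, j - i));
      i = j;
    } else if (c == ' ' || c == '\t') {
      // WS ties with BADCHAR (1 char) but is listed first; -> skip drops it.
      ++i;
    } else {
      // BADCHAR: any other single character becomes its own token.
      out.emplace_back("BADCHAR", std::string(1, c));
      ++i;
    }
  }
  return out;
}
```

With these rules, "Hello World_" lexes as WORD, WORD, BADCHAR. A lone capital such as "T" becomes a BADCHAR, because WORD needs at least one lowercase letter after it.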

I think you should study and practice with existing grammars before attempting this. Also, you do not say what you are trying to parse.

(Sent by email in reply to the report Josep Sans posted on Mon, Feb 26, 2024 at 02:41.)

Sustrak commented 9 months ago

Hi @jimidle, thanks for your answer. Here is the complete grammar of the parser I'm trying to build:

grammar MemMap;

// Parser rules
memmap: region+ EOF;
region: id ',' address ',' address ',' size ',' protocol ',' mem_type ',' attribute_list NEWLINE;
id: LETTER (LETTER| DEC_NUMBER | '_' | '-')*;
address: HEX_NUMBER;
size: REGION_SIZE;
protocol: 'CHI' | 'AXI' | 'AXILite';
mem_type: LETTER+;
attribute_list: ':' attribute (':' attribute)* ':';
attribute: DEC_NUMBER+ | LETTER+ | vec_size;
vec_size: 'v' DEC_NUMBER+;

// Lexer rules
fragment NUMBER: [0-9];
fragment HEX_LETTER: [a-fA-F];

HEX_NUMBER: '0x' (NUMBER | HEX_LETTER)+;
DEC_NUMBER: NUMBER+;
LETTER: [a-zA-Z];

REGION_SIZE: DEC_NUMBER ('B' | 'K' | 'M' | 'G' | 'T');

WHITESPACE: [ \t]+ -> skip;
NEWLINE: ('\r'? '\n' | '\r')+;
BADCHAR: .;

An example of what I'm trying to parse is:

test_2        , 0x000000000000 , 0x000000000FFF , 4K      , AXI , UC , :R:W:X:8:16:32:64:
test_4         , 0x000001000000 , 0x000001007FFF , 32K     , AXI , UC  , :R:W:X:8:16:32:64:

With this grammar I get a SEGFAULT at the same location in the code as with the grammar above.

Do I still have problems with the grammar definition? I have tried it with the grun tool and it parses the examples provided correctly.

Thanks for the help

kaby76 commented 9 months ago

What OS and compiler are you using? Your grammar and input work fine on (WSL2) Ubuntu 20.04.6 LTS with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, and also on Windows 11 with MSVC 19.39.33520.0. Both builds use ANTLR 4.13.1.

Sustrak commented 9 months ago

Hi, I'm using:

Rocky Linux (kernel 5.4.269-1.el8.elrepo.x86_64).

G++: gcc version 12.2.1 20221121 (Red Hat 12.2.1-7) (GCC)
CLANG: clang version 16.0.6 (Red Hat 16.0.6-2.module+el8.9.0+1651+e10a8f6d)

I get the same SEGFAULT with both compilers.

ANTLR4: 4.13.1


I also tried on WSL with the following compiler and the same ANTLR version as above, and it works fine:

$: c++ --version
c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Sustrak commented 9 months ago

Hi, I just realized that, because I'm using CMake to download the C++ runtime, the runtime is not compiled with the same compiler as the rest of my project (it uses one with a much older standard library). This mismatch is what causes the SEGFAULT in the runtime code.

Sorry for the inconvenience, and thank you for your help.
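A toolchain mismatch like the one found here is easy to miss. In this thread both builds compiled and linked, and the failure only appeared at run time. One possible way to check is to compile a tiny probe with each toolchain (once with the project's compiler and flags, once with whatever the runtime build picks up) and compare the output. The function name toolchainSummary is made up for illustration. The macros it reads are predefined by GCC, Clang and MSVC or defined by the libstdc++ and libc++ headers.

```cpp
#include <string>

// Describes the toolchain this translation unit was compiled with.
// Build it with each toolchain in question and compare the results.
std::string toolchainSummary() {
  std::string s = "compiler: ";
#if defined(__clang__)
  s += std::string("clang ") + __clang_version__;
#elif defined(__GNUC__)
  s += std::string("gcc ") + __VERSION__;
#elif defined(_MSC_VER)
  s += "msvc " + std::to_string(_MSC_VER);
#else
  s += "unknown";
#endif

  // Standard library identity (these macros come from <string>'s config).
#if defined(_LIBCPP_VERSION)
  s += ", stdlib: libc++ " + std::to_string(_LIBCPP_VERSION);
#elif defined(__GLIBCXX__)
  s += ", stdlib: libstdc++ " + std::to_string(__GLIBCXX__);
#endif

  // libstdc++ dual ABI selection (old vs. C++11 std::string/std::list).
#if defined(_GLIBCXX_USE_CXX11_ABI)
  s += ", _GLIBCXX_USE_CXX11_ABI=" + std::to_string(_GLIBCXX_USE_CXX11_ABI);
#endif

  s += ", __cplusplus=" + std::to_string(__cplusplus);
  return s;
}
```

A difference in the standard-library or ABI fields between the two builds does not prove incompatibility on its own, but it is a strong hint to build the runtime with the same compiler as the rest of the project.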