Closed Sustrak closed 9 months ago
While it would be better to get a message about your grammar, the problem is with your grammar. Your actual grammar should just be
(WORD | '_')+ EOF
In many cases there is no way to tell where a region ends: when a new WORD arrives, should the parser stay in the (WORD | '_')+ loop or move on to WORD*?
If regions are delimited in some way, then make your region rule reflect that. If they are not, then your grammar is just WORDs and '_' and there is no need for ANTLR.
Also, you are not catering for whitespace, for example by skipping those characters. You are also not catering for any other characters. You need something like:
WORD: [A-Z][a-z]+ ; // Assuming only first char is CAPS
WS: [ \t] -> skip ;
BADCHAR: . ;
I think you should study and practice with existing grammars before attempting this. You also do not say what you are trying to parse.
On Mon, Feb 26, 2024 at 02:41 Josep Sans @.***> wrote:
With this grammar:
grammar MemMap;
memmap: region+ EOF;
region: (WORD | '_')+ WORD*;
WORD: [A-Za-z]+;
I get a SEGFAULT in: antlr4::atn::PredictionContext::getContextType PredictionContext.h:170. The stack trace is as follows:
antlr4::atn::PredictionContext::getContextType PredictionContext.h:170
antlr4::atn::SingletonPredictionContext::equals SingletonPredictionContext.cpp:60
antlr4::atn::operator== PredictionContext.h:206
antlr4::atn::PredictionContextCache::PredictionContextComparer::operator() PredictionContextCache.cpp:55
std::__detail::_Equal_helper::_S_equals hashtable_policy.h:1460
std::__detail::_Hashtable_base::_M_equals hashtable_policy.h:1844
std::_Hashtable::_M_find_before_node hashtable.h:1562
std::_Hashtable::_M_find_node hashtable.h:649
std::_Hashtable::find hashtable.h:1452
std::unordered_set::find unordered_set.h:654
antlr4::atn::PredictionContextCache::get PredictionContextCache.cpp:40
getCachedContextImpl PredictionContext.cpp:53
antlr4::atn::PredictionContext::getCachedContext PredictionContext.cpp:520
antlr4::atn::ATNSimulator::getCachedContext ATNSimulator.cpp:32
antlr4::atn::ATNConfigSet::optimizeConfigs ATNConfigSet.cpp:138
antlr4::atn::ParserATNSimulator::addDFAState ParserATNSimulator.cpp:1335
antlr4::atn::ParserATNSimulator::addDFAEdge ParserATNSimulator.cpp:1287
antlr4::atn::ParserATNSimulator::computeTargetState ParserATNSimulator.cpp:324
antlr4::atn::ParserATNSimulator::execATN ParserATNSimulator.cpp:188
antlr4::atn::ParserATNSimulator::adaptivePredict ParserATNSimulator.cpp:163
antlr_memmap::MemMapParser::region() 0x000000000048b2f2
antlr_memmap::MemMapParser::memmap() 0x000000000048acdc
Looking at the stack trace, it seems we are getting a pointer from an std::unordered_set whose memory has already been deallocated.
You may use the following code to trigger the SEGFAULT:
int main() {
    ANTLRInputStream input("test_test , 0x0 , 0x1 , 4K , TEST , TEST , :R:W:X:8:16:32:64:");
    MemMapLexer lexer(&input);
    CommonTokenStream tokens(&lexer);
    MemMapParser parser(&tokens);
    auto* p = parser.memmap(); // <-- This call produces a SEGFAULT
}
Hi @jimidle, thanks for your answer. Here is the complete grammar of the parser I'm trying to build:
grammar MemMap;
// Parser rules
memmap: region+ EOF;
region: id ',' address ',' address ',' size ',' protocol ',' mem_type ',' attribute_list NEWLINE;
id: LETTER (LETTER| DEC_NUMBER | '_' | '-')*;
address: HEX_NUMBER;
size: REGION_SIZE;
protocol: 'CHI' | 'AXI' | 'AXILite';
mem_type: LETTER+;
attribute_list: ':' attribute (':' attribute)* ':';
attribute: DEC_NUMBER+ | LETTER+ | vec_size;
vec_size: 'v' DEC_NUMBER+;
// Lexer rules
fragment NUMBER: [0-9];
fragment HEX_LETTER: [a-fA-F];
HEX_NUMBER: '0x' (NUMBER | HEX_LETTER)+;
DEC_NUMBER: NUMBER+;
LETTER: [a-zA-Z];
REGION_SIZE: DEC_NUMBER ('B' | 'K' | 'M' | 'G' | 'T');
WHITESPACE: [ \t]+ -> skip;
NEWLINE: ('\r'? '\n' | '\r')+;
BADCHAR: .;
An example of what I'm trying to parse is:
test_2 , 0x000000000000 , 0x000000000FFF , 4K , AXI , UC , :R:W:X:8:16:32:64:
test_4 , 0x000001000000 , 0x000001007FFF , 32K , AXI , UC , :R:W:X:8:16:32:64:
With this grammar I get a SEGFAULT in the same location of the code as in the grammar above.
Do I still have problems with the grammar definition? I have tried it with the grun tool and it parses the examples provided correctly.
Thanks for the help
What OS and compiler are you using? Your grammar and input work fine on (WSL2) Ubuntu 20.04.6 LTS with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0. They also work on Windows 11 with MSVC 19.39.33520.0. Both use ANTLR 4.13.1.
Hi, I'm using:
Rocky Linux, Linux 5.4.269-1.el8.elrepo.x86_64.
G++: gcc version 12.2.1 20221121 (Red Hat 12.2.1-7) (GCC)
CLANG: clang version 16.0.6 (Red Hat 16.0.6-2.module+el8.9.0+1651+e10a8f6d)
I get the same SEGFAULT with both compilers
ANTLR4: 4.13.1
I tried it on WSL with the following compiler and the same ANTLR version as above, and it works fine:
$: c++ --version
c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Hi, I just realized that because I'm using CMake to download the C++ runtime, the runtime is not compiled with the same compiler as my project (it is built against a much older standard library). This mismatch is causing the SEGFAULT in the runtime code.
Sorry for the inconvenience and thank you for your help.
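For anyone hitting a similar crash, one rough way to check for this kind of toolchain mismatch is to print which compiler and standard library a translation unit was built with, and compare the result between the project and the runtime build. The sketch below is only an illustration: the function name toolchain_fingerprint is hypothetical and is not part of ANTLR, and it relies only on predefined compiler and standard library macros.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of ANTLR): summarises the compiler and
// C++ standard library that this translation unit was compiled against.
std::string toolchain_fingerprint() {
    std::ostringstream out;

    // Compiler identity. Clang also defines __GNUC__, so test it first.
#if defined(__clang__)
    out << "clang " << __clang_major__ << '.' << __clang_minor__;
#elif defined(__GNUC__)
    out << "gcc " << __GNUC__ << '.' << __GNUC_MINOR__;
#elif defined(_MSC_VER)
    out << "msvc " << _MSC_VER;
#else
    out << "unknown compiler";
#endif

    // Standard library identity. These macros become visible once any
    // standard header (here <string>) has been included.
#if defined(__GLIBCXX__)
    out << ", libstdc++ " << __GLIBCXX__;
#elif defined(_LIBCPP_VERSION)
    out << ", libc++ " << _LIBCPP_VERSION;
#endif

    // libstdc++ dual ABI setting; a differing value between two builds
    // changes the layout of types such as std::string.
#if defined(_GLIBCXX_USE_CXX11_ABI)
    out << ", cxx11 abi " << _GLIBCXX_USE_CXX11_ABI;
#endif

    out << ", __cplusplus " << __cplusplus;
    return out.str();
}
```

Compiling this once with the project's compiler settings and once with the compiler that CMake selected for the runtime, then printing both strings, should show whether the two builds disagree. Matching strings do not prove the builds are compatible, but differing ones are a strong hint to look at the toolchain before the grammar.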