antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

[C++] parsing too greedy, misses space #2377

Open siliconvoodoo opened 6 years ago

siliconvoodoo commented 6 years ago

I'm not sure whether I did something wrong, or whether there is a bug in ANTLR or in its C++ target, but here is what I'm getting. I'll let you judge from the image alone that it's weird: [image]

My main code just does the classic:

```cpp
std::ifstream in{ argv[1] };
ANTLRInputStream input(in);
helloLexer lexer(&input);
CommonTokenStream tokens(&lexer);

helloParser parser(&tokens);
tree::ParseTree *tree = parser.program();
```

my grammar as text: https://pastebin.com/iRhUFcP1

Generated with this command: `antlr4 hello.g4 -Dlanguage=Cpp`

siliconvoodoo commented 6 years ago

I fixed it by turning the identifier rule into a lexer rule: I renamed it Identifier (capital I) and marked LETTER and DIGIT with the fragment annotation. I don't know why that changes anything, but it is done that way in the C grammar example here: https://github.com/antlr/grammars-v4/blob/master/c/C.g4, and my language is almost C. It definitely looks like a bug to me now, though I'm no expert.
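A possible explanation for why the change matters, sketched as a toy model in plain C++ rather than with the ANTLR runtime. The grammar itself is not reproduced in this thread, so this assumes the original `identifier` was a parser rule built from `LETTER` and `DIGIT` tokens and that whitespace was skipped by the lexer. Under that assumption the parser never sees the space, so it keeps consuming letters across it. All function names below are hypothetical and exist only for illustration.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy model only: this is NOT ANTLR runtime code.

// Variant A (assumed original grammar): LETTER and DIGIT are ordinary lexer
// rules, so every character becomes its own token, and whitespace is skipped.
std::vector<std::string> lexPerCharacter(const std::string& src) {
    std::vector<std::string> tokens;
    for (unsigned char c : src) {
        if (std::isspace(c)) continue;  // models "WS -> skip"
        tokens.push_back(std::string(1, static_cast<char>(c)));
    }
    return tokens;
}

// Models a parser rule "identifier: LETTER (LETTER | DIGIT)*" running on the
// variant A token stream. The skipped whitespace is invisible at this level.
std::string parseIdentifier(const std::vector<std::string>& tokens) {
    std::string id;
    for (const auto& t : tokens) {
        unsigned char c = static_cast<unsigned char>(t[0]);
        bool matches = id.empty() ? std::isalpha(c) != 0 : std::isalnum(c) != 0;
        if (!matches) break;
        id += t;
    }
    return id;
}

// Variant B (the fix): Identifier is a lexer rule, with LETTER and DIGIT as
// fragments. The lexer matches the longest run of letters and digits, and
// that run ends at the first whitespace character.
std::vector<std::string> lexIdentifiers(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // models "WS -> skip"
        std::size_t start = i;
        if (std::isalpha(c)) {
            while (i < src.size() &&
                   std::isalnum(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
        } else {
            ++i;  // any other character is a one-character token
        }
        tokens.push_back(src.substr(start, i - start));
    }
    return tokens;
}
```

In this model, `parseIdentifier(lexPerCharacter("int x"))` glues both words into one identifier, while `lexIdentifiers("int x")` yields two separate tokens. If the original grammar had this shape, the behaviour would be expected rather than a bug in the C++ target.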