antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Feature Request: automatic (lexer <-> parser) token IDs for single character tokens #3182

Open mjrsousa opened 3 years ago

mjrsousa commented 3 years ago

Hello all,

I am writing a parser where I may need to use lexer modes. This means that I need to split the lexer and parser grammars into separate files, and therefore can no longer use the 'automatic' tokens (string literals that implicitly define tokens) in the parser grammar.

This means that the lexer must now include rules for ALL tokens used by the grammar, including all single-character tokens such as ':', ',', ';', '(', ')', etc. Additionally, the parser grammar must also be changed to use token identifiers instead of the automatic tokens.

Bison (a parser generator) allows the use of single-character tokens like ';' directly in the parser grammar, where each such token is mapped to an ID equal to the ASCII value of that character. Flex (the lexer generator) then allows a catch-all rule to match any single character and return its ASCII value as the token ID. IDs for all other (named) tokens start above the character range, at 256 or higher, instead of 0.
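
To make the numbering convention concrete, here is a minimal Python sketch of the scheme described above. It is not Bison, Flex, or ANTLR code: the token names `IDENT` and `NUMBER`, the `tokenize` helper, and the starting value 256 are assumptions chosen for illustration (real Bison reserves a few low values for itself, so its first named token usually gets a slightly higher number). It also assumes ASCII input, so a character's ID can never collide with a named token's ID.

```python
import re

# Hypothetical named tokens, numbered from just above the 8-bit character range.
FIRST_NAMED_ID = 256
NAMED_TOKENS = {"IDENT": FIRST_NAMED_ID, "NUMBER": FIRST_NAMED_ID + 1}

# Alternatives are tried in order; CHAR is the catch-all for any single
# character that no earlier rule matched (like a final `.` rule in Flex).
TOKEN_RE = re.compile(
    r"""
    (?P<WS>\s+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<CHAR>.)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Yield (token_id, lexeme) pairs for ASCII input."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        lexeme = match.group()
        if kind == "WS":
            continue  # whitespace is skipped, not returned as a token
        if kind == "CHAR":
            # Single-character token: its ID is the character's code point.
            yield ord(lexeme), lexeme
        else:
            yield NAMED_TOKENS[kind], lexeme


if __name__ == "__main__":
    for token_id, lexeme in tokenize("f(x, 42);"):
        print(token_id, repr(lexeme))
```

With this scheme the parser side never needs a named rule for ';' or '(': the literal itself determines the ID, which is the behaviour being requested for ANTLR's split lexer and parser grammars.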

Would it not be possible to have this feature in ANTLR too?

KvanTTT commented 3 years ago

I think the feature requested in https://github.com/antlr/antlr4/issues/2361 is more general.