ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
I am writing a parser where I may need to use lexer modes. ANTLR only allows modes in a lexer grammar, not in a combined grammar, so I need to split the lexer and parser into separate grammar files, and can therefore no longer use the 'automatic' (implicitly defined) tokens in the parser grammar.
As a result, the lexer must now include rules for ALL tokens used by the grammar, including every single-character token (':', ',', ';', '(', ')', etc.). The parser grammar must also be changed to use token identifiers instead of the automatic tokens.
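To illustrate the split described above, here is a minimal sketch of a separate lexer and parser grammar. The grammar names `DemoLexer` and `DemoParser`, the `STRING` mode, and all token names are hypothetical, chosen only for this example. The two grammars would live in separate files (`DemoLexer.g4` and `DemoParser.g4`).

```antlr
// ---- DemoLexer.g4 (hypothetical file name) ----
lexer grammar DemoLexer;

// Every single-character token needs its own named rule.
COLON  : ':' ;
COMMA  : ',' ;
SEMI   : ';' ;
LPAREN : '(' ;
RPAREN : ')' ;

ID     : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS     : [ \t\r\n]+ -> skip ;

// An opening quote switches to a separate mode for string contents.
QUOTE  : '"' -> pushMode(STRING) ;

mode STRING;
STRING_TEXT : ~["]+ ;
STRING_END  : '"' -> popMode ;

// ---- DemoParser.g4 (hypothetical file name) ----
parser grammar DemoParser;

// Import the token types generated from the lexer grammar.
options { tokenVocab = DemoLexer; }

call : ID LPAREN (arg (COMMA arg)*)? RPAREN SEMI ;
arg  : ID
     | QUOTE STRING_TEXT? STRING_END
     ;
```

In this sketch the parser rules refer to `SEMI`, `COMMA`, and so on by name, and each of those names has to be declared once in the lexer grammar.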
Bison (a parser generator) allows single-character tokens like ';' to be used directly in the parser grammar; each such token is mapped to an ID equal to the ASCII value of that character.
Flex (a lexer generator) then allows a catch-all rule that matches any single character and returns the character's ASCII value as the token ID.
IDs for all other (named) tokens start above 255 instead of at 0, so they cannot collide with the character values.
Would it not be possible to have this feature in ANTLR too?
Parser grammars would allow single-character tokens such as ';' to be used directly, mapping each one to a token ID equal to the ASCII value of the character.
Lexer grammars would allow a single-character rule whose token ID is the ASCII value of the matched character.
The following rule would then be added at the END of the lexer grammar:
Fallback: '\00'..'\255' {type( _input[0] )}