Closed by rensink.
I have this exact issue right now. I'm trying to pretty-print parse trees, and only tokens defined by more complex lexer rules (rather than a single literal) appear under their symbolic names in the ParserGrammar.getTokenNames() list.
I should point out that there is a second non-ideal way to access these names: via the generated .tokens file.
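As a rough sketch of the .tokens route, here is a small Java reader that rebuilds a type-to-name map. It assumes the usual layout of one `NAME=type` or `'literal'=type` entry per line and keeps only the symbolic `NAME=type` entries. The class `TokensFileReader` is hypothetical, not part of ANTLR.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: rebuilds token type -> symbolic name from a .tokens file,
// assuming each entry is either NAME=type or 'literal'=type on its own line.
class TokensFileReader {

    static Map<Integer, String> readSymbolicNames(Reader in) {
        Map<Integer, String> names = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Skip blank lines and literal aliases such as 'struct'=1.
                if (line.isEmpty() || line.startsWith("'")) {
                    continue;
                }
                int eq = line.lastIndexOf('=');
                if (eq <= 0) {
                    continue; // not a NAME=type entry
                }
                String name = line.substring(0, eq);
                int type = Integer.parseInt(line.substring(eq + 1).trim());
                names.put(type, name);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "STRUCT=1\nLBRACE=2\n'struct'=1\n'{'=2\n";
        Map<Integer, String> names = readSymbolicNames(new StringReader(sample));
        System.out.println(names.get(1) + " " + names.get(2)); // STRUCT LBRACE
    }
}
```

For a real grammar you would pass a reader over the generated file, e.g. `Files.newBufferedReader(Paths.get("MyLexer.tokens"))` (the file name is hypothetical). The catch is the one noted above: the .tokens file has to be located and shipped alongside the generated code.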
Hhmm.... well, we set it up so that it would give the "display" name in tokenNames, with the idea that it was likely more useful to the end-user than things like LBRACE.
But the text can be extracted from the token anyway, so the display name just repeats it. Here's an example of what my current pretty-printer outputs:
Parsing: "/home/arifogel/git/batfish/test_rigs/unit-tests/configs/underscore_variable"...OK, PRINTING PARSE TREE:
(cisco_configuration
  (stanza
    (null_stanza
      (closing_comment
        COMMENT_CLOSING_LINE:'!\n')))
  (stanza
    (hostname_stanza
      'hostname':'hostname'
      VARIABLE:'underscore_variable'
      NEWLINE:'\n'))
  (stanza
    (null_stanza
      (closing_comment
        COMMENT_CLOSING_LINE:'!\n')))
  (stanza
    (route_map_stanza
      (route_map_named_stanza
        'route-map':'route-map'
        VARIABLE:'JKL_MNO_PQR'
        (route_map_tail
          (access_list_action
            'permit':'permit')
          DEC:'100'
          NEWLINE:'\n'
          (route_map_tail_tail
            (rm_stanza
              (match_rm_stanza
                (match_ip_prefix_list_rm_stanza
                  'match':'match'
                  'ip':'ip'
                  'address':'address'
                  'prefix-list':'prefix-list'
                  VARIABLE:'ABC_DEF'
                  VARIABLE:'_GHI'
                  NEWLINE:'\n'))))
          (closing_comment
            COMMENT_CLOSING_LINE:'!\n')))))
  'end':'end'
  NEWLINE:'\n'
  EOF:
I would much prefer to output, e.g., "MATCH:'match'" instead of "'match':'match'", especially where the token name does not quite correspond to the literal text. For example, we have a token IP_ADDRESS_LITERAL that matches 'ip-address', but we also have a token IP_ADDRESS that matches actual IP addresses. A user who sees 'ip-address':'ip-address' might think it came from the IP_ADDRESS rule, which is not the case.
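A minimal sketch of the leaf format asked for here, assuming a type-to-symbolic-name map is already available (for instance from the .tokens file or via reflection on the parser class) and falling back to the display-style names when a type is missing. `LeafFormatter`, `formatLeaf`, and the token types in `main` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: formats a parse-tree leaf as NAME:'text', preferring a
// symbolic name and falling back to the display-style token names.
class LeafFormatter {

    // Make control characters visible, as in the '!\n' leaves above.
    static String escape(String text) {
        return text.replace("\\", "\\\\")
                   .replace("\n", "\\n")
                   .replace("\r", "\\r")
                   .replace("\t", "\\t");
    }

    static String formatLeaf(int type, String text,
                             Map<Integer, String> symbolicNames, String[] displayNames) {
        String name = symbolicNames.get(type);
        if (name == null) {
            if (type == -1) {
                name = "EOF"; // ANTLR uses -1 as the EOF token type
            } else if (type >= 0 && type < displayNames.length) {
                name = displayNames[type]; // e.g. 'match' when no symbolic name is known
            } else {
                name = String.valueOf(type);
            }
        }
        return name + ":'" + escape(text) + "'";
    }

    public static void main(String[] args) {
        // Hypothetical token types; a real map would come from the .tokens file or the parser class.
        Map<Integer, String> symbolic = new HashMap<>();
        symbolic.put(1, "MATCH");
        symbolic.put(2, "IP_ADDRESS_LITERAL");
        String[] display = {"<INVALID>", "'match'", "'ip-address'", "NEWLINE"};
        System.out.println(formatLeaf(1, "match", symbolic, display));      // MATCH:'match'
        System.out.println(formatLeaf(2, "ip-address", symbolic, display)); // IP_ADDRESS_LITERAL:'ip-address'
        System.out.println(formatLeaf(3, "\n", symbolic, display));         // NEWLINE:'\n'
    }
}
```

With a symbolic name available, IP_ADDRESS_LITERAL:'ip-address' can no longer be mistaken for output of the IP_ADDRESS rule.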
As an aside, what happens when two lexer rules in different modes match the same text, e.g. 'text'? Will both tokens' tokenNames entries be 'text', or will one be 'text' and the other use its symbolic name?
For reference:
Repository: github.com/arifogel/batfish
Commit: c7ba5184766c038bdd5750fdc38053eec4f2b87c
File: projects/batfish/src/batfish/grammar/ParseTreePrettyPrinter.java
(Replying by email to Terence Parr's comment of 08/08/2014 05:56 PM: https://github.com/antlr/antlr4/issues/238#issuecomment-51672360)
May I suggest making the "symbolic token name" available to the programmer? Currently, defining lexer rules for the literals 'struct' and '{' produces tokens named 'struct' and '{', where I would prefer to have the user-defined names STRUCT and LBRACE. There seems to be no way to access these names except by reflection on the parser class to retrieve the static field names.
Thanks, Arend
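The reflection workaround could look roughly like this sketch. `DemoParser` is a hypothetical stand-in for a generated parser, assumed here to expose token types as `public static final int` fields alongside `RULE_` index constants that must be filtered out. `SymbolicNames` is likewise a hypothetical helper.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a generated parser: token types and rule
// indices both appear as public static final int fields.
class DemoParser {
    public static final int STRUCT = 1, LBRACE = 2, RBRACE = 3;
    public static final int RULE_structDecl = 0;
}

class SymbolicNames {
    // Maps token type -> field name for each public static final int field,
    // skipping rule-index constants (heuristic: names starting with RULE_).
    static Map<Integer, String> fromStaticFields(Class<?> parserClass) {
        Map<Integer, String> names = new TreeMap<>();
        for (Field f : parserClass.getFields()) { // public fields only
            int mods = f.getModifiers();
            boolean constInt = Modifier.isStatic(mods) && Modifier.isFinal(mods)
                    && f.getType() == int.class;
            if (!constInt || f.getName().startsWith("RULE_")) {
                continue;
            }
            try {
                names.put(f.getInt(null), f.getName()); // null target: static field
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fromStaticFields(DemoParser.class)); // {1=STRUCT, 2=LBRACE, 3=RBRACE}
    }
}
```

Filtering on the `RULE_` prefix is only a heuristic. A real generated class may expose other int constants that would also need excluding, which is part of why a direct accessor for symbolic names would be preferable.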