fonfalleh closed this issue 4 years ago
It seems that the regular expression printer does not apply special printing rules when printing content in bracketed expressions, but maybe it should, according to the rules you quoted above:
The following escaped characters are interpreted as single special characters: \n, \r, \b, \t, \f, \uXXXX, and \u{XXXXXX}. To get ], \, or - you must escape them with \. (From https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexer-rule-elements)
The problematic lines in BNFC are thus:
https://github.com/BNFC/bnfc/blob/3ca72116c4a8f541dff2778450efabd72219cf8e/source/src/BNFC/Backend/Java/RegToAntlrLexer.hs#L79
https://github.com/BNFC/bnfc/blob/3ca72116c4a8f541dff2778450efabd72219cf8e/source/src/BNFC/Backend/Java/RegToAntlrLexer.hs#L69-L72
There, instead of calling the `prt` function recursively, a special print function for content inside brackets should be called.
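A minimal sketch of what such a context-specific printer could look like, assuming the escaping rules from the ANTLR documentation quoted above. The names `escapeInSet`, `escapeInLiteral`, and `charSet` are hypothetical and do not come from the BNFC sources; control characters such as `\n` and non-ASCII characters are not handled here.

```haskell
-- Hypothetical helpers, not taken from RegToAntlrLexer.hs.

-- | Escape one character for use inside an ANTLR character set [...].
-- Assumption: only ']', '\' and '-' need a backslash in this context.
escapeInSet :: Char -> String
escapeInSet c
  | c `elem` "]\\-" = ['\\', c]
  | otherwise       = [c]

-- | Escape one character for use inside a quoted literal '...'.
-- Assumption: only the single quote and the backslash need escaping here.
escapeInLiteral :: Char -> String
escapeInLiteral c
  | c `elem` "'\\" = ['\\', c]
  | otherwise      = [c]

-- | Print a list of alternative characters as one character set.
charSet :: String -> String
charSet cs = "[" ++ concatMap escapeInSet cs ++ "]"
```

With these definitions, `charSet "',"` yields `[',]` (no backslash before the quote), while `charSet "a-z]"` yields `[a\-z\]]`. Keeping the two escape functions separate means a quote is only escaped where ANTLR expects it.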
@fonfalleh : Can you test if PR #321 works for you?
Seems to work, thanks! :+1:
Great!
My fix wasn't complete, see #329.
It seems the only characters that should be escaped in bracket expressions in regexes are `]`, `\`, and `-`. I'm not sure if this means that there needs to be different escaping in different contexts. https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexer-rule-elements
Example token rule that generates broken code (not by any means good or correct; I just noticed that the resulting lexer file doesn't work):
token NoteToken ["abcdefgr"]({"es"} | {"is"})*["\',"]*(digit)*["."]* ;
results in the following line in the Lexer.g4 file:
NoteToken : [abcdefgr]('e''s'|'i''s')*[\',]*DIGIT*'.'*;
which generates the following when building:
warning(156): lily/lilyLexer.g4:83:38: invalid escape sequence \'
The build also complains about the following line:
STRINGTEXT : ~[\"\\] -> more;
https://github.com/BNFC/bnfc/blob/3ca72116c4a8f541dff2778450efabd72219cf8e/source/src/BNFC/Backend/Java/CFtoAntlr4Lexer.hs#L157
The build works as expected when removing the extra backslashes as follows:
NoteToken : [abcdefgr]('e''s'|'i''s')*[',]*DIGIT*'.'*;
...
STRINGTEXT : ~["\\] -> more;
Sidenote: I first thought this could be related to the following line, which references RegToJLex.hs instead of RegToAntlrLexer.hs, but the reference seems to be correct, even if the naming is confusing.
https://github.com/BNFC/bnfc/blob/3ca72116c4a8f541dff2778450efabd72219cf8e/source/src/BNFC/Backend/Java/CFtoAntlr4Lexer.hs#L150
Export from RegToAntlrLexer: https://github.com/BNFC/bnfc/blob/3ca72116c4a8f541dff2778450efabd72219cf8e/source/src/BNFC/Backend/Java/RegToAntlrLexer.hs#L1