Open Vampire opened 5 years ago
What target language is this? Java requires the escape (#164).
Ah, right, Unicode escape, but I think then either double the backslashes to not need to use a Unicode escape for the "u", or use forward slashes. With the Unicode escape it reads totally strange. :-)
Double backslashes works in a string literal, but would have a different meaning in a comment. This is the only way to represent this Windows-style path in a Java comment.
I'm sorry to have to tell you, but you are wrong.
Excerpt from the Java Language Specification chapter "3.3 Unicode Escapes":
In addition to the processing implied by the grammar, for each raw input character that is a backslash \, input processing must consider how many other \ characters contiguously precede it, separating it from a non-\ character or the start of the input stream. If this number is even, then the \ is eligible to begin a Unicode escape; if the number is odd, then the \ is not eligible to begin a Unicode escape. If an eligible \ is not followed by u, then it is treated as a RawInputCharacter and remains part of the escaped Unicode stream. If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
And to make super sure I even tried,
// Generated from net\kautler\command\\usage\Usage.g4 by ANTLR 4.7.2
compiles perfectly fine. And so does
// Generated from net\\kautler\\command\\usage\\Usage.g4 by ANTLR 4.7.2
While I still think always using forward slashes might be the nicest and most readable solution
// Generated from net/kautler/command/usage/Usage.g4 by ANTLR 4.7.2
// Generated from net\kautler\command\\usage\Usage.g4 by ANTLR 4.7.2
There are two different kinds of \
in this line:
\
are eligible \
which are not followed by u
, and therefore treated as RawInputCharacter and remain part of the escaped Unicode stream\
is not an eligible \
, and therefore is also treated as RawInputCharacter and remains part of the escaped Unicode streamDe-duplication of escaped backslashes in strings is covered by EscapeSequence
in section 3.10.6. However, no such de-duplication applies to comments in source code (section 3.7).
This leaves us with the weird situation current produced by the tool: the only way to represent a single \
followed by a lowercase u
in a Java source code comment is to use a Unicode escape for the \
character.
I don't really get your point. Of course there is no "de-duplication" of the backslash. Which also is not necessary, as it is only a comment, not something that is interpreted for runtime like a String where you need a way to express one backslash character.
So having two backslash characters in a comment is of course different from having two backslash characters in a String, but it is still a valid way to have a u
after a backslash character.
Or you follow my other suggestion and just use forward slashes.
Or let me throw in another question. What is the comment meant for? My guess is, for a human reading it to see from which file it was generated. And a human has significant problems to read this string if there is a unicode escape sequence in it.
I don't really get your point.
I'm just explaining how it came to pass that the code produces this result. It probably won't change mostly because the product is in maintenance mode and the current implementation isn't broken (even if it's ugly).
I don't really get your point.
I'm just explaining how it came to pass that the code produces this result.
Ah, ok, that's clear to me of course. :-)
It probably won't change mostly because the product is in maintenance mode and the current implementation isn't broken (even if it's ugly).
That's very sad. :-(
BTW, the "illegal unicode escape" issue seems to be back with ANTLR 4.12 as I'm getting
C:\Users\Sebastian\Development\GitHub\oss-review-toolkit\ort\utils\spdx\build\generated-src\antlr\main\org\ossreviewtoolkit\utils\spdx\SpdxExpressionVisitor.java:1: error: illegal unicode escape
// Generated from org\ossreviewtoolkit\utils\spdx\SpdxExpression.g4 by ANTLR 4.12.0
Ah, sorry, I just realized this issue is actually complaining about the Unicode encoding (which is actually required to make the "illegal unicode escape" disappear). I should have re-opened https://github.com/antlr/antlr4/issues/164 instead.
The generated files here have this comment:
but I think it should actually be