Open xmcp opened 3 years ago
This behavior looks correct to me. After Unicode escape processing, the above program is:
class Foo { String bar = """; }
and the error you are receiving is consistent with what the Java compiler reports:
$ java -version
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)
$ cat Foo.java
class Foo { String bar = "\u0022"; }
$ javac Foo.java
Foo.java:1: error: unclosed string literal
class Foo { String bar = "\u0022"; }
^
Foo.java:1: error: reached end of file while parsing
class Foo { String bar = "\u0022"; }
^
2 errors
This behavior is dictated by the Java Language Specification. These two sections in particular,
In short, Unicode escapes are processed before any other tokenization or parsing is performed.
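To make the ordering concrete, here is a minimal Python sketch of that translation step. It is not javalang's code and not a full implementation of the specification: the function name `translate_unicode_escapes` is made up for illustration, and malformed escapes (which javac rejects) are simply left untouched.

```python
import re

# One or more 'u' characters followed by exactly four hex digits,
# matched immediately after a backslash.
_ESCAPE_BODY = re.compile(r"u+([0-9a-fA-F]{4})")


def translate_unicode_escapes(source: str) -> str:
    """Hypothetical helper: decode \\uXXXX escapes before any tokenizing."""
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        match = _ESCAPE_BODY.match(source, i + 1)
        if match:
            # Eligible backslash followed by u...XXXX: emit the decoded character.
            out.append(chr(int(match.group(1), 16)))
            i = match.end()
        elif i + 1 < n and source[i + 1] == "\\":
            # A pair of backslashes: copy both so the second one cannot
            # start a Unicode escape.
            out.append("\\\\")
            i += 2
        else:
            # Lone backslash (or malformed escape): copied as-is in this sketch.
            out.append(ch)
            i += 1
    return "".join(out)


java_source = r'class Foo { String bar = "\u0022"; }'
print(translate_unicode_escapes(java_source))
```

Run on the `Foo.java` line from this thread, the sketch yields the three-quote form shown earlier, which is why the tokenizer then sees an unclosed string literal.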
It would, however, make sense for javalang to preserve the original text for calculating positions and reporting errors.
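One way this could work, sketched below in Python, is to record for every translated character the index where it started in the original text. This is only an illustration under my own assumptions, not javalang's API: `translate_with_offsets` is a hypothetical name, and real line/column tracking would need more bookkeeping.

```python
import re

# One or more 'u' characters followed by exactly four hex digits.
_ESCAPE_BODY = re.compile(r"u+([0-9a-fA-F]{4})")


def translate_with_offsets(source):
    """Hypothetical helper: return (text, offsets).

    offsets[k] is the index in `source` where text[k] began, so a
    position found in the translated text can be reported against
    the original file.
    """
    chars = []
    offsets = []
    i = 0
    while i < len(source):
        if source[i] == "\\":
            match = _ESCAPE_BODY.match(source, i + 1)
            if match:
                # The decoded character maps back to the backslash that began it.
                chars.append(chr(int(match.group(1), 16)))
                offsets.append(i)
                i = match.end()
                continue
            if source.startswith("\\\\", i):
                # Escaped backslash: copy both characters with their own offsets.
                chars.extend(["\\", "\\"])
                offsets.extend([i, i + 1])
                i += 2
                continue
        chars.append(source[i])
        offsets.append(i)
        i += 1
    return "".join(chars), offsets


source = r'String bar = "\u0022";'
text, offsets = translate_with_offsets(source)
# The third quote in the translated text is the closing quote of the original line.
k = text.index('"""') + 2
print(k, "->", offsets[k])
```

With a map like this, the tokenizer can keep working on the translated text while error messages point at columns in the file the user actually wrote.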
It seems that javalang converts Unicode escapes back to their raw form (as pointed out in issue #58) in the pre_tokenize method before tokenizing. I don't see why this replacement is necessary (the pre_tokenize method has been there since the initial commit), and it may lead to failures in rare conditions. Example:
PR #96 fixes this issue; maybe we should merge it?