edn-format / edn

Extensible Data Notation

Support for unicode and octal escapes in string literals. #65

Open wagjo opened 10 years ago

wagjo commented 10 years ago

The spec does not say whether unicode and octal escapes are supported in string literals. Since clojure.edn supports them [1], I've added an explicit mention to the spec. I'm a registered Clojure contributor (signed CA).

[1] https://github.com/clojure/clojure/blob/c6756a8bab137128c8119add29a25b0a88509900/src/jvm/clojure/lang/EdnReader.java#L580

avodonosov commented 4 years ago

@richhickey, the absence of unicode escapes in string literals is really limiting. And the reason for that is unclear, given that unicode escapes are supported for characters.

avodonosov commented 4 years ago

The maintainer of the edn-java library kindly agreed to implement unicode escapes in the library. Initially, this was planned as an option, disabled by default. After it was implemented that way, it turned out that https://github.com/clojure/tools.reader supports unicode escapes by default, so edn-java ended up enabling unicode escapes by default as well.

Turns out https://github.com/clojure/tools.reader also supports octal escapes in string and character literals, same as in the Clojure language. (The current edn spec includes unicode escapes for characters, but omits octal escapes.)

@richhickey IMHO clarity is needed in the spec. It's strange unicode escapes are not specified for strings while they are specified for characters. And what about octal escapes?

@wagjo, if your pull request includes octal escapes for string literals, it makes sense to include them for characters too (the Clojure language and tools.reader support them in the form \oNNN).

As for backwards compatibility, I would suggest including the escapes in the spec and adding a comment: "Unicode and octal escapes in string literals and octal escapes in character literals were only added to the spec in 2020. Some implementations supported them before that. For compatibility, consumers of EDN documents (including parsing libraries) should always support the escapes. The suppliers of EDN documents should avoid the escapes, unless they have verified that all the consumers of their documents support the escapes."
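To illustrate the supplier side of that suggestion, here is a minimal sketch of a writer that never emits unicode or octal escapes. It uses only the escapes the current spec already lists for strings (`\t`, `\r`, `\n`, `\\`, `\"`) and writes every other character as-is, which assumes the document travels in an encoding such as UTF-8 that can carry those characters directly. The name `write_edn_string` is hypothetical and not taken from any existing EDN library.

```python
# Hypothetical helper: not part of clojure.edn, tools.reader, or edn-java.
# Only the escapes already listed in the edn spec for strings are produced.
_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\n": "\\n",
    "\t": "\\t",
    "\r": "\\r",
}


def write_edn_string(text: str) -> str:
    """Return `text` as an EDN string literal without \\uNNNN or octal escapes.

    Characters outside the table above (including non-ASCII characters)
    are copied through unchanged, so the output must be stored or sent
    in an encoding that can represent them.
    """
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in text) + '"'


if __name__ == "__main__":
    print(write_edn_string('caf\u00e9 "quoted"\n'))
```

A consumer that predates the escapes can still read this output, because the non-ASCII character is written literally rather than as an escape sequence.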

avodonosov commented 4 years ago

BTW, in Java, octal escapes in string literals can contain up to 3 digits (https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html), while the Clojure reader and clojure.tools.reader.edn require exactly 3 digits after the backslash.

So @wagjo, the wording "as in Java" in the pull request does not match precisely the current implementations.
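The difference between the two octal rules can be sketched with a small decoder for the body of a string literal. This is only an illustration under stated assumptions: `decode_edn_string_body` and its `exact_octal` flag are hypothetical names, not the API of any of the libraries mentioned here. The lenient branch follows the Java grammar (one or two octal digits, or three when the first digit is 0 to 3); the strict branch models the "exactly 3 digits" behaviour as described in the comment above and has not been checked against the actual readers.

```python
# Hypothetical sketch; names and behaviour are assumptions for illustration.
_SIMPLE = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\", '"': '"'}
_OCTAL = "01234567"
_HEX = "0123456789abcdefABCDEF"


def decode_edn_string_body(body: str, exact_octal: bool = True) -> str:
    """Decode escapes in the text between the quotes of a string literal.

    exact_octal=True  : an octal escape must have exactly 3 digits.
    exact_octal=False : Java-style, 1 to 3 digits (3 only if first is 0-3).
    Surrogate pairs written as two \\uNNNN escapes are not combined here.
    """
    out = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("dangling backslash at end of string")
        esc = body[i + 1]
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 2
        elif esc == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or any(c not in _HEX for c in digits):
                raise ValueError("\\u must be followed by 4 hex digits")
            out.append(chr(int(digits, 16)))
            i += 6
        elif esc in _OCTAL:
            if exact_octal:
                max_len = 3
            else:
                max_len = 3 if esc in "0123" else 2
            j = i + 1
            while j < n and j - (i + 1) < max_len and body[j] in _OCTAL:
                j += 1
            digits = body[i + 1:j]
            if exact_octal and len(digits) != 3:
                raise ValueError("octal escape needs exactly 3 digits")
            value = int(digits, 8)
            if value > 0o377:
                raise ValueError("octal escape out of range (max \\377)")
            out.append(chr(value))
            i = j
        else:
            raise ValueError("unsupported escape: \\" + esc)
    return "".join(out)
```

With this sketch, `\101` decodes to `A` under both rules, while `\7` is accepted only when `exact_octal=False`. That is the kind of input on which an "as in Java" reading of the spec and a reader that insists on three digits would disagree.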