bpsm / edn-java

a reader for extensible data notation
Eclipse Public License 1.0

Octal escapes in string and character literals? #65

Open avodonosov opened 4 years ago

avodonosov commented 4 years ago

#59 and #60 implement Unicode escapes in character and string literals.

How about octal escapes?

I discovered that both the Clojure language reader and the EDN reader from the official Clojure GitHub project, https://github.com/clojure/tools.reader, support this.

(Octal escapes in string literals come from Java. The difference is that Java's syntax is a backslash followed by up to three digits, while Clojure and tools.reader require exactly three.)
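For comparison, a small check of plain Java's behavior: Java accepts one, two or three octal digits after the backslash (up to \377), so the three-digit Clojure form is valid Java, but so are the shorter forms. The class name is arbitrary; nothing here is edn-java API.

```java
public class JavaOctalEscapes {
    public static void main(String[] args) {
        // \062 is octal 62 = decimal 50 = the character '2'.
        String threeDigits = "aaa\062aaa";
        // Java also accepts the two-digit form; lexing stops at the first non-octal char.
        String twoDigits = "aaa\62aaa";
        System.out.println(threeDigits);                    // prints aaa2aaa
        System.out.println(threeDigits.equals(twoDigits));  // prints true
        // A single digit is legal too: \0 is the NUL character.
        System.out.println((int) "\0".charAt(0));           // prints 0
    }
}
```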

In string literals the syntax is a backslash followed by three digits: \NNN. The first digit must be between 0 and 3, and the other two between 0 and 7.
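A minimal sketch of how a reader might expand these escapes, assuming the exactly-three-digit rule described here. The class and method names are hypothetical (not part of edn-java), the input is the string body between the quotes, and to keep it short only \\ and \" are handled besides octal escapes.

```java
/**
 * Hypothetical helper, not edn-java API. Expands octal escapes of the form
 * backslash + exactly three digits (first 0-3, others 0-7) in a string body.
 * Besides octal, only the backslash and double-quote escapes are handled.
 */
public final class StringOctalEscapes {

    public static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c != '\\') {          // ordinary character: copy through
                out.append(c);
                i++;
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("Dangling backslash");
            }
            char next = body.charAt(i + 1);
            if (next == '\\' || next == '"') {
                out.append(next);
                i += 2;
            } else if (next >= '0' && next <= '9') {
                out.append(readThreeOctalDigits(body, i + 1));
                i += 4;               // backslash + three digits
            } else {
                throw new IllegalArgumentException("Escape not handled in this sketch: \\" + next);
            }
        }
        return out.toString();
    }

    // Reads exactly three octal digits at 'start'. The first must be 0-3,
    // so the result never exceeds octal 377 (decimal 255).
    public static char readThreeOctalDigits(String s, int start) {
        if (start + 3 > s.length()) {
            throw new IllegalArgumentException("Octal escape needs exactly 3 digits");
        }
        int value = 0;
        for (int k = 0; k < 3; k++) {
            char d = s.charAt(start + k);
            char max = (k == 0) ? '3' : '7';
            if (d < '0' || d > max) {
                throw new IllegalArgumentException("Bad octal digit: " + d);
            }
            value = value * 8 + (d - '0');
        }
        return (char) value;
    }

    public static void main(String[] args) {
        // The Java literal "aaa\\062aaa" is the EDN string body aaa\062aaa.
        System.out.println(unescape("aaa\\062aaa")); // prints aaa2aaa
    }
}
```

Whether a reader should also accept the shorter Java-style forms (\62, \0) is a design choice; this sketch rejects them.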

For character literals the syntax is \oNNN, with the same digit ranges: the first digit is between 0 and 3, and the other two are between 0 and 7.
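A minimal sketch of parsing the \oNNN character form, again assuming exactly three digits as described. The class and method names are hypothetical; the argument is the token text after the leading backslash, e.g. "o062" for \o062.

```java
/**
 * Hypothetical helper, not edn-java API. Parses the token that follows the
 * backslash of an octal character literal, e.g. "o062" for \o062.
 */
public final class CharOctalLiteral {

    public static char parse(String token) {
        // Expect 'o' followed by exactly three digits.
        if (token.length() != 4 || token.charAt(0) != 'o') {
            throw new IllegalArgumentException("Expected \\oNNN, got \\" + token);
        }
        int value = 0;
        for (int k = 1; k <= 3; k++) {
            char d = token.charAt(k);
            char max = (k == 1) ? '3' : '7';   // first digit 0-3, others 0-7
            if (d < '0' || d > max) {
                throw new IllegalArgumentException("Bad octal digit in \\" + token);
            }
            value = value * 8 + (d - '0');
        }
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(parse("o062")); // prints 2
    }
}
```

A reader would only try this path when the token is longer than one character, since a bare \o is simply the character o.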

$ clj -r
Clojure 1.10.1
user=> "aaa\062aaa"
"aaa2aaa"
user=> \o062
\2
user=> (require '[clojure.tools.reader.edn :as edn])
nil
user=> (edn/read-string "\"aaa\\062aaa\"")
"aaa2aaa"
user=> (edn/read-string "\\o062")
\2

The proposal in https://github.com/edn-format/edn/pull/65 also includes octal escapes.

I don't personally need octal literals in my code; this is just FYI. It may be good to have some consistency between EDN implementations.

bpsm commented 4 years ago

I'm sceptical about adding octal literals unless there's an actual need. I consider them an ugly relic of a bygone era when some machines had word lengths that were multiples of 6 bits rather than 8. They are limited to Latin-1 (at most \377, i.e. 255): they can't even cover the whole 16-bit range of a Java char.
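To make the range argument concrete, a small illustration: with the first digit capped at 3, the largest octal escape is 0377 = 255 (U+00FF, the last Latin-1 character), while a Java char goes up to 65535. The class and method names are hypothetical.

```java
public final class OctalRange {
    // Largest value a three-digit octal escape (first digit 0-3) can express.
    public static final int MAX_OCTAL_ESCAPE = 0377; // Java octal literal: 255

    // True if the char could be written as \NNN or \oNNN.
    public static boolean octalExpressible(char c) {
        return c <= MAX_OCTAL_ESCAPE;
    }

    public static void main(String[] args) {
        System.out.println(MAX_OCTAL_ESCAPE);            // prints 255
        System.out.println((int) Character.MAX_VALUE);   // prints 65535
        System.out.println(octalExpressible('\u00FF'));  // prints true (y with diaeresis)
        System.out.println(octalExpressible('\u20AC'));  // prints false (euro sign)
    }
}
```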