jml closed this issue 10 years ago
Of course, another solution would be to interpret all strings as text and require custom extensions for literal bytes, e.g.
#bytes (72 101 108 108 111 32 119 111 114 108 100)
or
#base64 "SGVsbG8gd29ybGQ="
The `#base64` option is how I am planning to implement raw byte data in my obj-c implementation.
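For illustration, here is a minimal Python sketch of what handlers for those two tags might do once a reader has parsed the tagged value. The function names `read_bytes_tag` and `read_base64_tag` are hypothetical, as are the tag names themselves; neither is defined by the spec. It only shows that both literals above describe the same eleven bytes.

```python
import base64


def read_bytes_tag(values):
    """Hypothetical handler for `#bytes (72 101 ...)`: a list of ints in 0..255."""
    # bytes() raises ValueError if any element is outside 0..255.
    return bytes(values)


def read_base64_tag(text):
    """Hypothetical handler for `#base64 "..."`: a base64-encoded string."""
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(text, validate=True)


from_list = read_bytes_tag([72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100])
from_b64 = read_base64_tag("SGVsbG8gd29ybGQ=")

print(from_list)              # b'Hello world'
print(from_list == from_b64)  # True
```

The base64 form is more compact on the wire and stays a plain ASCII string, so it never runs into the UTF-8 validity question.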
I'm confused about your issue; are you suggesting you'd stash arbitrary bytes into a string (i.e. `"[unreadable gibberish]"`)? Wouldn't this violate the spec, in that it could (and likely would frequently) contain byte sequences that are not valid UTF-8? (Off the top of my head, `0xFF` would be an invalid UTF-8 byte.)
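A quick way to check the `0xFF` claim, sketched in Python (the sample bytes are made up for the demonstration): a strict UTF-8 decoder rejects the sequence as soon as it reaches that byte.

```python
# "Hi" followed by a byte that can never appear in well-formed UTF-8.
raw = bytes([0x48, 0x69, 0xFF])

try:
    raw.decode("utf-8")  # strict decoding is the default
    decoded_ok = True
    bad_offset = None
except UnicodeDecodeError as err:
    decoded_ok = False
    bad_offset = err.start  # index of the first offending byte

print(decoded_ok)  # False
print(bad_offset)  # 2
```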
It would indeed violate the spec. (TIL: Not all sequences of bytes are valid UTF-8).
Although I think the spec could perhaps be clearer about whether readers should decode strings to Unicode, I'm happy to consider this issue closed.
There will be a separate tag for bytes/base64.
@richhickey any guidance on what the tag will be called?
My vote: #base64 "YW55ICsgb2xkICYgZGF0YQ=="
edn streams & elements are all UTF-8 encoded, which is great. However, there's no guidance in the spec for whether a reader should decode strings into Unicode.
This matters, since sometimes you need to send a sequence of literal bytes (which should not be decoded), and other times you need to send human-readable text (which most definitely should).
One solution would be to say that the reader should never decode strings, and add a built-in tagged element `#text` that decodes the string.
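As a rough sketch of that proposal in Python: plain string literals would reach the application as undecoded bytes, and only a `#text` tag would opt in to decoding. The names `read_string` and `read_text_tag` are hypothetical and stand in for whatever hooks a real reader would expose.

```python
def read_string(raw: bytes) -> bytes:
    """Plain string literal: under this proposal the bytes are passed through untouched."""
    return raw


def read_text_tag(raw: bytes) -> str:
    """Hypothetical `#text` handler: explicitly decode the bytes as UTF-8."""
    return raw.decode("utf-8")


payload = "héllo".encode("utf-8")  # bytes as they would appear in the stream

plain = read_string(payload)
text = read_text_tag(payload)

print(type(plain).__name__, len(plain))  # bytes 6
print(type(text).__name__, len(text))    # str 5
```

The cost of this design is that every human-readable string needs the tag, which is the opposite trade-off from treating all strings as text and tagging the bytes.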