TokTok / hs-msgpack-types

Abstract data types and type classes for Haskell to MessagePack value converters
https://toktok.ltd/

`ObjectStr Text` is incompatible with pre-2.0 MessagePack spec #36

Open l29ah opened 4 years ago

l29ah commented 4 years ago

The pre-2.0 spec allows non-Unicode content in strings, while Text can't hold arbitrary bytes.

What would be a good way to augment msgpack-types to avoid forking the library or breaking code compatibility by s/Text/ByteString/? Python's msgpack library uses the `use_bin_type` and `raw` options to handle the old format, but I don't see how to do something similar in Haskell.

iphydf commented 4 years ago

By "pre-2.0", do you mean it is compatible with the post-2.0 spec?

Text accepts arbitrary Unicode code points, which includes all code points from 0 to 255. We could use that to encode arbitrary bytes. This is what we do in JSON::XS (and JSON::PP) for Perl.
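A minimal sketch of the byte-per-code-point idea above, assuming only the `text` and `bytestring` packages. The names `bytesToText` and `textToBytes` are hypothetical and not part of msgpack-types; the sketch shows the mapping itself, not how the library would wire it in.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1)

-- Hypothetical helper: map every byte 0..255 to the code point with the
-- same number (this is exactly what Latin-1 decoding does).
bytesToText :: BS.ByteString -> T.Text
bytesToText = decodeLatin1

-- Hypothetical inverse: only defined when every code point fits in a byte.
textToBytes :: T.Text -> Maybe BS.ByteString
textToBytes t
  | T.all (<= '\255') t = Just (BC.pack (T.unpack t))
  | otherwise           = Nothing

main :: IO ()
main = do
  -- A few of the bytes from the failing string reported in this issue.
  let raw = BS.pack [0x5d, 0x8c, 0x7f, 0xf6, 0xca]
  print (textToBytes (bytesToText raw) == Just raw)
```

The trade-off is that a string which really was UTF-8 also arrives one byte per character, so callers would have to decode it themselves.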

l29ah commented 4 years ago

It is compatible, except that 2.0 says ObjectStr is UTF-8, while earlier versions did not restrict its content.

Well, now I observe parsing failures with unmodified msgpack-binary and msgpack-types when reading arbitrary bytes in the strings.

expected:

ObjectMap [(ObjectStr "i",ObjectWord 2),(ObjectStr "r",ObjectStr "]zaxC\140\DELD\153\vK\NUL$\246\170W\DC3\203\172\147W\236HKo\249\205\DC1\169\156E\202")]

actual:

hyborg: ParseError {unconsumed = "\161i\STX\161r\218\NUL ]zaxC\140\DELD\153\vK\NUL$\246\170W\DC3\203\172\147W\236HKo\249\205\DC1\169\156E\202", offset = 1, content = "Data.Binary.Get(Alternative).empty"}

kirelagin commented 3 years ago

One option would be to use a //ROUNDTRIP encoding, as GHC does for things like filenames. The only problem is that I don't know if there is an easy way to use this kind of TextEncoding to decode to Text.
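A rough sketch of what the //ROUNDTRIP option looks like with `String`, using `mkTextEncoding` from `GHC.IO.Encoding` and the marshalling functions in `GHC.Foreign`. The helper name `roundTrip` is hypothetical. The last line probes the concern raised above: GHC escapes undecodable bytes as lone surrogate code points, which `Text` cannot store (`T.pack` substitutes U+FFFD), so this is an illustration of the obstacle rather than a solution.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (mkTextEncoding)

-- Hypothetical helper: decode bytes to a String with UTF-8//ROUNDTRIP,
-- then encode that String back to bytes with the same encoding.
roundTrip :: BS.ByteString -> IO (String, BS.ByteString)
roundTrip raw = do
  enc  <- mkTextEncoding "UTF-8//ROUNDTRIP"
  str  <- BS.useAsCStringLen raw (GHC.peekCStringLen enc)
  back <- GHC.withCStringLen enc str BS.packCStringLen
  pure (str, back)

main :: IO ()
main = do
  -- "a", two bytes that are not valid UTF-8, then "bcde".
  let raw = BS.pack [0x61, 0x8c, 0xff, 0x62, 0x63, 0x64, 0x65]
  (str, back) <- roundTrip raw
  -- Do the original bytes survive the trip through String?
  print (back == raw)
  -- Do the escape characters survive a trip through Text?
  print (T.unpack (T.pack str) == str)
```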

Another option would be to say that this library only supports MessagePack >= 2.0 and, honestly, I think this one makes the most sense.

epoberezkin commented 7 months ago

Possibly, the unused config could be extended to support safe utf8 decoding. A separate issue?

iphydf commented 7 months ago

Yes, it makes sense to use the config for that. I'm mostly in favour of supporting only 2.0 (and higher, if a higher version ever happens). What semantics would you suggest for safe UTF-8 decoding?
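To make the question concrete, here are two candidate semantics sketched with functions from the `text` package: reject invalid input with an error, or replace invalid bytes with U+FFFD. The `Utf8Policy` type and `decodeStr` function are hypothetical and not part of msgpack-types or its config; this is only one way the choice could be framed.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical policy type, introduced only for this sketch.
data Utf8Policy
  = Strict   -- fail on any byte sequence that is not valid UTF-8
  | Lenient  -- replace each invalid byte with U+FFFD

-- Hypothetical decoder for the payload of a MessagePack str value.
decodeStr :: Utf8Policy -> BS.ByteString -> Either String T.Text
decodeStr Strict  bs = either (Left . show) Right (decodeUtf8' bs)
decodeStr Lenient bs = Right (decodeUtf8With lenientDecode bs)

main :: IO ()
main = do
  let bad = BS.pack [0x61, 0xff]  -- "a" followed by an invalid byte
  print (decodeStr Strict bad)
  print (decodeStr Lenient bad)
```

Neither policy preserves the original bytes, so a pre-2.0 peer that sends raw binary in a str would still lose data under the lenient one.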

l29ah commented 7 months ago

> Possibly, the unused config could be extended to support safe utf8 decoding. A separate issue?

I don't think you can supply any configuration to a Get instance. Either moving to a non-UTF8-requiring type completely, or making distinct 1.0-compatible modules, will make more sense.

For now I went the former way: https://github.com/l29ah/hs-msgpack-types https://github.com/l29ah/hs-msgpack-binary

epoberezkin commented 7 months ago

> I'm mostly in favour of supporting only 2.0

+1