Closed: mroch closed this 7 years ago
I'm afraid this won't be possible with the current interface: Uutf is based on the Uchar.t
type from the standard library, whose values represent Unicode scalar values, and surrogates are not scalar values.
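To make the constraint concrete, here is a small sketch using only the standard library's Uchar module. It checks the lone surrogate from the original example (0xDC00) and, for contrast, U+20AC, which I picked arbitrarily as an ordinary scalar value.

```ocaml
let () =
  (* 0xDC00 is a low surrogate, so it is not a Unicode scalar value. *)
  assert (not (Uchar.is_valid 0xDC00));
  (* Uchar.of_int refuses such values by raising Invalid_argument. *)
  assert (match Uchar.of_int 0xDC00 with
          | _ -> false
          | exception Invalid_argument _ -> true);
  (* An ordinary scalar value round-trips without trouble. *)
  assert (Uchar.to_int (Uchar.of_int 0x20AC) = 0x20AC)
```

So any decoder that hands back Uchar.t values has no way to represent an unpaired surrogate.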
I think the best that could be done would be to have a very clear specification of which `` `Malformed `` values are returned on WTF-8 input, and maybe a decoding function ``wtf_8 : [ `Malformed of string ] -> int``, but I'm not sure that's worth the effort and it may be clearer to implement your own decoder.
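For illustration only, such a helper might look roughly like the sketch below. The name `surrogate_of_malformed` is hypothetical, and it returns an `int option` rather than the bare `int` suggested above so that unrelated malformed input can be reported. It also assumes the three bytes of an encoded surrogate arrive together in a single malformed value, which is exactly the part that would need to be specified first.

```ocaml
(* Hypothetical helper, not part of Uutf: recover a surrogate code point from
   a malformed byte sequence, assuming the whole 3-byte sequence is reported
   in one `Malformed value. *)
let surrogate_of_malformed (`Malformed bytes : [ `Malformed of string ])
    : int option =
  if String.length bytes <> 3 then None
  else
    let b0 = Char.code bytes.[0]
    and b1 = Char.code bytes.[1]
    and b2 = Char.code bytes.[2] in
    (* Surrogates U+D800..U+DFFF are encoded as ED A0..BF 80..BF. *)
    if b0 = 0xED && b1 land 0xE0 = 0xA0 && b2 land 0xC0 = 0x80 then
      Some (0xD000 lor ((b1 land 0x3F) lsl 6) lor (b2 land 0x3F))
    else None
```

With that assumption, the lone surrogate from the original example would come back as `Some 0xDC00` when given the bytes ED B0 80.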
That said, I'm not sure where you actually need that. I don't think JavaScript allows you to write files in WTF-8 (and if it does, it should not: WTF-8 is not supposed to be used in text files).
Ping ?
I ended up writing this: https://github.com/facebook/flow/tree/master/src/third-party/wtf8
I'm planning to split it out into an opam package instead of leaving it buried inside Flow; I just need to find time to add build files and whatnot.
Thanks!
I'm thinking about using uutf in Flow's JavaScript parser (https://github.com/facebook/flow). JS strings are UTF-16, but also allow unpaired surrogates (see the spec). For example,
var x = "\uDC00"
is a valid string. This stupid encoding has been dubbed "WTF-8": https://simonsapin.github.io/wtf-8/
Would you be amenable to a PR adding support to uutf? I imagine it would be identical to the existing UTF-8 code, except with the `malformed` checks removed. I could probably refactor to reuse most of the UTF-8 code.
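To show roughly what I mean, here is a standalone sketch of a single-code-point decoder. It is not based on uutf's actual code, and the name `decode_at` is made up for this example. It follows the usual UTF-8 rules (continuation bytes, no overlong forms, upper bound of U+10FFFF) but deliberately omits the surrogate check in the 3-byte case.

```ocaml
(* Hypothetical sketch: decode one code point starting at byte offset [i].
   Returns [Some (code_point, byte_length)], or [None] on malformed input. *)
let decode_at (s : string) (i : int) : (int * int) option =
  let len = String.length s in
  let byte k = Char.code s.[k] in
  (* Is offset [k] in range and does it hold a 10xxxxxx continuation byte? *)
  let cont k = k < len && byte k land 0xC0 = 0x80 in
  if i < 0 || i >= len then None
  else
    let b0 = byte i in
    if b0 < 0x80 then Some (b0, 1)
    else if b0 land 0xE0 = 0xC0 && cont (i + 1) then
      let cp = ((b0 land 0x1F) lsl 6) lor (byte (i + 1) land 0x3F) in
      (* Reject overlong 2-byte forms. *)
      if cp >= 0x80 then Some (cp, 2) else None
    else if b0 land 0xF0 = 0xE0 && cont (i + 1) && cont (i + 2) then
      let cp =
        ((b0 land 0x0F) lsl 12)
        lor ((byte (i + 1) land 0x3F) lsl 6)
        lor (byte (i + 2) land 0x3F)
      in
      (* Strict UTF-8 would also reject 0xD800..0xDFFF here; this sketch
         lets surrogates through. *)
      if cp >= 0x800 then Some (cp, 3) else None
    else if b0 land 0xF8 = 0xF0 && cont (i + 1) && cont (i + 2) && cont (i + 3)
    then
      let cp =
        ((b0 land 0x07) lsl 18)
        lor ((byte (i + 1) land 0x3F) lsl 12)
        lor ((byte (i + 2) land 0x3F) lsl 6)
        lor (byte (i + 3) land 0x3F)
      in
      if cp >= 0x10000 && cp <= 0x10FFFF then Some (cp, 4) else None
    else None
```

Note that the result is a plain `int`, since a surrogate cannot be a Uchar.t. The sketch also does not enforce the WTF-8 rule that a lead surrogate immediately followed by a trail surrogate must be encoded as a single 4-byte sequence, so it accepts slightly more than well-formed WTF-8.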