Closed raphael-proust closed 8 years ago
Thanks. That looks clean. One question though: do you really need the `Substring source? I'd prefer to hold that until I have considered #2. (If the answer is no, don't bother, I'll split the patch myself.)
I have a workaround for now:
let decoder c =
  let d = Uutf.decoder ~encoding:`UTF_8 `Manual in
  Uutf.Manual.src d c.content c.pos c.len;
  d
And then I treat Await as End. It's not as clean as Substring, but it is acceptable.
Alternatively, String could carry two optional integer values.
Alternatively, there could be a way to signal a Manual decoder that there won't be anymore calls to Manual.src (and thus to return End instead of Await). Something like Manual.seal or Manual.terminate.
On Thursday, 5 February 2015 at 09:03, Raphaël Proust wrote:
And then I treat Await as End.
This is wrong: you have to terminate the manual source properly, as documented (there could be a truncated character at the end and you would miss a `Malformed).
It's not as clean as Substring, but it is acceptable.
Yes, I forgot about that – actually I'm pretty sure I didn't include offsets in `String because I thought you could use that at the time. Seems good enough to me; I'd like to avoid complicating the API too much. I don't mind more work for clients that are not in the average use case.
Alternatively, there could be a way to signal a Manual decoder that there won't be anymore calls to Manual.src (and thus to return End instead of Await). Something like Manual.seal or Manual.terminate.
More complex w.r.t. API, documentation and implementation.
Daniel
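The proper termination mentioned above can be sketched as follows. This is only an illustration, not code from the patch: it assumes a Uutf version (1.0 or later) in which Manual.src takes a Bytes.t, and it relies on the documented convention that calling Manual.src with a length of 0 signals the end of input. The function name count_substring is hypothetical.

```ocaml
(* Sketch only; requires the uutf library (assumed >= 1.0, Bytes-based API).
   [count_substring] is a hypothetical helper, not part of Uutf. *)

(* Decode the [len] bytes of [s] starting at [pos] as UTF-8 and return
   (number of decoded characters, number of malformed sequences). *)
let count_substring (s : string) (pos : int) (len : int) : int * int =
  let d = Uutf.decoder ~encoding:`UTF_8 `Manual in
  (* Provide the only chunk of input. *)
  Uutf.Manual.src d (Bytes.of_string s) pos len;
  let rec loop chars bad =
    match Uutf.decode d with
    | `Uchar _ -> loop (chars + 1) bad
    | `Malformed _ -> loop chars (bad + 1)
    | `Await ->
        (* No more data: signal end of input with a zero-length source
           instead of treating `Await as `End. A truncated trailing
           character is then reported as `Malformed before `End. *)
        Uutf.Manual.src d Bytes.empty 0 0;
        loop chars bad
    | `End -> (chars, bad)
  in
  loop 0 0
```

With this shape, a substring that cuts a multi-byte character in half yields one malformed sequence, whereas stopping at the first `Await would silently drop it.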
Manual will work for me.
Although…
And then I treat Await as End.
This is wrong: you have to terminate the manual source properly, as documented (there could be a truncated character at the end and you would miss a `Malformed).
It looks (from the source, the documentation is not quite clear on that point) that calling Manual.src replaces the current source instead of adding on top of it. (After checking some bound properties, it executes d.i <- s, which replaces the input string in the decoder record.) How does calling Manual.src let you see the Malformed characters, then?
It looks (from the source, the documentation is not quite clear on that point) that calling Manual.src replaces the current source instead of adding on top of it.
I don't think the documentation should say anything about this. It just tells you it will read from the string you provide. How it does this is none of your business.
How does calling Manual.src let you see the Malformed characters, then?
There's a temporary buffer that gets filled in if the byte sequence of a character overlaps two (or more) `Manual`ly provided buffers; see this comment. If there's not enough data to decode a character, the continuation fills this buffer in until it can decode the character.
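To make the idea of that temporary buffer concrete, here is a toy model written against the OCaml standard library only (String.get_utf_8_uchar, available since OCaml 4.14). It is not Uutf's implementation: the names toy, feed and finish are hypothetical, and it concatenates strings where a real decoder would work with a small fixed buffer and a continuation.

```ocaml
(* Toy illustration, NOT Uutf's code. All names here are hypothetical. *)

type item = Scalar of Uchar.t | Bad of string

(* [pending] holds the bytes of a character cut off at the end of a chunk. *)
type toy = { pending : Buffer.t }

let create () = { pending = Buffer.create 4 }

(* Sequence length announced by a lead byte (1 for anything else). *)
let expected_len c =
  let b = Char.code c in
  if b land 0x80 = 0 then 1
  else if b land 0xE0 = 0xC0 then 2
  else if b land 0xF0 = 0xE0 then 3
  else if b land 0xF8 = 0xF0 then 4
  else 1

(* Decode [data]; unless [final], stash an incomplete trailing sequence. *)
let scan t ~final data =
  let n = String.length data in
  let rec go i acc =
    if i >= n then List.rev acc
    else if (not final) && i + expected_len data.[i] > n then begin
      Buffer.add_substring t.pending data i (n - i);
      List.rev acc
    end else
      let dec = String.get_utf_8_uchar data i in
      let len = Uchar.utf_decode_length dec in
      let item =
        if Uchar.utf_decode_is_valid dec
        then Scalar (Uchar.utf_decode_uchar dec)
        else Bad (String.sub data i len)
      in
      go (i + len) (item :: acc)
  in
  go 0 []

(* Provide one more chunk: leftover bytes are decoded together with it. *)
let feed t chunk =
  let data = Buffer.contents t.pending ^ chunk in
  Buffer.clear t.pending;
  scan t ~final:false data

(* Signal end of input: leftover bytes can no longer be completed. *)
let finish t =
  let data = Buffer.contents t.pending in
  Buffer.clear t.pending;
  scan t ~final:true data
```

In this model, feeding "a\xc3" and then "\xa9b" decodes U+00E9 across the chunk boundary, while feeding "a\xc3" and then calling finish reports the lone byte as malformed. That is the case which is lost when waiting for more input is treated as the end of input.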
Ok.
I'll remove the Substring source in the PR and use Manual in my code.
Thanks, your patch is in as 1e7da8d796170284808752b
I nuked the previous PR because I don't know how to use git properly. So here it is again, in one single, clean, well-indented patch.
Closes #4.