Closed aidanhs closed 7 years ago
I feel like we should be pushing people towards the methods that correspond to the C API
Definitely agreed! Looks like you're already taking care of all that though :)
There are two quotes of interest for the write method:
Heh quite a lot of interesting discussion here! It's true yeah that writing a 100% correct Write
implementaiton is actually pretty tricky. I tink you've hit most of the high points, though, and PRs to shore up the docs here in libstd would be most welcome!
Ok with #10 merged want to rebase this and I'll help take a look at the apis?
Rebased and pushed. More thoughts about the current API (in addition to the original comment):
compress
/decompress
expose an non-blocking interface you can use if you need.compress
/decompress
being so hard to use correctly, the whole "ref to a ref" thing feels quite unnatural in Rust.process
, flush
and finish
as methos on Compress
if there seems to be a need, but the Read
/Write
traits feel good enough for now.compress
/decompress
into a lowlevel
module and CompressParams
/CompressMode
up to the top level of the library (so stream
vanishes entirely). I know that when I found this crate, I visited the stream
module first and got confused at how manual it all seemed to be to use.Ok, addressed review comments, added a commit to run the example in CI and I've also moved the 'low level' stuff into a raw
module since I reckon the word stream
is too tantalising for people looking at the library.
Looks great to me, thanks so much again @aidanhs!
Out of curiosity, would you like to be made a collaborator for this repo?
No worries, I initially started on this yak shave because I was getting different compressed outputs via different methods of compressing and wondered if upgrading would help. Turns out that brotli handles an empty input specially when compressing a buffer rather than stream (https://github.com/google/brotli/blob/2c001010aa49b06de83bf28ec43be4fed8be3fbd/c/enc/encode.c#L1384-L1389) -_-
Anyway, sure! Seems pretty low activity though so it may well just be me making PRs that you review still.
Heh ok, no worries :)
Followup to #10 once it gets merged. This is a WIP.
The new design of brotli seems a little odd, but I've tried to represent it as reasonably as possible. Some notes/thoughts so far:
decompress_vec
, since it looked like an internal method that got exposed in case others found it useful. Now it's not used internally, I feel like we should be pushing people towards the methods that correspond to the C API.After struggling for some time with it, I'd like to talk about the
Write
trait. As it stands, it's seems extremely difficult to implement it as described in the documentation. There are two quotes of interest for thewrite
method:There are two possible interpretations of 1:
a. 'You can only call
write
once on each wrapped object' b. 'If a call to write fails, you're not allowed to retry'The implications of these are as follows
1a. Not allowed call
write
multiple times, sowrite_all
is clearly forbidden. As we need to ship data to a writer which may be slower than data is coming in, you either need an unbounded buffer or the ability to useOk(0)
to indicate you're flushing some data. 1b. Not allowed to usewrite_all
as it masksInterrupted
. But you can replace it with your ownwrite_all
that bubbles up any error, so this isn't too problematic.Now let's completely ignore 1, and just consider 2. The implication here is that writes are 'retry-safe', i.e. if you get an error you can try again with the buffer you just tried with and if it succeeds then all is well. This forbids
write_all
as you can't keep track of where you were up to if an error occurs and you may have already written some of the buffer, making your code retry-unsafe.So,
write_all
is basically forbidden by anyone implementingWrite
who wants to follow these rules. It turns out that the current implementation in this library useswrite_all
, but tries to be retry-safe - I am reasonably confident I can come up with a test that demonstrates this.Looking up the original IO RFC, the intention seems to have been that both 1a and 1b are the desired interpretations. I contend that adhering to these three rules is not possible in a moderately complex writer. Let's take a look a zlib. It mentions the issue I raise about
Ok(0)
, and of the three options for dealing with 1a (returningOk(0)
, using an unbounded buffer, ignoring the requirement) has opted for ignoring the requirement, happily attempting to write to a contained object more than once.It's also worth talking about the interaction between 1b and 2 here. A naive implementer following these two rules might take input, process it, and attempt to write all the bytes out in a loop, keeping track of progress. If one of the writes fails, the implementer must return the number of bytes written and so the error needs to be stashed away until the next attempted operation when it can be bubbled up. This is horrible, but can be alleviated by the One True Way displayed in zlib - an inversion of the naive logic I describe above, with fallible IO coming first, followed by consumption of the input buffer that will not fail on IO, effectively queuing the IO for the next call.
I'm just noting down my experiences (this is a terser version of a comment I lost) as I plan to:
Write
trait to make 1a 'best effort'Write
trait page