Closed GoogleCodeExporter closed 9 years ago
Hi,
The Snappy API doesn't really force you to do any such thing. You can write
your own sink which decompresses into an iovec if you so desire; although only
UncheckedByteArraySink is included in the distribution, you can use whatever
sink you want.
Also, see https://github.com/andikleen/snappy-c/blob/master/snappy.h -- Andi
Kleen's C port supports compression and decompression to/from iovec.
Original comment by se...@google.com
on 13 Dec 2012 at 10:40
The API in snappy.h only allows you to use a sink with Compress(). It does not
allow you to use a sink with Uncompress().
So please help me understand how I can write a sink that works with Uncompress.
Original comment by mohit.a...@gmail.com
on 13 Dec 2012 at 4:31
Sorry, I was using the wrong word; you don't want a Sink here, you want a
Writer. I also see that due to the templated nature, you cannot make new
Writers without modifying snappy.cc.
Still, the point stands, in that you can use Andi Kleen's C port if you want
this.
Note that decompressing to an iovec is bound to be less efficient than to an
array, since resolving backreferences across buffers is more tricky. If you
want just a 4 kB chunk in the middle, I doubt you can get much more efficient
than simply decompressing to an array and memcpy-ing out those four kilobytes.
The primary use case for such a thing would probably not be efficiency, but
ease of allocation in a kernel context.
Original comment by se...@google.com
on 14 Dec 2012 at 8:55
I don't wish to copy out just 4K. Say I'm compressing data in 256KB units. And
I may be interested in only 64K somewhere in the middle after decompression. It
is also possible that I may be interested in all 256KB up front, but then divvy
up my buffer into different request units that are processed separately. All
these require that after decompression, I be able to put them in smaller units
so that if any of these units is not referred to anymore, they can be deleted
without hoarding up memory. BTW I used to be a Googler working for the GFS team
for 5 yrs - the use of DataBuffer object in Google would be an example for this.
Looking at snappy.cc, I understand why Sink is not provided in the uncompress
API - we're worried about the virtual function ovhd there. The alternative
would be to support an additional API on snappy.h - something like:
bool RawUncompress(Source *compressed, const struct iovec *iov, int iovcnt);
Then one can implement a SnappyIOvecWriter class inside snappy.cc (analogous to
the SnappyArrayWriter class already there). That'll eliminate the ovhd due to
virtual functions and yet provide a 'scatter' interface.
If you like, I can contribute code on this for consideration for inclusion in
future snappy releases.
I'll look at Andi Kleen's C port, but a brief look indicates that a complete C
port seems unnecessary for this. Also, I'd rather pick up a 'blessed'
implementation as we're using snappy in a production setting.
Original comment by mohit.a...@gmail.com
on 14 Dec 2012 at 5:04
Seems like my answer didn't hit the bug; pasting the relevant part:
If you want to include an IOVecWriter, I'll be happy to consider it
for consideration. It sounds like a reasonable way to solve this.
Original comment by se...@google.com
on 20 Dec 2012 at 1:34
[deleted comment]
Thanks. I've written the code - will test it and let it bake for sometime to
ensure its bug free before I submit it (within a few weeks).
Original comment by mohit.a...@gmail.com
on 21 Dec 2012 at 8:29
Hi,
Did you ever get anywhere here?
Original comment by se...@google.com
on 20 Feb 2013 at 3:24
Sorry - I quit my last company and so didn't have access to the code I'd
written earlier. Let me write it again and then send it to you.
Original comment by mohit.a...@gmail.com
on 20 Feb 2013 at 4:18
I've modified 3 files in the latest 1.1.0 release. These files are attached -
these are snappy.h, snappy.cc and snappy_unittest.cc.
Original comment by mohit.a...@gmail.com
on 21 Feb 2013 at 8:39
Attachments:
Hi,
Please sign the Contributor License Agreement
(http://code.google.com/legal/individual-cla-v1.0.html); it's a prerequisite
for accepting non-trivial patches. It's a really simple procedure, though.
We'll take it from there.
Original comment by se...@google.com
on 22 Feb 2013 at 12:53
I've signed and submitted the contributor license agreement.
Original comment by mohit.a...@gmail.com
on 24 Feb 2013 at 1:28
Thanks.
I'm looking through the code and making some cleanups, and I need to figure out
a few things: First, what is the relationship between expected_written_ and the
space in the iovecs? I presume that it's okay to have space in the iovec that
goes beyond what's set by expected_written_? However, if there is room in the
iovec, are we allowed to write junk data past the end of expected_written_? (If
so, the space_left >= 16 test in TryFastAppend is probably unneeded.)
Do you intend to add benchmarks?
Original comment by se...@google.com
on 25 Feb 2013 at 12:56
Another question: Are there situations where iov_len can be 0? If so, Append()
is broken. (If not, I'll add an assert.)
Original comment by se...@google.com
on 25 Feb 2013 at 1:42
Ah, no, Append() is not broken. I'd still like to know if iov_len can be 0,
though.
Original comment by se...@google.com
on 25 Feb 2013 at 1:43
> I'm looking through the code and making some cleanups, and I need to figure
out
> a few things: First, what is the relationship between expected_written_ and
the
> space in the iovecs? I presume that it's okay to have space in the iovec that
goes
> beyond what's set by expected_written_?
Yes, it is ok to have expected_written_ to be smaller than the cumulative space
in the iovecs. the base_iov_cnt_ is essentially there for assertion purposes -
otherwise, the code is modeled after SnappyArrayWriter.
> However, if there is room in the iovec, are we allowed to write junk data
past the
> end of expected_written_? (If so, the space_left >= 16 test in TryFastAppend
is
> probably unneeded.)
If we leave out 'space_left >= 16', then we'll return true when we should be
returning false. That is, while it is ok to write the data without worrying
about
memory corruption, we should really be returning false since we're writing more
than
what expected_written_ allows.
> Ah, no, Append() is not broken. I'd still like to know if iov_len can be 0,
though.
Yes, it should be ok to have some iovecs in there for which the iov_len is 0.
> Do you intend to add benchmarks?
I'd need to understand the benchmarking framework in snappy_unittest.cc first -
was being a bit lazy here. If it is simple to add benchmarks, I'd appreciate if
you
can add one - or if you insist, I can learn this stuff and provide an update
with
a benchmark.
BTW, I noticed that in snappy_unittest.cc, line 513 should be 'delete[] buf;'
and not
'delete buf'.
Original comment by mohit.a...@gmail.com
on 25 Feb 2013 at 2:17
OK, so expected_written_ is really a hard limit, not just an expectation? If
so, it should probably be renamed; I'm some of these cleanups right now.
(heapchecker already found the delete[] vs. delete, so I already fixed that.)
In general, I find the code in AppendFromSelf pretty complex and scary, yet the
unit test is very thin. I don't think this really belongs in VerifyString();
perhaps there should be a separate test that's deliberately testing some of the
edge cases we can get here? In particular, the source and destination iovecs
can go from non-overlapping to overlapping to non-overlapping again, and
although I _think_ your logic is right, I'd much rather have it under test than
just looking hard at it.
Original comment by se...@google.com
on 25 Feb 2013 at 2:37
Hi,
What was the conclusion here? I don't think this code can go in unless someone
provides more exhaustive unit tests.
Original comment by se...@google.com
on 21 Mar 2013 at 1:35
I seem to have mis-understood your comment from Feb 25th. It seemed like you
were fixing the code, and it made me think you're also enhancing the unittest.
I can work on the unittest as per your comments. However, I'm out of the
country for the next 2 weeks - so I can get to it by mid April or so.
Original comment by mohit.a...@gmail.com
on 21 Mar 2013 at 7:53
I've updated the unittest. Please take a look. I've also updated the interface
definitions in snappy.h to indicate that the individual buffers in an iovec
shouldn't overlap. I have not made any change to snappy.cc - so I'm not
attaching that in this post.
Original comment by mohit.a...@gmail.com
on 6 May 2013 at 4:21
Attachments:
Hi,
Sorry for letting this slip for so long. I'll try to squeeze in some time for
looking at this code later today; there is already consensus internally that
this would be a good thing to have, although we'd need to figure out something
for Windows.
Original comment by se...@google.com
on 12 Jun 2013 at 3:42
Hi again,
First of all, please try to send patches as patches, not full files. It's very
hard to keep track of things that way.
Second, my comment about the unit test was not about the iov pointers
overlapping, it was about your logic for from_iov_index != curr_iov_index_.
This is, as far as I can see, still largely untested (especially the
transitions between the different states), and it makes me worried. I see you
have added a benchmark, and that is fine, but apart from that I do not find the
tests to be substantially more covering than the ones from the previous patch.
(Also, I need to figure out what to do about Windows compatibility.)
I might find some time to write better tests for the corner cases at some
point, but things are a bit tight at the moment.
Original comment by sgunder...@bigfoot.com
on 12 Jun 2013 at 9:12
(This time from the right account as well. Multilogin keeps being hard...)
OK, I've written unit tests that I'm sort-of happy with. I need to test things
on Windows, and also of course send for review. But you're off the hook for
now. :-)
Original comment by se...@google.com
on 13 Jun 2013 at 12:05
Original comment by se...@google.com
on 13 Jun 2013 at 10:51
Fixed in r76. Thanks for your patience.
Original comment by se...@google.com
on 13 Jun 2013 at 4:20
Thanks for taking care of the remaining loose ends. I'm on vacation right now,
and so only sporadically checking emails.
Original comment by a...@cohesity.com
on 13 Jun 2013 at 5:33
Just wondering when the next snappy release is coming out - hopefully with this
change included.
Original comment by a...@cohesity.com
on 23 Sep 2013 at 5:27
I haven't really considered a release; we've tended to release when there's
been serious bugfixes or when we had a usable performance improvement, and
neither has really been happening recently. But if having iovec support in a
release is important for you, I guess we could put out an 1.1.1 soon...
Original comment by se...@google.com
on 30 Sep 2013 at 12:52
That'd be great. Thanks.
Original comment by mohit.a...@gmail.com
on 30 Sep 2013 at 2:29
Original issue reported on code.google.com by
mohit.a...@gmail.com
on 13 Dec 2012 at 7:29