Closed emk closed 4 years ago
@emk I think that sounds reasonable! One concern I have here is figuring out which types do what. I had to carefully read the GzEncoder
and GzDecoder
docs a few times very carefully, and I'm still having trouble thinking about when I would use them. Perhaps you could say some words about your use case that might help me understand? We might also consider a naming overhaul (and releasing a 0.3
version) if the existing Reader
/Writer
names become confusing, although I do like their simplicity.
OK, here's a sample use case:
Read
. This might be a File
, or it might be some multi-gigabyte PostgreSQL query generating CSV data with COPY
. I don't want to write this to disk, I just want to pass it through.Read
instance, breaks the data into 5MB chunks as it reads, and then uploads the chunks to S3 in parallel using upload_object_part
and a worker pool. All queues are bounded and blocking, thanks to your chan
. (We're going to open source the worker pools and S3 client soon.)snap::read::SzipEncoder
, so that I compress the data as I read it before handing it off to the "chunker" for uploading to S3. This would be very easy and transparent.(In theory, we could do all this using Tokio, but company policy is that we're not using Tokio before async
and await
land.)
Basically, "compressing Read
" and "decompressing Write
" are rarer use cases. But when you have one API that gives you a Read
object, and another one that consumes a Read
object, and you want to compress in between, then it's really useful to have.
Most of the "big" Rust compression libraries offer all 4 versions of (compressing, decompressing) x (read, write). I have an OK to implement at least "compressing Read
" on work time. And I've implemented snappy framing in Rust before by wrapper the C version, but I've deprecated my crate in favor of yours. :-)
I'm agnostic about names. We could imitate flate2
for ecosystem-wide consistency, or we could just add the two missing versions using longer names, because they're used less often.
Good morning!
I'm starting work on a ReadCompressing
implementation this morning, which I hope to submit as a PR shortly. Obviously, you should feel free to pick a better name.
@emk Thanks so much for the great write up! Definitely fully on board with adding this stuff. I basically have two concerns:
And I've implemented snappy framing in Rust before by wrapper the C version, but I've deprecated my crate in favor of yours. :-)
:-) Are you perhaps hinting here that the Snappy frame format has to be implemented twice to support this use case? If so, I begrudgingly accept, but man, that would be a bummer.
I'm agnostic about names. We could imitate flate2 for ecosystem-wide consistency, or we could just add the two missing versions using longer names, because they're used less often.
This is really bothering me. I don't know what to do. I value consistency with other crates in the ecosystem very highly. On that alone I'm inclined to do what flate2 does, including emulate the module hierarchy. On the other hand, I also wonder whether we can avoid a module hierarchy at all and perhaps do a better job with naming. But I honestly don't have any good ideas.
@alexcrichton Do you have any thoughts here? I'll summarize briefly: @emk wants to add the equivalent of flate2::read::GzEncoder
to the snappy crate (that is, a Read
impl for uncompressed->compressed), and we're trying to figure out naming. Are you happy with the naming and module structure that flate2
uses? If not, what changes would you make?
I've personally been pretty happy with how flate2 turned out, although I wouldn't call it a silver bullet or without room for improvement by any means :). In any case though I don't have any better names off the top of my head I'd rather have named the types at this moment
@emk All right, so how do you feel about re-creating flate2
's module structure? I think that means we leave Decoder
/Encoder
as-is, then re-create read
/write
modules and have read::FrameEncoder
, read::FrameDecoder
and write::FrameEncoder
(and we could add write::FrameDecoder
later if a use case comes up).
@BurntSushi I don't think I'm hinting at anything; hinting would require more brain cells than I have right now. :-)
I once had a snappy crate with a framing implementation ("snappy framed", IIRC, on crates.io) that I deprecated a while back because it used the C FFI, and you had a pure Rust version. I don't remember whether mine implemented all four modes or not. It's a moot point; I'm not going to use mine, and I would rather contribute to your crate. Your crate is simply better, so mine is gone. :-)
(I generally find it annoying whenever there are 5 subtly different but incomplete versions of every library, something which is extremely common with some langauges, but rarer with Rust. And so I regularly deprecate my stuff and contribute to other people's crates when other people's crates have a fundamentally better design. Your crate is written in pure, performant Rust, something I never managed to accomplish with mine. Plus, it reduces the number of crates I need to actively maintain!)
I'm happy to go with the flate2
structure and the names you propose. Thank you for the spec!
@emk No no, you misunderstand. :) I wasn't referring to your other work, but whether adding this new implementation would require two distinct internal implementations of the frame format. I haven't touched this code in a long time, so it is perhaps a naive question!
OK, I've opened a working branch at https://github.com/BurntSushi/rust-snappy/pull/18, with the initial renaming of the existing Reader
and Writer
(including in the docs and the szip
) source code.
I've also set up a cargo
workspace that includes both snap
and szip
to simplify running tests, and attempted to make sure that CI also includes szip
.
@emk Excellent, thank you! Please ping me when you'd like me to review.
Thank you for a writing a pure Rust port of snappy! We use it heavily at Faraday for fast compression and decompression of huge files in many places.
We recently discovered a need for a "snappy framed" equivalent of the flate2::read::GzEncoder type. That is, a snappy encoder which implements
Read
, consuming uncompressed data and outputting compressed data.I'm volunteering to write this and send you a PR with tests and docs, assuming you're interested. :-) Thank you as always for all your great Rust tools.