BurntSushi / rust-snappy

Snappy compression implemented in Rust (including the Snappy frame format).
BSD 3-Clause "New" or "Revised" License

Concatenating multiple Encoder streams #44

Closed k3yavi closed 3 years ago

k3yavi commented 3 years ago

Hi @BurntSushi,

Thanks for designing and maintaining this great tool. I also wanted to add that I am a big fan of all your work in rust-lang!

Recently, @rob-p and I were considering a use case where data from multiple rust-snappy compressed streams is concatenated together into one. Basically, we are working in a multithreaded scenario where multiple producer threads push data to a single consumer thread, which writes it to a file. Currently we use rust-snappy to compress the data on the single-threaded consumer, but even though rust-snappy compression is fast, we are bounded by the single-threaded compression slowing down the whole pipeline. I am guessing you might have already thought about or discussed this, and if possible we'd like your thoughts on how we can move the compression back into the producer threads, so that the single-threaded consumer can just concatenate the compressed streams, avoiding the overhead of compression and instead just dumping the bytes into a file.

I hope the use case makes sense, and I'm looking forward to hearing back from you.

BurntSushi commented 3 years ago

I think you should be able to just do your compression in the producer threads and then concatenate the results in the consumer. The Snappy Frame format was specifically designed for this: https://github.com/google/snappy/blob/9c1be17938429574cdec8fbf820f2d9d5ea66c5c/framing_format.txt#L68-L79

But maybe you already know that and there is something I'm missing about your setup that is preventing you from doing this.
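For illustration, here is a minimal sketch (not from the original thread) of what frame-level concatenation looks like, assuming the `snap` crate's `snap::write::FrameEncoder` and `snap::read::FrameDecoder` API: each producer compresses its chunk into an in-memory buffer, and the byte-wise concatenation of those buffers decodes as one stream.

```rust
use std::io::{Read, Write};

use snap::read::FrameDecoder;
use snap::write::FrameEncoder;

// Compress one chunk into a complete, self-contained Snappy frame stream.
fn compress_chunk(data: &[u8]) -> Vec<u8> {
    let mut enc = FrameEncoder::new(Vec::new());
    enc.write_all(data).expect("write into in-memory buffer");
    // into_inner flushes any buffered frame data and returns the Vec<u8>.
    enc.into_inner()
        .unwrap_or_else(|_| panic!("failed to flush encoder"))
}

fn main() {
    let a = compress_chunk(b"chunk from producer thread A");
    let b = compress_chunk(b"chunk from producer thread B");

    // Byte-wise concatenation of the two independent streams...
    let mut combined = a;
    combined.extend_from_slice(&b);

    // ...decodes as a single stream containing both payloads in order.
    let mut decoded = Vec::new();
    FrameDecoder::new(&combined[..])
        .read_to_end(&mut decoded)
        .expect("decode concatenated stream");
    assert_eq!(
        decoded,
        b"chunk from producer thread Achunk from producer thread B".to_vec()
    );
}
```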

rob-p commented 3 years ago

Hi @BurntSushi,

First, I'll echo @k3yavi's words about your rust-lang work; I'm a big fan of many of your projects. Second, thank you for the incredibly fast response. To make things a bit more concrete, the core of where this comes up in our codebase is here. Currently, we prepare a chunk of bytes as we wish them to appear in the output, and then write them to the file via owriter, which requires us to obtain a lock on the underlying Mutex. We tested simply replacing the BufWriter<File> that currently backs owriter with a FrameEncoder<File>, which worked beautifully (the crate was very easy to use), but things slowed down a lot. Notably, the CPU usage dropped by a factor of ~2x. I presume this is because, while multiple threads are preparing their byte chunks, only one thread (the one currently holding the Mutex) is doing the compression.

So in this context, do you suggest that we could have a separate FrameEncoder per thread? I'm not sure what the best design would be in this case. If we have a separate FrameEncoder per thread, would we then e.g. write to an output buffer (in memory) and just copy that to file via a standard BufWriter<File> or some such? We'd appreciate any guidance you can offer. Thanks again!

BurntSushi commented 3 years ago

I presume this is because, while multiple threads are preparing their byte chunks, only one thread (the one currently holding the Mutex) is doing the compression.

That sounds exactly right to me.

So in this context, do you suggest that we could have a separate FrameEncoder per thread? I'm not sure what the best design would be in this case. If we have a separate FrameEncoder per thread, would we then e.g. write to an output buffer (in memory) and just copy that to file via a standard BufWriter<File> or some such? We'd appreciate any guidance you can offer. Thanks again!

Ah yeah, thanks for making this concrete. Yes, that is generally what I had in mind. This presumes you can afford the extra memory, but usually that's a trade off that comes from parallelization. (FWIW, ripgrep makes exactly this same trade off. In single threaded mode, it writes matches to stdout immediately as they are found. But when running multiple threads, it has to write results to an in-memory buffer. The buffer doesn't get written until it's passed back to the main thread, which acquires a mutex and prints it.)
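To sketch that design concretely (hypothetical code, not the posters' actual project; it assumes the `snap` crate's `FrameEncoder` plus std's mpsc channels): each producer compresses its chunk into its own in-memory `Vec<u8>`, and the consumer does no compression at all, only concatenating the already-compressed bytes into the output file.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::mpsc;
use std::thread;

use snap::write::FrameEncoder;

fn main() -> std::io::Result<()> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // Producer threads: each owns its own FrameEncoder and compresses
    // its chunk into an in-memory buffer.
    let mut handles = Vec::new();
    for id in 0..4 {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            let chunk = format!("records produced by thread {}\n", id);
            let mut enc = FrameEncoder::new(Vec::new());
            enc.write_all(chunk.as_bytes()).expect("compress chunk");
            let compressed = enc
                .into_inner()
                .unwrap_or_else(|_| panic!("failed to flush encoder"));
            tx.send(compressed).expect("send to consumer");
        }));
    }
    drop(tx); // so the receiver loop ends once all producers are done

    // Consumer: just concatenate the compressed frames into the file
    // ("output.sz" is a placeholder name).
    let mut out = BufWriter::new(File::create("output.sz")?);
    for compressed in rx {
        out.write_all(&compressed)?;
    }
    out.flush()?;

    for h in handles {
        h.join().expect("producer thread panicked");
    }
    Ok(())
}
```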

Technically there are other options; e.g., you could write files in each thread and then later concatenate those files. I think it's just a different set of trade-offs for your specific workload.
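A minimal sketch of that alternative (file paths are hypothetical): each thread writes its own frame-compressed part file, and the final step is a plain byte-wise concatenation, with no decompression or recompression involved.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

// Concatenate per-thread part files into one combined stream.
fn concatenate(parts: &[&str], dest: &str) -> io::Result<()> {
    let mut out = BufWriter::new(File::create(dest)?);
    for path in parts {
        // io::copy just moves bytes; the frames are left untouched.
        io::copy(&mut File::open(path)?, &mut out)?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    // Placeholder paths; in practice these are the per-thread outputs.
    concatenate(&["part-0.sz", "part-1.sz"], "combined.sz")
}
```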

k3yavi commented 3 years ago

Thanks @BurntSushi for the super fast response. We'll test out the ideas and will let you know if we face any issues.

BurntSushi commented 3 years ago

Great! Let me know how it goes. Happy to serve as a rubber duck. :-)