Open vn971 opened 5 years ago
This is a very interesting point!
I was wondering whether there was a generic way of transforming an io::Write into an io::Read. The opposite would be quite simple (read bytes from an io::Read and write them into an io::Write), but this looks trickier. Maybe that could be possible with async functions/generators? Or with a separate process - or simply a thread - that "writes" data to the main thread, which reads it (like with Unix pipes).
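To make the thread idea concrete, here is a minimal sketch using only the standard library. The names PipeWriter, PipeReader and write_to_read are made up for illustration and are not part of lzma-rs. The producer closure stands in for any function that writes into an io::Write; it runs on a worker thread and sends its chunks over a bounded channel, which the reader side drains.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Write half of the pipe (hypothetical): each `write` sends one chunk.
struct PipeWriter {
    tx: SyncSender<Vec<u8>>,
}

/// Read half of the pipe (hypothetical): serves bytes from the last chunk.
struct PipeReader {
    rx: Receiver<Vec<u8>>,
    chunk: Vec<u8>,
    pos: usize,
}

impl Write for PipeWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.tx
            .send(buf.to_vec())
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "reader dropped"))?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Read for PipeReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Refill from the channel once the current chunk is exhausted.
        while self.pos == self.chunk.len() {
            match self.rx.recv() {
                Ok(chunk) => {
                    self.chunk = chunk;
                    self.pos = 0;
                }
                // The writer was dropped: treat it as end of stream.
                Err(_) => return Ok(0),
            }
        }
        let n = out.len().min(self.chunk.len() - self.pos);
        out[..n].copy_from_slice(&self.chunk[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Runs `produce` on a worker thread and exposes what it writes as a reader.
fn write_to_read<F>(produce: F) -> PipeReader
where
    F: FnOnce(&mut PipeWriter) -> io::Result<()> + Send + 'static,
{
    // A small bound gives back-pressure, so the producer cannot run far ahead.
    let (tx, rx) = sync_channel(4);
    thread::spawn(move || {
        let mut writer = PipeWriter { tx };
        // Errors are ignored in this sketch; real code should surface them.
        let _ = produce(&mut writer);
    });
    PipeReader { rx, chunk: Vec::new(), pos: 0 }
}

fn demo() -> String {
    let mut reader = write_to_read(|w| {
        w.write_all(b"hello ")?;
        w.write_all(b"world")
    });
    let mut s = String::new();
    reader.read_to_string(&mut s).unwrap();
    s
}

fn main() {
    println!("{}", demo());
}
```

The cost of this approach is one extra thread and one copy per chunk, and an error raised by the producer is lost unless it is also forwarded through the channel.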
In the meantime, I think the easiest way to support streaming would be to extract the loop body of the process function (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L215) into a step function. Then, in the streaming case, use a temporary buffer as the io::Write for the current decoder; the read method of your io::Read would repeatedly call step and copy the bytes from the tmp buffer into the read buffer.
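A rough sketch of that shape follows. The StepDecoder trait and the StepReader adapter are hypothetical names, and the toy RleDecoder (which expands count/byte pairs) only stands in for the real LZMA loop body so that the example runs on its own.

```rust
use std::io::{self, Read};

/// Hypothetical interface: one iteration of a decoder's main loop.
trait StepDecoder {
    /// Decodes a little more input, appending the output to `out`.
    /// Returns Ok(false) once the stream is finished.
    fn step(&mut self, out: &mut Vec<u8>) -> io::Result<bool>;
}

/// Toy stand-in for the real decoder: expands (count, byte) pairs.
struct RleDecoder<R> {
    input: R,
}

impl<R: Read> StepDecoder for RleDecoder<R> {
    fn step(&mut self, out: &mut Vec<u8>) -> io::Result<bool> {
        let mut pair = [0u8; 2];
        match self.input.read_exact(&mut pair) {
            Ok(()) => {
                out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
                Ok(true)
            }
            // For simplicity, any short read (even a truncated pair) ends the stream.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
            Err(e) => Err(e),
        }
    }
}

/// Adapts any StepDecoder into an io::Read through a temporary buffer.
struct StepReader<D> {
    decoder: D,
    tmp: Vec<u8>,
    pos: usize,
    done: bool,
}

impl<D: StepDecoder> StepReader<D> {
    fn new(decoder: D) -> Self {
        StepReader { decoder, tmp: Vec::new(), pos: 0, done: false }
    }
}

impl<D: StepDecoder> Read for StepReader<D> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Call step until the temporary buffer holds unread bytes or the stream ends.
        while self.pos == self.tmp.len() && !self.done {
            self.tmp.clear();
            self.pos = 0;
            self.done = !self.decoder.step(&mut self.tmp)?;
        }
        // Copy as much as fits from the temporary buffer into the read buffer.
        let n = buf.len().min(self.tmp.len() - self.pos);
        buf[..n].copy_from_slice(&self.tmp[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

fn demo() -> String {
    let encoded: &[u8] = &[3, b'a', 1, b'b', 2, b'c'];
    let mut reader = StepReader::new(RleDecoder { input: encoded });
    let mut s = String::new();
    reader.read_to_string(&mut s).unwrap();
    s
}

fn main() {
    println!("{}", demo());
}
```

Because StepReader implements io::Read, it can be wrapped in std::io::BufReader or passed to anything else that accepts a reader.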
I probably won't have time to look at it more closely this week, but feel free to send a PR if you want to give it a try!
Thanks for the explanation!
Regarding the process function and the temporary buffer -- indeed, this is how I thought it could be done as well.
I'm not sure I'll have time in the coming days either, though. Maybe I'll come back to this later, if/when I get rid of the other libraries that bind to OS libraries and am otherwise on pure Rust.
Hi,
I am maintaining swf-parser, a library to parse SWF files. These files can be encoded with LZMA and I am using this library to decode them. To support streaming parsing of SWF files, support in LZMA is required first. A low level API similar to the one used by the inflate crate would be nice.
Using this API, you create a stream inflater that maintains the internal state of the parser (for LZMA it would correspond to dictionaries and temporary buffers). You can manually feed data to the decoder and read the result.
I've been working on an implementation for this ticket based off of the LzmaDec_TryDummy function in libhtp's port of the LZMA SDK. The main issue in incrementally executing the loop is that you may end up in a partially corrupted state if you are in the middle of a function and you fail to read the next byte because it isn't available yet.
Also, I used the std::io::Write trait instead of std::io::Read to create an interface like flate2::write::DeflateDecoder.
I'll publish this soon. It will most likely be dependent on #50.
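For readers following along, here is a simplified illustration of a Write-based decoder that tolerates input arriving in arbitrary pieces. It is not the TryDummy approach described above and not the code from the PR: it uses a toy (count, byte) format, and RleWriteDecoder is a made-up name. The point it shows is that holding back incomplete input until a whole unit is available keeps the decoder from stopping half-way through a step.

```rust
use std::io::{self, Write};

/// Hypothetical push-style decoder for a toy (count, byte) format.
struct RleWriteDecoder<W: Write> {
    inner: W,
    /// Input bytes received so far that do not yet form a complete pair.
    pending: Vec<u8>,
}

impl<W: Write> RleWriteDecoder<W> {
    fn new(inner: W) -> Self {
        RleWriteDecoder { inner, pending: Vec::new() }
    }

    /// Returns the inner writer; fails if the input ended in the middle of a pair.
    fn finish(mut self) -> io::Result<W> {
        if !self.pending.is_empty() {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated input"));
        }
        self.inner.flush()?;
        Ok(self.inner)
    }
}

impl<W: Write> Write for RleWriteDecoder<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        for &byte in buf {
            self.pending.push(byte);
            // Decode only once a whole unit is available, so a short write
            // never leaves the decoder half-way through a step.
            if self.pending.len() == 2 {
                let (count, value) = (self.pending[0] as usize, self.pending[1]);
                self.pending.clear();
                self.inner.write_all(&vec![value; count])?;
            }
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn demo() -> String {
    let mut dec = RleWriteDecoder::new(Vec::new());
    // Feed the input in awkward pieces that split pairs in the middle.
    dec.write_all(&[3, b'a', 1]).unwrap();
    dec.write_all(&[b'b', 2]).unwrap();
    dec.write_all(&[b'c']).unwrap();
    String::from_utf8(dec.finish().unwrap()).unwrap()
}

fn main() {
    println!("{}", demo());
}
```

A real LZMA decoder cannot know the length of the next unit this cheaply, which is presumably why a dry-run check in the style of LzmaDec_TryDummy is needed there.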
I'm now wondering whether integrating with async/await would be the way to go to implement this. Something like taking futures::io::AsyncRead as input and writing to a futures::io::AsyncWrite or a futures::stream::Stream of bytes as output.
I don't know what the performance overhead of that would be, but from a programming perspective the code should be similar to the current one (with some extra async keywords). The streaming mode would be gated by a feature flag.
@gendx I published a PR for this if you want to have a look. I haven't really thought of implementing it using futures but that's an interesting idea. It would add a couple extra dependencies for those who want to use a streaming API and possibly require a runtime. I was looking for a solution that uses an std::io::Write interface to have an API consistent with flate2::write::DeflateDecoder to implement a generic decoder.
It'd be useful if a Read interface were also provided (compare flate2 which has both read::DeflateDecoder and write::DeflateDecoder).
Reading line by line is very important, for example, flate2 can read .gz files line by line:
use std::io::BufRead; // brings `lines()` into scope

let f_in = std::fs::File::open("sample.txt.gz").unwrap();
let d = flate2::read::GzDecoder::new(f_in);
let buf_reader = std::io::BufReader::new(d);
for line in buf_reader.lines() {
    // each item is an io::Result<String>
    println!("{}", line.unwrap());
}
Currently, a blocking function is provided by the library that reads from io::BufRead and writes to io::Write. This forces the user of the library to read all contents into memory, or into a file.
Sometimes, however, it is only needed to traverse the data, but not have it all at once.
Such a thing could be achieved by having a function that, given io::Read, gives something that implements io::Read as well. This way, you can progressively read a compressed or decompressed stream, while the library internally reads the underlying stream. This is how the xz2 crate works, for example; see the function signature of xz2::read::XzDecoder::new. This also looks very flexible and intuitive: the decompressor starts to act like a "pipe" (in Unix terminology), rather than something that writes.
Support of it in lzma-rs would be very nice, I think. Personally, I'm raising the issue because I wanted to try this library in rua (https://github.com/vn971/rua). There I am using an intermediate layer of decompression for another function that accepts Read (https://github.com/vn971/rua/blob/master/src/tar_check.rs#L26); however, the underlying library xz2 is not pure Rust, but uses bindings.
Thoughts?