chyyran closed this 2 years ago
The current API for `decompress` is very simple and matches up well with the existing one-shot functions. One downside of this design is that the `LzBuffer`s are still one-shot and result in an allocation every time `decompress` is called. However, being able to reuse a `DecoderState` is already a huge saving, owing to the size of that struct.

It might be worth looking into allowing reuse of a backing buffer in an `LzBuffer`, but this requires further design work on the `LzBuffer` implementations that isn't really related to #72. The current API design here does not preclude a future API that would allow reuse of these buffers.
A note: `LzmaDecoder::reset`/`Lzma2Decoder::reset` do not reallocate, because the lclppb parameters are the same throughout the lifetime of the `LzmaDecoder`.
cc @gendx
I encountered a situation where simply resetting the decoder state was not enough: because the decoder is reused for each raw LZMA chunk, the unpacked size also needed to be set. Since this might not be the case for every usage, `reset` takes it as an `Option<Option<u64>>`, where `None` leaves the unpacked size unchanged and `Some(_)` changes the unpacked size in the decoder state. Having to keep track of the previous unpacked size for streams with headers seemed annoying, and a plain `Option<u64>` would be ambiguous between "keep the unpacked size the same" and "change the unpacked size to `None`" (for an end-of-payload marker).

I thought about exposing `set_unpacked_size` on `LzmaDecoder`, but it didn't really make sense from my perspective, since `reset` prepares the decoder for the next chunk of data. For some streams, forgetting to call `set_unpacked_size` may leave the decoder in an invalid state for that specific stream, so keeping the operation atomic was best. I don't see any situation where someone would need to change the unpacked size between raw chunks.
Hi @chyyran @gendx,

I wanted to read an XZ file into decompressed chunks of some constant size. Currently I'm using `xz_decompress`, but `xz_decompress` reads the entire file in one call. I want to call some "read" function multiple times to sequentially read the XZ into fixed-size buffers (https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/182).

Can I read an XZ file as a sequence of decompressed byte chunks (`Vec<u8>`) using these new API endpoints?

Thanks for creating this library! 😄
Pull Request Overview

This pull request fixes #72 by refactoring the decoder functions to expose a zero-cost raw decoder API. The existing public API has been refactored to use the raw decoder API internally. The following additions to the public API become available with the `raw_decoder` feature enabled, under the `decompress::raw` module:

- `LzmaDecoder`
  - `LzmaDecoder::new(params: LzmaParams, memlimit: Option<usize>) -> Result<Self>`
  - `LzmaDecoder::reset(&mut self, unpacked_size: Option<Option<u64>>)`
  - `LzmaDecoder::decompress<'a, W: io::Write, R: io::BufRead>(&mut self, input: &mut R, output: &'a mut W)`
- `Lzma2Decoder`
  - `Lzma2Decoder::new() -> Self`
  - `Lzma2Decoder::reset(&mut self)`
  - `Lzma2Decoder::decompress<'a, W: io::Write, R: io::BufRead>(&mut self, input: &mut R, output: &'a mut W)`
- `LzmaProperties`
- `LzmaParams`
  - `LzmaParams::new(properties: LzmaProperties, dict_size: u32, unpacked_size: Option<u64>) -> Self`
  - `LzmaParams::read_header<R>(input: &mut R, options: &Options) -> error::Result<LzmaParams>`
`raw_decoder` is enabled. I don't see any reason to restrict this to internal usage.

Additionally, annotations have been added to indicate the availability of these APIs on docs.rs.
Testing Strategy

The existing test suite covers the refactored code paths, since the public API now uses the `raw_decoder` APIs internally.