alexcrichton / xz2-rs

Bindings to liblzma in Rust (xz streams in Rust)
Apache License 2.0

Optionally allow trailing data in `bufread::XzDecoder` #86

Open bgilbert opened 2 years ago

bgilbert commented 2 years ago

Some xz streams have unrelated data afterward. In particular, Linux kernel initrd files are the concatenation of multiple cpio archives, each of which can be compressed with a different compressor. `read::XzDecoder` and `bufread::XzDecoder` return `InvalidData` in this case, which makes it difficult to detect the EOF, unwrap the underlying stream, and continue reading with a different decompressor. (`write::XzDecoder` returns `Ok(0)` after the end of the xz stream, which is less ambiguous.) Multi-decoder mode doesn't address this, since that only handles the case where the following data is also an xz stream.

liblzma properly returns `StreamEnd` here; we just need to detect it. However, the xz test suite contains some tests with trailing garbage, and the xz command-line tool is designed to fail on those unless `--single-stream` is specified. For compatibility, we probably can't allow trailing garbage by default, but we can provide an option. Add an `allow_trailing_data()` toggle to `bufread::XzDecoder`, and stop accepting bytes in `read()` if we reach `StreamEnd` with that toggle enabled.

Do not add a similar option to read::XzDecoder, since it's only useful if the underlying stream is synced to the end of the xz stream afterward, and read::XzDecoder can't ensure that.
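
To illustrate why the option only makes sense on the `BufRead`-based decoder, here is a minimal stdlib-only sketch. It does not use xz2 at all: `ToyDecoder` and `split_stream` are hypothetical names, and the "compressed stream" is just one length byte followed by that many payload bytes. The point it tries to show is that a decoder built on `BufRead` can call `consume()` for exactly the bytes it used, so the underlying reader stays positioned at the first trailing byte. A decoder built on plain `Read` has to pull input into its own private buffer and may already have swallowed part of what follows.

```rust
use std::io::{self, BufRead, Cursor, Read};

/// Toy stand-in for a decoder that tolerates trailing data (NOT xz2 API).
/// Assumed format: one length byte, then exactly that many payload bytes.
struct ToyDecoder<R: BufRead> {
    inner: R,
    /// Payload bytes still to be produced; `None` until the length byte is read.
    remaining: Option<usize>,
}

impl<R: BufRead> ToyDecoder<R> {
    fn new(inner: R) -> Self {
        ToyDecoder { inner, remaining: None }
    }

    /// Hand back the underlying reader, positioned right after the toy stream.
    fn into_inner(self) -> R {
        self.inner
    }
}

impl<R: BufRead> Read for ToyDecoder<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.remaining.is_none() {
            let buf = self.inner.fill_buf()?;
            if buf.is_empty() {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
            let len = buf[0] as usize;
            self.inner.consume(1);
            self.remaining = Some(len);
        }
        let left = self.remaining.unwrap_or(0);
        if left == 0 || out.is_empty() {
            // End of the toy stream: report EOF and take no more input,
            // leaving any trailing bytes untouched in `inner`.
            return Ok(0);
        }
        let buf = self.inner.fill_buf()?;
        if buf.is_empty() {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        let n = left.min(buf.len()).min(out.len());
        out[..n].copy_from_slice(&buf[..n]);
        // Consume only what was actually decoded, never the whole buffer.
        self.inner.consume(n);
        self.remaining = Some(left - n);
        Ok(n)
    }
}

/// Decode the toy stream, then read whatever follows it from the same reader.
fn split_stream(data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut decoder = ToyDecoder::new(Cursor::new(data));
    let mut payload = Vec::new();
    decoder.read_to_end(&mut payload)?;

    let mut rest = Vec::new();
    decoder.into_inner().read_to_end(&mut rest)?;
    Ok((payload, rest))
}
```

Calling `split_stream` on the bytes `[3, b'a', b'b', b'c']` followed by `b"next"` yields the payload `abc` and the trailing data `next`. In the initrd case, the second `read_to_end` would instead hand the unwrapped reader to whichever decompressor matches the next archive.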

Also add an additional test verifying that write::XzDecoder refuses to accept additional bytes after the xz stream reaches StreamEnd.

alexcrichton commented 2 years ago

Thanks for the PR! Unfortunately I don't really have the time to maintain this crate nowadays, though. If you're interested I could transfer ownership to you, however.

cgwalters commented 2 years ago

Hmm. In the coreos GitHub organization we do maintain some crates, so one option is to transfer it there.