alexcrichton / tar-rs

Tar file reading/writing for Rust
https://docs.rs/tar
Apache License 2.0
624 stars, 184 forks

Entry iterator stops after encountering an error #284

Open isaackd opened 2 years ago

isaackd commented 2 years ago

I recently ran into this issue while extracting a very large gzipped tar file with the ignore_zeros option set. About halfway through the archive, there is a file whose data section starts with blocks of null bytes.

When reading this file, the crate tries to parse the first non-null content as a header and fails. The failure sets the done flag in the entry iterator, which ends the read: https://github.com/alexcrichton/tar-rs/blob/c3e2cb848afea5954f485f593668e69e0106513e/src/archive.rs#L539-L549
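To make the behavior concrete, here is a minimal std-only sketch of an iterator that latches a done flag on the first error. It is a simplified stand-in, not the crate's actual code: `FusedOnError`, `fused_demo`, and the string entries are all hypothetical names made up for illustration.

```rust
/// Simplified stand-in for an entry iterator (hypothetical, not tar-rs code).
struct FusedOnError<I> {
    inner: I,
    done: bool,
}

impl<I, T, E> Iterator for FusedOnError<I>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = Result<T, E>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        match self.inner.next() {
            Some(Err(e)) => {
                // The first error latches `done`, so nothing after it is read.
                self.done = true;
                Some(Err(e))
            }
            Some(Ok(item)) => Some(Ok(item)),
            None => {
                self.done = true;
                None
            }
        }
    }
}

/// Three raw results go in; the entry after the error never comes out.
fn fused_demo() -> Vec<Result<&'static str, &'static str>> {
    let raw = vec![Ok("a.xml"), Err("bad header"), Ok("b.xml")];
    FusedOnError { inner: raw.into_iter(), done: false }.collect()
}
```

Calling `fused_demo()` yields the first entry and the error, and `b.xml` is dropped even though the underlying source still had it.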

I think it would be useful to allow the read to continue even when errors are encountered, either by default or behind an option. I've tested with GNU tar and the tarfile module in Python; both continue past the error by default and extract all the files. Removing the line that sets the done flag in the error case gives the desired result. With this change, all the current tests pass and the same files are extracted as with GNU tar, but I'm not sure whether it's incorrect in other cases.
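As a rough sketch of the "behind an option" variant, the iterator could latch done on an error only when the caller has not opted in. This is an assumption about how such an option might look: `continue_on_error`, `Entries`, and `names_extracted` are hypothetical names, and nothing here is an existing tar-rs API.

```rust
/// Hypothetical wrapper; `continue_on_error` is not an existing tar-rs option.
struct Entries<I> {
    inner: I,
    continue_on_error: bool,
    done: bool,
}

impl<I, T, E> Iterator for Entries<I>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = Result<T, E>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let item = self.inner.next();
        match &item {
            // Only latch `done` on an error when the caller has not opted in.
            Some(Err(_)) if !self.continue_on_error => self.done = true,
            // The underlying source is exhausted.
            None => self.done = true,
            _ => {}
        }
        item
    }
}

/// Returns the names that were read successfully under each setting.
fn names_extracted(continue_on_error: bool) -> Vec<&'static str> {
    let raw: Vec<Result<&'static str, &'static str>> =
        vec![Ok("a.xml"), Err("bad header"), Ok("b.xml")];
    Entries { inner: raw.into_iter(), continue_on_error, done: false }
        .filter_map(Result::ok)
        .collect()
}
```

With the flag off, only `a.xml` is returned. With it on, the error is still reported to the caller but `b.xml` is reached as well.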

Here's a minimal archive to test with: nullfile.tar.gz. The ignore_zeros flag must be set to extract everything, and the file with null contents is bJK/bJK5oTgxVJo.xml.

alexcrichton commented 2 years ago

The flag is there to prevent issues like https://github.com/alexcrichton/tar-rs/issues/53, where the tar file may still have items but other errors can produce an infinite stream of errors.
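The concern can be illustrated with a source that keeps failing without making progress: if the iterator never stops on its own, a plain `for` loop over it never ends. The sketch below shows one possible caller-side guard, a cap on consecutive errors. This is only an illustration under stated assumptions; `drain_with_error_budget` and `stuck_source_demo` are hypothetical names, and no such guard exists in the crate or was proposed in this thread.

```rust
/// Consume results, giving up after `max_consecutive_errors` failures in a row.
/// Returns the successful items and the total number of errors seen.
fn drain_with_error_budget<I, T, E>(
    entries: I,
    max_consecutive_errors: usize,
) -> (Vec<T>, usize)
where
    I: Iterator<Item = Result<T, E>>,
{
    let mut ok = Vec::new();
    let mut total_errors = 0;
    let mut streak = 0;
    for entry in entries {
        match entry {
            Ok(item) => {
                // Any success resets the streak of consecutive failures.
                streak = 0;
                ok.push(item);
            }
            Err(_) => {
                total_errors += 1;
                streak += 1;
                if streak >= max_consecutive_errors {
                    break;
                }
            }
        }
    }
    (ok, total_errors)
}

/// One good entry followed by an endless run of errors, standing in for a
/// reader that cannot advance past a bad region.
fn stuck_source_demo() -> (Vec<&'static str>, usize) {
    let stuck = std::iter::once(Ok::<&'static str, &'static str>("a.xml"))
        .chain(std::iter::repeat(Err("read failed")));
    drain_with_error_budget(stuck, 3)
}
```

Without the budget, iterating `stuck` would loop forever, which is the failure mode the done flag guards against.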