alexcrichton / tar-rs

Tar file reading/writing for Rust
https://docs.rs/tar
Apache License 2.0

Bug: Extracting tar failed due to GNU Sparse File extension not properly supported #295

Open NobodyXu opened 2 years ago

NobodyXu commented 2 years ago

When extracting cbindgen-0.24.3-x86_64-apple-darwin.tar.gz using the following equivalent code:

use flate2::read::GzDecoder;
use tar::Archive;

let tar = GzDecoder::new(dat);
Archive::new(tar).unpack(path)?;

where dat is a reader (something implementing Read) that is streamed from the download stage, I got:

/var/folders/4h/3pck4_r16tn6960znvv4w0nw0000gn/T/.tmp7GLCIo/bin-cbindgen
└── GNUSparseFile.0
    └── cbindgen

1 directory, 1 file

Running file cbindgen reports that it is just data. The file does not have the executable bit set, and it still cannot be executed after chmod +x cbindgen.

The expected result is:

/var/folders/4h/3pck4_r16tn6960znvv4w0nw0000gn/T/.tmp7GLCIo/bin-cbindgen
└── cbindgen

I used bsdtar 3.5.1 - libarchive 3.5.1 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8 and it handles it just fine.
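For anyone who wants to detect this symptom programmatically, here is a minimal sketch using only the standard library. It assumes the placeholder directory is always named GNUSparseFile. followed by digits, as in the listing above; the function name has_gnu_sparse_placeholder is hypothetical and not part of tar-rs. It only flags affected paths. It does not repair the extracted file, whose contents may also be wrong.

```rust
use std::path::{Component, Path};

/// Returns true if any component of `path` looks like the placeholder
/// directory from the listing above: "GNUSparseFile." followed by digits.
/// (Assumption: the suffix is always a non-empty run of ASCII digits.)
fn has_gnu_sparse_placeholder(path: &Path) -> bool {
    path.components().any(|c| match c {
        Component::Normal(name) => name
            .to_str()
            .and_then(|s| s.strip_prefix("GNUSparseFile."))
            .map_or(false, |rest| {
                !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit())
            }),
        // Root, prefix, "." and ".." components are never the placeholder.
        _ => false,
    })
}

fn main() {
    for p in ["GNUSparseFile.0/cbindgen", "cbindgen"] {
        println!("{p}: {}", has_gnu_sparse_placeholder(Path::new(p)));
    }
}
```

The same check could be applied to each entry path while iterating over an archive, or to the directory tree after unpacking, to fail early instead of silently installing a broken binary.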

Here's the relevant part of the Cargo.toml:

flate2 = { version = "1.0.24", features = ["zlib-ng"], default-features = false }
tar = "0.4.38"

I discovered this bug in https://github.com/ryankurte/cargo-binstall/pull/174

NobodyXu commented 2 years ago

Here's the reproduction.

Just run cargo r and you will notice the GNUSparseFile.0 directory.

alexcrichton commented 2 years ago

I don't know precisely what this is, but it looks like a GNU-specific extension to the tar format for sparse files. If I had to hazard a guess, I would say this is a feature that this crate doesn't implement yet.

NobodyXu commented 2 years ago

@alexcrichton I also thought that until I saw this:

Also it seems that quick-install is only using normal tar -czf, not the --sparse option? So I'm unsure what caused this package, unless the tar they're using has it on by default or turns it on based on some heuristic.

The cbindgen tarball seems to have been built without the --sparse option.

NobodyXu commented 2 years ago

@alexcrichton Pinging as this has been stale for a month.

NobodyXu commented 1 year ago

@alexcrichton Pinging

chshersh commented 1 year ago

I'm having the same issue when using flate2 and tar in my Cargo.toml like this:

flate2 = "1.0"
tar = "0.4.38"

When I'm trying to unpack tokei-x86_64-apple-darwin.tar.gz using the following code:

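use std::fs::File;
use std::path::Path;
use flate2::read::GzDecoder;

// Decompress the gzip stream and unpack the tar archive into tmp_dir.
fn unpack_tar(tar_path: &Path, tmp_dir: &Path) -> Result<(), std::io::Error> {
    let tar_file = File::open(tar_path)?;
    let tar_decoder = GzDecoder::new(tar_file);
    let mut archive = tar::Archive::new(tar_decoder);
    archive.unpack(tmp_dir)
}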

I see that the unpacking result produces the GNUSparseFile.0 directory:

$ exa --tree
.
├── GNUSparseFile.0
│  └── tokei
└── tokei-x86_64-apple-darwin.tar.gz

I'm not sure if the file is actually valid (I'm on Ubuntu 20.04 and the archive contains a macOS executable, so I can't run it). But is this expected, and can/should it be fixed?
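One way to sanity-check the extracted file without running it is to look at its first four bytes. The sketch below, standard library only, tests for the well-known Mach-O magic numbers; the helper names looks_like_macho and file_looks_like_macho are hypothetical. It assumes that a mis-extracted sparse entry would not start with one of these magics (my understanding is that the GNU sparse 1.0 layout puts a textual map before the real data, but treat that as an assumption). A match only means the header looks plausible, not that the binary is intact.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Returns true if `prefix` starts with a Mach-O magic number.
/// Note: CA FE BA BE (fat binary) is also the Java class-file magic.
fn looks_like_macho(prefix: &[u8]) -> bool {
    const MAGICS: [[u8; 4]; 5] = [
        [0xFE, 0xED, 0xFA, 0xCE], // 32-bit, big-endian on disk
        [0xCE, 0xFA, 0xED, 0xFE], // 32-bit, little-endian on disk
        [0xFE, 0xED, 0xFA, 0xCF], // 64-bit, big-endian on disk
        [0xCF, 0xFA, 0xED, 0xFE], // 64-bit, little-endian on disk (e.g. x86_64)
        [0xCA, 0xFE, 0xBA, 0xBE], // fat (universal) binary
    ];
    prefix.len() >= 4 && MAGICS.iter().any(|m| prefix[..4] == m[..])
}

/// Reads the first four bytes of the file at `path` and checks them.
/// Files shorter than four bytes are reported as not Mach-O.
fn file_looks_like_macho(path: &Path) -> io::Result<bool> {
    let mut buf = [0u8; 4];
    match File::open(path)?.read_exact(&mut buf) {
        Ok(()) => Ok(looks_like_macho(&buf)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Usage: pass the path of the extracted file as the first argument.
    match std::env::args().nth(1) {
        Some(p) => println!("{p}: Mach-O header = {}", file_looks_like_macho(Path::new(&p))?),
        None => eprintln!("usage: pass the path of the extracted file"),
    }
    Ok(())
}
```

Running this against GNUSparseFile.0/tokei and against a copy extracted with another tool should make any difference in the leading bytes visible.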

NobodyXu commented 1 year ago

@chshersh #298 should fix this, but I haven't tested the solution and @alexcrichton hasn't reviewed it yet.

akesson commented 1 year ago

Same issue when unpacking cargo-generate tar.gz using:

flate2 = "1.0.25"
tar = "0.4.38"