alexcrichton / tar-rs

Tar file reading/writing for Rust
https://docs.rs/tar
Apache License 2.0
625 stars 184 forks source link

tar-rs fails to decompress some ostree archives #367

Open mripard opened 3 months ago

mripard commented 3 months ago

Hi,

I've been trying to write a tool that fetches container images and extracts them. Containers are stored as multiple tar.gz files that need to be decompressed in sequence.

It looks like tar-rs struggles to decompress some of those archives, for a reason that isn't entirely clear to me yet:

$ sudo target/debug/ocibootstrap test test-dir 
11:02:37 [INFO] Running ocibootstrap 0.1.0
11:02:38 [INFO] Found layer 81867c2cbb30529f835c9ff79532b7ae0c3585da08963dd398e179a425d887fb
11:02:38 [INFO] Blob retrieved, extracting ...
11:02:39 [INFO] Done
11:02:39 [INFO] Found layer d9a9ed9b0630076fc7fd8052cc6676271e5413084e19c36502793c079af153ea
11:02:40 [INFO] Blob retrieved, extracting ...
11:02:43 [INFO] Done
11:02:43 [INFO] Found layer b16913c15949a37f7740651a76a0b1eeb717c885f577afcd39893160be77c747
11:02:43 [INFO] Blob retrieved, extracting ...
thread 'main' panicked at src/main.rs:458:40:
called `Result::unwrap()` on an `Err` value: Custom { kind: NotFound, error: TarError { desc: "failed to unpack `/var/home/max/Work/experimentations/ocibootstrap/test-dir/usr/lib/.build-id/91/319a27e0de0f6d6f0606ca103fa52f5ab78601`", io: Custom { kind: NotFound, error: "No such file or directory (os error 2) while canonicalizing /var/home/max/Work/experimentations/ocibootstrap/test-dir/sysroot/ostree/repo/objects/1c/a63fafc9ae2c7b25dbdf4dba215d4e0a7a3f9136c5b1e02bfa767da611d965.file" } } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I've made a test tool that just tries to decompress a tar.gz archive using the same code, and can reproduce the issue:

$ sudo target/debug/examples/test-tar /root/.cache/ocibootstrap/b16913c15949a37f7740651a76a0b1eeb717c885f577afcd39893160be77c747 new-new-new-new-example-test-dir 
Input File /root/.cache/ocibootstrap/b16913c15949a37f7740651a76a0b1eeb717c885f577afcd39893160be77c747
Output Dir: new-new-new-new-example-test-dir
thread 'main' panicked at examples/test-tar.rs:32:31:
called `Result::unwrap()` on an `Err` value: Custom { kind: NotFound, error: TarError { desc: "failed to unpack `new-new-new-new-example-test-dir/usr/lib/.build-id/91/319a27e0de0f6d6f0606ca103fa52f5ab78601`", io: Custom { kind: NotFound, error: "No such file or directory (os error 2) while canonicalizing /var/home/max/Work/experimentations/ocibootstrap/new-new-new-new-example-test-dir/sysroot/ostree/repo/objects/1c/a63fafc9ae2c7b25dbdf4dba215d4e0a7a3f9136c5b1e02bfa767da611d965.file" } } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

tar works just fine:

$ sudo tar xf /root/.cache/ocibootstrap/b16913c15949a37f7740651a76a0b1eeb717c885f577afcd39893160be77c747
$ echo $?
0

If I have a look at the output from tar though, it appears that usr/lib/.build-id/91/319a27e0de0f6d6f0606ca103fa52f5ab78601 is a symlink:

$ ls -l usr/lib/.build-id/91/319a27e0de0f6d6f0606ca103fa52f5ab78601
lrwxrwxrwx. 2 root root 47  1 janv.  1970 usr/lib/.build-id/91/319a27e0de0f6d6f0606ca103fa52f5ab78601 -> ../../../../usr/lib64/firefox/minidump-analyzer

And a63fafc9ae2c7b25dbdf4dba215d4e0a7a3f9136c5b1e02bfa767da611d965.file seems to be a symlink to the same file, but a broken one:

$ ls -l sysroot/ostree/repo/objects/1c/a63fafc9ae2c7b25dbdf4dba215d4e0a7a3f9136c5b1e02bfa767da611d965.file
lrwxrwxrwx. 2 root root 47  1 janv.  1970 sysroot/ostree/repo/objects/1c/a63fafc9ae2c7b25dbdf4dba215d4e0a7a3f9136c5b1e02bfa767da611d965.file -> ../../../../usr/lib64/firefox/minidump-analyzer

It makes sense for this link to be broken because ostree deploys the actual files as hardlinks to their stored objects under ostree/repo.

And sure enough, in the tar version:

$ ls -i usr/lib/.build-id/91/319a27e0de0f6d6f0606ca103fa52f5ab78601 sysroot/ostree/repo/objects/1c/a63fafc9ae2c7b25dbdf4dba215d4e0a7a3f9136c5b1e02bfa767da611d965.file
56540564 sysroot/ostree/repo/objects/1c/a63fafc9ae2c7b25dbdf4dba215d4e0a7a3f9136c5b1e02bfa767da611d965.file
56540564 usr/lib/.build-id/91/319a27e0de0f6d6f0606ca103fa52f5ab78601

The '/usr/libone ends up a hardlink to the one under 'sysroot/ostree/repo, but the link is only valid under /usr/lib, and the one under sysroot is kind of expected to be broken.

So it looks to me that tar-rs is a bit to cautious there, or at least more that GNU tar is, and we could possibly relax (with an option?) tar-rs ?

cgwalters commented 1 week ago

Hi, I wrote that code (in https://github.com/ostreedev/ostree-rs-ext/ ).

Before I forget I also have some larger questions about what you're thinking of doing with that linked tool https://github.com/mripard/ocibootstrap and its overlap/intersection with http://github.com/containers/bootc and especially bootc install...but those are all side questions better done perhaps as an issue on your repository.

So it looks to me that tar-rs is a bit to cautious there, or at least more that GNU tar is, and we could possibly relax (with an option?) tar-rs ?

Without digging in deeply here I think that's possible...but it needs some deeper analysis. First, long link support here is relatively new and it's entirely possible I missed something.

Actually of important note here is that for how ostree actually uses these we don't call unpack_in...and in fact the unit test I added only re-parses the tarball.

Handling symlinks during unpacking (especially being robust against malicious tar) is a longstanding issue here (previously https://github.com/alexcrichton/tar-rs/pull/90 etc).

I will say personally my view is that the maximally safe and robust way to handle this is using e.g. https://github.com/bytecodealliance/cap-std - I thought about making that an optional dependency here.

Also I see another issue here which may be a dup of this one or related: https://github.com/alexcrichton/tar-rs/issues/369