alexcrichton / tar-rs

Tar file reading/writing for Rust
https://docs.rs/tar
Apache License 2.0

Unpacking tar archive with long link in it results in truncated filenames #369

Open schneems opened 3 months ago

schneems commented 3 months ago

Context

I'm using tar to extract the contents of https://repo1.maven.org/maven2/org/jruby/jruby-dist/9.4.8.0/jruby-dist-9.4.8.0-bin.tar.gz so that I can re-package them and upload them to S3 (https://github.com/heroku/docker-heroku-ruby-builder/blob/9e64b4401be4df7c158c9253290f6a3248927023/shared/src/lib.rs#L24-L31).

Unfortunately, I've learned that this archive uses ././@LongLink entries, and it seems that the Rust tar crate truncates the affected filenames by default. To demonstrate the issue, I made a reproduction: https://github.com/schneems/tar_long_link_repro.
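For readers unfamiliar with the convention, here is a minimal std-only Rust sketch of how a GNU `././@LongLink` entry works, assuming the classic 512-byte header layout (name in bytes 0..100, octal size in bytes 124..136, type flag at byte 156). A GNU-style writer emits a pseudo-entry of type `L` whose data is the full path, followed by the real header whose name field holds only the first 100 bytes. The helper names (`header`, `write_long_entry`, `list`) and the sample in-archive path are hypothetical, checksums are skipped, and this is not the tar crate's code.

```rust
// Minimal std-only sketch of the GNU "././@LongLink" convention.
// Assumed header layout: name = bytes 0..100, size = bytes 124..136 (octal),
// typeflag = byte 156. Checksums are neither written nor verified.
const BLOCK: usize = 512;

/// Build one header block; the name is cut to the 100-byte name field.
fn header(name: &[u8], size: usize, typeflag: u8) -> [u8; BLOCK] {
    let mut h = [0u8; BLOCK];
    let n = name.len().min(100);
    h[..n].copy_from_slice(&name[..n]);
    // 11 octal digits plus a trailing NUL fill the 12-byte size field.
    h[124..136].copy_from_slice(format!("{:011o}\0", size).as_bytes());
    h[156] = typeflag;
    h
}

/// Zero-pad data to a whole number of 512-byte blocks.
fn pad(data: &[u8]) -> Vec<u8> {
    let mut v = data.to_vec();
    let rem = v.len() % BLOCK;
    if rem != 0 {
        v.resize(v.len() + BLOCK - rem, 0);
    }
    v
}

/// Hypothetical writer: an 'L' pseudo-entry carrying the full path,
/// then the real (type '0') header with the truncated name and the data.
fn write_long_entry(out: &mut Vec<u8>, path: &str, data: &[u8]) {
    let mut long_name = path.as_bytes().to_vec();
    long_name.push(0); // trailing NUL, stripped again by the reader
    out.extend_from_slice(&header(b"././@LongLink", long_name.len(), b'L'));
    out.extend_from_slice(&pad(&long_name));
    out.extend_from_slice(&header(path.as_bytes(), data.len(), b'0'));
    out.extend_from_slice(&pad(data));
}

/// Bytes up to the first NUL (tar fields are NUL-padded).
fn until_nul(bytes: &[u8]) -> &[u8] {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    &bytes[..end]
}

/// Hypothetical reader: returns (header_name, resolved_name) per real entry.
fn list(archive: &[u8]) -> Vec<(String, String)> {
    let mut entries = Vec::new();
    let mut pending_long: Option<String> = None;
    let mut off = 0;
    while off + BLOCK <= archive.len() {
        let h = &archive[off..off + BLOCK];
        if h.iter().all(|&b| b == 0) {
            break; // a zero block marks the end of the archive
        }
        let size_str = std::str::from_utf8(until_nul(&h[124..136])).unwrap().trim();
        let size = usize::from_str_radix(size_str, 8).unwrap();
        let data = &archive[off + BLOCK..off + BLOCK + size];
        let header_name = String::from_utf8_lossy(until_nul(&h[..100])).into_owned();
        if h[156] == b'L' {
            // Remember the long name; it applies to the next header.
            pending_long = Some(String::from_utf8_lossy(until_nul(data)).into_owned());
        } else {
            let resolved = pending_long.take().unwrap_or_else(|| header_name.clone());
            entries.push((header_name, resolved));
        }
        // Skip the header plus the data rounded up to whole blocks.
        off += BLOCK + (size + BLOCK - 1) / BLOCK * BLOCK;
    }
    entries
}

fn main() {
    // Hypothetical in-archive path, modeled on the file from this issue.
    let path = "jruby-9.4.8.0/lib/ruby/stdlib/bundler/vendor/molinillo/lib/molinillo/delegates/specification_provider.rb";
    let mut archive = Vec::new();
    write_long_entry(&mut archive, path, b"# ruby source\n");
    archive.extend_from_slice(&[0u8; 2 * BLOCK]); // end-of-archive marker
    for (header_name, resolved) in list(&archive) {
        println!("header name:   {header_name}");
        println!("resolved name: {resolved}");
    }
}
```

A reader that honors the `L` entry reports the full path, while one that only reads the header's name field ends up with `.../delegates/specification_provide`, which matches the truncated name reported in this issue. This sketch does not establish whether tar-rs takes the second path here or whether the archive's LongLink entry is unusual in some way.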

Reproduction

$ git clone https://github.com/schneems/tar_long_link_repro
$ cd tar_long_link_repro

Then run it:

$ cargo run

Expected

I expect the directory tmp/rust-extracted/jruby-9.4.8.0/lib/ruby/stdlib/bundler/vendor/molinillo/lib/molinillo/delegates to contain a file named specification_provider.rb.

Actual

It does not; the command fails an assertion:

$ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/tar_long_link_repro`
thread 'main' panicked at src/main.rs:32:5:
expected ["resolution_state.rb", "specification_provide"] to include "specification_provider.rb" but it did not
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Note that the filename appears to be truncated:

$ ls tmp/rust-extracted/jruby-9.4.8.0/lib/ruby/stdlib/bundler/vendor/molinillo/lib/molinillo/delegates
resolution_state.rb specification_provide
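The truncation point looks telling. Assuming the entry paths in the archive begin with `jruby-9.4.8.0/` with no leading `./` (an assumption based on the extraction directory, not checked against the archive), the truncated path is exactly 100 bytes, the size of the name field in a tar header. A quick Rust check, with a hypothetical `fits_name_field` helper:

```rust
// Sketch: compare the issue's truncated path with the tar header name field.
// Assumption: archive paths start with "jruby-9.4.8.0/" (no leading "./").
const NAME_FIELD_LEN: usize = 100; // bytes in a tar header's name field

/// True if `path` fits in the name field without a long-name extension.
fn fits_name_field(path: &str) -> bool {
    path.len() <= NAME_FIELD_LEN
}

fn main() {
    let dir = "jruby-9.4.8.0/lib/ruby/stdlib/bundler/vendor/molinillo/lib/molinillo/delegates/";
    let truncated = format!("{dir}specification_provide");
    let full = format!("{dir}specification_provider.rb");
    println!("truncated: {} bytes", truncated.len()); // 100
    println!("full:      {} bytes", full.len()); // 104
    println!("full path fits: {}", fits_name_field(&full)); // false
}
```

So the full 104-byte path needs a long-name mechanism, and what landed on disk is precisely what fits in the header's name field.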

More

I can see that the Header and Entry types in tar are aware of the concept of long names (https://docs.rs/tar/0.4.41/tar/struct.Header.html?search=long), but I'm not sure how to unpack a GNU tar file that contains them. If the feature already exists, perhaps the documentation could be updated to make it clearer.
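For what it's worth, the crate's docs for `Entry::path` note that it will not always return the same value as `entry.header().path()`, because some archive formats describe longer path names in separate entries; code that reads names from the header alone would not see the long name. The tar crate represents these pseudo-entries as `EntryType::GNULongName` (`L`) and `EntryType::GNULongLink` (`K`). The std-only sketch below (hypothetical `HeaderKind`/`classify` names, standard 512-byte header layout assumed) only shows how those type flags are recognized in a raw header block; it does not use the crate.

```rust
// Sketch: recognize GNU long-name / long-link pseudo-entries in a raw header.
// Assumption: standard 512-byte header layout with the typeflag at byte 156.

#[derive(Debug, PartialEq)]
enum HeaderKind {
    GnuLongName, // 'L': data holds the full path of the next entry
    GnuLongLink, // 'K': data holds the full link target of the next entry
    Other(u8),   // any other typeflag, e.g. '0' for a regular file
}

fn classify(header: &[u8; 512]) -> HeaderKind {
    match header[156] {
        b'L' => HeaderKind::GnuLongName,
        b'K' => HeaderKind::GnuLongLink,
        other => HeaderKind::Other(other),
    }
}

fn main() {
    let mut h = [0u8; 512];
    h[..13].copy_from_slice(b"././@LongLink");
    h[156] = b'L';
    println!("{:?}", classify(&h)); // prints "GnuLongName"
}
```

When a header classifies as `GnuLongName`, a resolving reader uses its data as the name of the entry that follows instead of that entry's own 100-byte name field.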

headius commented 3 months ago

I found several reports online of problems with the maven-assembly-plugin, or the plexus-archiver it appears to use to produce tarballs, but nothing that exactly matches this. The fact that other tar libraries handle the file properly would seem to implicate tar-rs. However, @LongLink appears to be a GNU tar extension, which could mean either that it is simply a problematic GNU-specific feature, or that the Maven plugins for tarballs are doing something unusual or incorrect with it.

It would not seem to be a JRuby bug, since we don't construct the tarball ourselves, but if there's a problem with an upstream plugin, or some configuration change is needed to omit @LongLink entries, it will be necessary to patch JRuby's build. If you are able to come up with a reproduction that implicates the JRuby tarball as the source of the problem, I would recommend opening a JRuby issue and we can help investigate further.