alexcrichton / tar-rs

Tar file reading/writing for Rust
https://docs.rs/tar
Apache License 2.0

Add doc hint that default is different than `tar` #366

schneems closed this 4 months ago

schneems commented 5 months ago

By default, the tar CLI stores a symlink in the archive as a symlink. By default, tar::Builder follows symlinks and archives the contents of the file each one points to.

This commit clarifies that difference by highlighting that follow_symlinks(true) provides the same behavior as tar -h.

Context

I ran into a problem that caused much head-scratching: calling tar produced much smaller archives than what I thought was the equivalent Rust code. I created a reproduction as part of that debugging process (if you're interested): https://github.com/schneems/tar_comparison/blob/bfd420a012b46e80435cf4e7c67ca1661357fde3/README.md.

It turns out this was the issue I was facing:

$ tar -tvzf system_tar_gzip_one_operation.tar.gz | grep libruby.so
-rwxr-xr-x  0 rschneeman staff 12364984 Jun  5 16:13 lib/libruby.so.3.1.6
lrwxr-xr-x  0 rschneeman staff        0 Jun  5 16:14 lib/libruby.so.3.1 -> libruby.so.3.1.6
lrwxr-xr-x  0 rschneeman staff        0 Jun  5 16:14 lib/libruby.so -> libruby.so.3.1.6

$ tar -tvzf rust_tar_gzip_one_operation.tar.gz | grep libruby.so
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so.3.1.6
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so.3.1
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so

In the "system" archive produced by the tar CLI, there's one .so binary and two symlinks to that file. In the "rust" version (using tar::Builder), there are three copies of the same binary.
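
The size difference follows directly from how file metadata is read. The sketch below uses only the standard library, not the tar crate, and assumes a Unix-like OS. The helper names make_lib_dir and payload_bytes are hypothetical. It sums the file bytes an archiver would need to store for one regular file and two symlinks, ignoring tar header overhead.

```rust
// Standard library only; assumes a Unix-like OS for symlink creation.
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// Hypothetical fixture: one regular file of `size` bytes and two symlinks to it.
fn make_lib_dir(name: &str, size: usize) -> io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(format!("{}_{}", name, std::process::id()));
    let _ = fs::remove_dir_all(&dir); // ignore "not found" from a clean start
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("libruby.so.3.1.6"), vec![0u8; size])?;
    symlink("libruby.so.3.1.6", dir.join("libruby.so.3.1"))?;
    symlink("libruby.so.3.1.6", dir.join("libruby.so"))?;
    Ok(dir)
}

/// Sums the regular-file bytes among the direct children of `dir`.
/// With `follow == true` a symlink counts as its target; otherwise it
/// contributes nothing, matching the size 0 shown for links in the listing.
fn payload_bytes(dir: &Path, follow: bool) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // fs::metadata resolves symlinks; fs::symlink_metadata describes the link itself.
        let meta = if follow {
            fs::metadata(&path)?
        } else {
            fs::symlink_metadata(&path)?
        };
        if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}
```

For a fixture built with a 4096-byte file, payload_bytes(&dir, true) gives three times the file size and payload_bytes(&dir, false) gives the file size once, which is the same 3:1 ratio as in the two listings above.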

I had seen the follow_symlinks documentation while debugging my issue but didn't fully internalize the implications. By documenting which (system) tar feature it corresponds to, I hope readers will make the connection that the two have different defaults.

schneems commented 4 months ago

Thanks!