JuliaIO / Tar.jl

TAR files: create, list, extract them in pure Julia
MIT License

problem with extracting a tar.gz file #133

Closed ThomasBreuer closed 2 years ago

ThomasBreuer commented 2 years ago

Today I observed the following problem with Julia's Tar package.

using HTTP
using Tar
using CodecZlib

# download an archive
url = "http://www.math.rwth-aachen.de/~Thomas.Breuer/atlasrep/atlasrep-2r1p0.tar.gz";
req = HTTP.request("GET", url; verbose = 0);
downl = String(req.body);
localpath = basename(url)
write(localpath, downl)

# try to extract the archive, as suggested at `https://github.com/JuliaIO/Tar.jl`
tar_gz = open(localpath)
tar = GzipDecompressorStream(tar_gz)
dir = Tar.extract(tar, "tmpdir")

The last command results in the following error message.

ERROR: unsupported entry type
Tar.Header("atlasrep/doc/chooser.html", :hardlink, 0o644, 0, "atlasrep/doc/chooser.html")
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] check_header(hdr::Tar.Header)
    @ Tar /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Tar/src/header.jl:128
  [3] read_tarball(callback::Tar.var"#26#28"{Vector{UInt8}, Bool, TranscodingStreams.TranscodingStream{GzipDecompressor, IOStream}, String}, predicate::Tar.var"#1#2", tar::TranscodingStreams.TranscodingStream{GzipDecompressor, IOStream}; buf::Vector{UInt8}, skeleton::Base.DevNull)
    @ Tar /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Tar/src/extract.jl:331
  [4] extract_tarball(predicate::Function, tar::TranscodingStreams.TranscodingStream{GzipDecompressor, IOStream}, root::String; buf::Vector{UInt8}, skeleton::Base.DevNull, copy_symlinks::Bool)
    @ Tar /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Tar/src/extract.jl:51
[...]

The archive can be unpacked without problems by tar. Is the problem in the archive?

StefanKarpinski commented 2 years ago

Seems like you're using an old version of Tar. What is your Julia version / Tar version?

ThomasBreuer commented 2 years ago

@StefanKarpinski Thanks for your interest.

I use Julia 1.6.5 together with its Tar package under share/julia/stdlib/v1.6/Tar.

Perhaps the problem is that the archive in my example was created (some time ago) with an older tar version, since the problem does not occur with a recently created archive of a newer version (url = "http://www.math.rwth-aachen.de/~Thomas.Breuer/atlasrep/atlasrep-2.1.1.tar.gz").

Outside Julia, both archives can be unpacked with tar xvzf <archivename>.tar.gz.

StefanKarpinski commented 2 years ago

The problem is that this tar file contains a hard link entry, and Julia/Tar did not add hard link support until Julia 1.7. However, this tar file also has another oddity, which wasn't fixed until more recently, in 1.8: it includes the file atlasrep/doc/chooser.html twice, and the second instance is a hard link to the first (with the same path). You can avoid this by not listing the file twice when constructing the tar file; that produces a tar file without any hard links, which will be usable by Julia 1.6.
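
To check whether an archive contains hard link entries before trying to extract it, one option is to list its headers. The following is a minimal sketch, not part of the original discussion: `hardlink_entries` is a hypothetical helper name, and it assumes the installed Tar version provides `Tar.list` with the `strict` keyword, which is meant to list entries without rejecting header types that extraction does not support.

```julia
using Tar

# Hypothetical helper: return the headers of all hard link entries.
# `tarball` may be a path to an uncompressed tar file or an IO stream.
function hardlink_entries(tarball::Union{AbstractString,IO})
    # strict = false: list entries without validating the headers
    # (assumption: the installed Tar version supports this keyword)
    headers = Tar.list(tarball; strict = false)
    return filter(h -> h.type == :hardlink, headers)
end
```

For a gzipped archive, the function can be given the decompressed stream, constructed with `GzipDecompressorStream` as in the original report. Each returned `Tar.Header` carries the entry's `path` and the `link` target it refers to.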

ThomasBreuer commented 2 years ago

Thanks for inspecting this. Of course someone who creates tar archives can try to avoid certain complications. However, the point is that a general tool for dealing with tar archives cannot expect this. The problem above arose when I tried to process a list of tar archives automatically with Julia's Tar, and it turned out that Julia was apparently not (yet) the right tool for that.

StefanKarpinski commented 2 years ago

This is already fixed in Julia 1.8, which is in beta. We cannot backport support for hard links to Julia 1.6 because this is a feature and semantic versioning dictates that we not add features in patch releases. Anyone who needs to extract tarballs with hard links like this one will need to use a newer Julia version.
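
For automated processing on an older Julia version, one possible workaround is to fall back to the system `tar`, which unpacked both archives in the report above. This is a rough sketch under assumptions, not a recommendation from the thread: `extract_with_fallback` is a hypothetical helper name, a `tar` executable is assumed to be on the `PATH`, and the input is assumed to be an uncompressed tar file.

```julia
using Tar

# Hypothetical helper: try Tar.extract first and fall back to the
# system `tar` if it throws (e.g. on an unsupported entry type).
function extract_with_fallback(tarball::AbstractString, dir::AbstractString)
    try
        Tar.extract(tarball, dir)
    catch err
        @warn "Tar.extract failed; falling back to system tar" exception = err
        # Make sure the target directory exists before calling tar
        mkpath(dir)
        # Assumption: a `tar` executable is available on the PATH
        run(`tar -xf $tarball -C $dir`)
    end
    return dir
end
```

Note that `Tar.extract` expects the target directory to be empty or not to exist yet, so `dir` should be a fresh path for each archive.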