JuliaIO / Tar.jl

TAR files: create, list, extract them in pure Julia
MIT License
80 stars 19 forks source link

Add has_symlink public API function #166

Closed mkitti closed 6 months ago

mkitti commented 6 months ago

Implement Tar.has_symlink as requested by @staticfloat

julia> using Tar, CodecZlib, Downloads

julia> Downloads.download("https://github.com/JuliaBinaryWrappers/P4est_jll.jl/releases/download/P4est-v2.8.1+2/P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz", "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz")

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           Tar.has_symlink(io)
       end
true

julia> Downloads.download("https://github.com/JuliaBinaryWrappers/iso_codes_jll.jl/releases/download/iso_codes-v4.15.0%2B0/iso_codes.v4.15.0.any.tar.gz", "iso_codes.v4.15.0.any.tar.gz")

julia> open(GzipDecompressorStream, "iso_codes.v4.15.0.any.tar.gz") do io
           Tar.has_symlink(io)
       end
true

julia> Tar.create("empty", "foo.tar")
"foo.tar"

julia> Tar.has_symlink("foo.tar")
false

xref: https://github.com/JuliaLang/Pkg.jl/issues/3643#issuecomment-1880111897

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (6269b5b) 97.28% compared to head (17a81a3) 97.55%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #166 +/- ## ========================================== + Coverage 97.28% 97.55% +0.26% ========================================== Files 4 4 Lines 810 817 +7 ========================================== + Hits 788 797 +9 + Misses 22 20 -2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mkitti commented 6 months ago

@nhz2 's implementation looks good as well: https://gist.github.com/nhz2/2db880497293868178225a964c68fd99#file-find-symlinks-jl-L12-L19

StefanKarpinski commented 6 months ago

This feels like it could be implemented externally since it's quite a simple application of Tar.list. In the calling code it would simply be this:

has_symlink = false
Tar.list(tarball) do hdr
    has_symlink |= hdr.type == :symlink
end

If anything, I'd be inclined to add a generic predicate checker like this:

Tar.check_any(predicate::Function, tarball::ArgRead)

That checks if any entry in the tarball satsifies a predicate; it could be specialized with a type::Symbol that would check if any of the entries in the tarball are of that type.

mkitti commented 6 months ago

How would one write a function without getting into the non-public details of the header structure?

If islink worked, Tar.list(islink, mytar.tar) I would consider it

Also I don't get the need to |, or, the results together. What we actually need is short circuting. Once we found the first symlink, we're done.

StefanKarpinski commented 6 months ago

The Header struct is public and documented and part of several APIs, see https://github.com/JuliaIO/Tar.jl#tarheader

StefanKarpinski commented 6 months ago

We could add short-circuiting as an optimization if we have a Tar.contains function, but I'd consider that an implementation detail.

mkitti commented 6 months ago

How about we define Base.islink(h::Header) = h.type == :symlink?

mkitti commented 6 months ago

Here is the API I am imagining.

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           findfirst(islink, Tar.list(io)) != nothing
       end
true

julia> Tar.create(mkdir("empty"), "foo.tar")
"foo.tar"

julia> findfirst(islink, Tar.list("foo.tar")) != nothing
false

For the short-circuiting what I really want is a lazy iterator over the headers rather than the eager Vector{Header} from Tar.list.

StefanKarpinski commented 6 months ago

I've made a PR for the predicates on Header types: https://github.com/JuliaIO/Tar.jl/pull/169/files. Making an iterable object that does what iterate_headers does seems quite annoying. The alternative would be to use the callback interface and raise an exception to terminate early; ugly but would avoid needing an iterator. In the mean time, I think that doing any(islink, Tar.list(tarball)) is fine.

StefanKarpinski commented 6 months ago

Please also note that none of this is going to help the Pkg situation since I don't think we want to add faetures to Tar in a point release and a point release is what is where this issue is going to be fixed.