JuliaIO / Tar.jl

TAR files: create, list, extract them in pure Julia
MIT License
80 stars 19 forks source link

Add copy_symlinks keyword to Tar.tree_hash #167

Open mkitti opened 6 months ago

mkitti commented 6 months ago

Implement copy_symlinks keyword for Tar.tree_hash

For https://github.com/JuliaLang/Pkg.jl/issues/3643 https://github.com/JuliaBinaryWrappers/P4est_jll.jl/releases/download/P4est-v2.8.1+2/P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=false)
       end
"89a337ea6f60a4fd58999ab73dea099e41032138"

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=true)
       end
"ed75b82e0dd9b53c4ac4e70376f3e6f330c72767"

For https://github.com/JuliaPackaging/Yggdrasil/issues/7888 https://github.com/JuliaBinaryWrappers/iso_codes_jll.jl/releases/download/iso_codes-v4.11.0+0/iso_codes.v4.11.0.any.tar.gz

julia> open(GzipDecompressorStream, "iso_codes.v4.11.0.any.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=false)
       end
"71f68a3d55d73f2e15a3969c241fae2349b1feb5"

julia> open(GzipDecompressorStream, "iso_codes.v4.11.0.any.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=true)
       end
"409d6ac4c02dae43ff4fe576b5c5820d0386fb3f"
codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (6269b5b) 97.28% compared to head (0bcb514) 97.64%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #167 +/- ## ========================================== + Coverage 97.28% 97.64% +0.36% ========================================== Files 4 4 Lines 810 851 +41 ========================================== + Hits 788 831 +43 + Misses 22 20 -2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

nhz2 commented 6 months ago

Can the julia compat and CI tests be bumped to 1.6?

nhz2 commented 6 months ago

Adding the following test results in a stack overflow.

    tarball, io = mktemp()
    tar_write_link(io, "a", "b/q")
    tar_write_link(io, "b", "a/q")
    close(io)
    Tar.tree_hash(tarball; copy_symlinks=true)
```julia StackOverflowError: Stacktrace: [1] exec(re::Ptr{Nothing}, subject::String, offset::Int64, options::UInt32, match_data::Ptr{Nothing}) @ Base.PCRE ./pcre.jl:203 [2] exec_r_data @ Base.PCRE ./pcre.jl:220 [inlined] [3] _findnext_re(re::Regex, str::String, idx::Int64, match_data::Ptr{Nothing}) @ Base ./regex.jl:442 [4] findnext @ Base ./regex.jl:431 [inlined] [5] iterate @ Base ./strings/util.jl:556 [inlined] [6] iterate @ Base ./strings/util.jl:555 [inlined] [7] _collect(cont::UnitRange{Int64}, itr::Base.SplitIterator{String, Regex}, ::Base.HasEltype, isz::Base.SizeUnknown) @ Base ./array.jl:770 [8] collect @ ./array.jl:759 [inlined] [9] #split#487 @ ./strings/util.jl:628 [inlined] [10] split @ ./strings/util.jl:626 [inlined] [11] link_target(paths::Dict{String, Any}, path::String, link::String) @ Tar ~/github/Tar.jl/src/extract.jl:175 [12] link_target(paths::Dict{String, Any}, path::String, link::String) (repeats 13767 times) @ Tar ~/github/Tar.jl/src/extract.jl:193 [13] git_tree_hash(predicate::Function, tar::IOStream, ::Type{SHA.SHA1_CTX}, skip_empty::Bool, copy_symlinks::Bool; buf::Vector{UInt8}) @ Tar ~/github/Tar.jl/src/extract.jl:264 [14] git_tree_hash @ ~/github/Tar.jl/src/extract.jl:208 [inlined] [15] #98 @ ~/github/Tar.jl/src/Tar.jl:408 [inlined] [16] open(f::Tar.var"#98#100"{Bool, Bool, Tar.var"#1#2"}, args::String; kwargs::@Kwargs{lock::Bool}) @ Base ./io.jl:396 [17] open_nolock @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:35 [inlined] [18] arg_read @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:74 [inlined] [19] tree_hash(predicate::Function, tarball::String; algorithm::String, skip_empty::Bool, copy_symlinks::Bool) @ Tar ~/github/Tar.jl/src/Tar.jl:407 [20] tree_hash @ Tar ~/github/Tar.jl/src/Tar.jl:398 [inlined] [21] #tree_hash#102 @ Tar ~/github/Tar.jl/src/Tar.jl:425 [inlined] [22] macro expansion @ ~/github/Tar.jl/test/runtests.jl:12 [inlined] [23] macro expansion @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined] [24] top-level scope @ ~/github/Tar.jl/test/runtests.jl:4 [25] include(fname::String) @ Base.MainInclude ./client.jl:489 [26] top-level scope @ none:6 [27] eval @ Core ./boot.jl:385 [inlined] [28] exec_options(opts::Base.JLOptions) @ Base ./client.jl:291 [29] _start() @ Base ./client.jl:552 Test Summary: | Error Total Time copy symlinks 2 | 1 1 1.6s ERROR: LoadError: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken. in expression starting at /home/nathan/github/Tar.jl/test/runtests.jl:3 ERROR: Package Tar errored during testing ```
nhz2 commented 6 months ago

The following test results in a stack overflow from a different function:

    tarball, io = mktemp()
    tar_write_dir(io,  "foo/bar")
    tar_write_link(io, "foo/bar/b", "../a")
    tar_write_link(io, "foo/a", "bar/")
    close(io)
    Tar.tree_hash(tarball; copy_symlinks=true)
```julia StackOverflowError: Stacktrace: [1] (::Tar.var"#34#44"{Dict{String, Any}, Tar.var"#by#43"})(io::IOBuffer) @ Tar ~/github/Tar.jl/src/extract.jl:329 [2] sprint(::Function; context::Nothing, sizehint::Int64) @ Base ./strings/io.jl:114 [3] sprint @ ./strings/io.jl:107 [inlined] [4] git_object_hash(emit::Function, kind::String, ::Type{SHA.SHA1_CTX}) @ Tar ~/github/Tar.jl/src/extract.jl:347 [5] (::Tar.var"#hash_tree#42"{SHA.SHA1_CTX})(node::Dict{String, Any}) @ Tar ~/github/Tar.jl/src/extract.jl:328 [6] (::Tar.var"#34#44"{Dict{String, Any}, Tar.var"#by#43"})(io::IOBuffer) @ Tar ~/github/Tar.jl/src/extract.jl:330 --- the last 5 lines are repeated 6881 more times --- [34412] sprint(::Function; context::Nothing, sizehint::Int64) @ Base ./strings/io.jl:114 [34413] sprint @ ./strings/io.jl:107 [inlined] [34414] git_object_hash(emit::Function, kind::String, ::Type{SHA.SHA1_CTX}) @ Tar ~/github/Tar.jl/src/extract.jl:347 [34415] (::Tar.var"#hash_tree#42"{SHA.SHA1_CTX})(node::Dict{String, Any}) @ Tar ~/github/Tar.jl/src/extract.jl:328 [34416] git_tree_hash(predicate::Function, tar::IOStream, ::Type{SHA.SHA1_CTX}, skip_empty::Bool, copy_symlinks::Bool; buf::Vector{UInt8}) @ Tar ~/github/Tar.jl/src/extract.jl:338 [34417] git_tree_hash @ ~/github/Tar.jl/src/extract.jl:209 [inlined] [34418] #98 @ ~/github/Tar.jl/src/Tar.jl:408 [inlined] [34419] open(f::Tar.var"#98#100"{Bool, Bool, Tar.var"#1#2"}, args::String; kwargs::@Kwargs{lock::Bool}) @ Base ./io.jl:396 [34420] open_nolock @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:35 [inlined] [34421] arg_read @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:74 [inlined] [34422] tree_hash(predicate::Function, tarball::String; algorithm::String, skip_empty::Bool, copy_symlinks::Bool) @ Tar ~/github/Tar.jl/src/Tar.jl:407 [34423] tree_hash @ Tar ~/github/Tar.jl/src/Tar.jl:398 [inlined] [34424] #tree_hash#102 @ Tar ~/github/Tar.jl/src/Tar.jl:425 [inlined] [34425] macro expansion @ ~/github/Tar.jl/test/runtests.jl:13 [inlined] [34426] macro expansion @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined] [34427] top-level scope @ ~/github/Tar.jl/test/runtests.jl:6 [34428] include(fname::String) @ Base.MainInclude ./client.jl:489 [34429] top-level scope @ none:6 [34430] eval @ Core ./boot.jl:385 [inlined] [34431] exec_options(opts::Base.JLOptions) @ Base ./client.jl:291 Test Summary: | Error Total Time copy symlinks 2 | 1 1 59.1s ERROR: LoadError: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken. in expression starting at /home/nathan/github/Tar.jl/test/runtests.jl:3 ERROR: Package Tar errored during testing ``` <\details>
mkitti commented 6 months ago

What does extract do in this circumstance?

nhz2 commented 6 months ago

extract has a stack overflow in the first case and goes into an infinite loop in the second case.

mkitti commented 6 months ago

We should determine the correct behavior for extract first. My sense is that it should produce a more intuitive and descriptive error.

nhz2 commented 6 months ago

That makes sense, though sometimes extract will ignore symlinks that it cannot copy. This might be the same issue as https://github.com/JuliaIO/Tar.jl/issues/77 . Maybe these issues could be resolved while doing the refactoring described in https://github.com/JuliaIO/Tar.jl/issues/64

StefanKarpinski commented 6 months ago

Yeah, this is a known issue for extract: https://github.com/JuliaIO/Tar.jl/issues/77. I just never got around to fixing it because it seems deeply unlikely in real-world tarballs, but of course, we should handle it correctly. But it's very annoying to fix.