JuliaIO / Tar.jl

TAR files: create, list, extract them in pure Julia
MIT License
79 stars 19 forks source link

Support for extracting hard links #102

Closed StefanKarpinski closed 3 years ago

StefanKarpinski commented 3 years ago

It turns out that implementing hard links is easy: you just copy the hard link target instead of creating the file directly. The tricky part is making sure that a malicious tarball doesn't do something dangerous with a combination of hard links and symlinks, and this can get very tricky indeed. Closes #101.

I'm also including a couple of unrelated helper changes here:

codecov[bot] commented 3 years ago

Codecov Report

Merging #102 (c4b28fb) into master (3708fea) will increase coverage by 0.17%. The diff coverage is 97.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #102      +/-   ##
==========================================
+ Coverage   96.30%   96.47%   +0.17%     
==========================================
  Files           4        4              
  Lines         649      681      +32     
==========================================
+ Hits          625      657      +32     
  Misses         24       24              
Impacted Files Coverage Δ
src/Tar.jl 96.10% <ø> (ø)
src/extract.jl 96.61% <97.14%> (+0.23%) :arrow_up:
src/create.jl 97.19% <100.00%> (+0.09%) :arrow_up:
src/header.jl 94.44% <100.00%> (+0.24%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3708fea...c4b28fb. Read the comment docs.

StefanKarpinski commented 3 years ago

For posterity, here's an untested cut at checking that a hardlink is safe in full generality:

diff --git a/src/extract.jl b/src/extract.jl
index 0d72af0..8593f8a 100644
--- a/src/extract.jl
+++ b/src/extract.jl
@@ -351,6 +351,60 @@ function read_tarball(
             path = isempty(path) ? part : "$path/$part"
         end
         paths[path] = hdr.type == :symlink ? hdr.link : hdr.type
+        # check that hard links refer to already-seen files
+        if hdr.type == :hardlink
+            link_parts = split(hdr.link, '/')
+            target_parts = String[]
+            while !isempty(link_parts)
+                part = popfirst!(link_parts)
+                (isempty(part) || part == ".") && continue
+                if part == ".."
+                    isempty(target) && error("""
+                    Refusing to extract hardlink with external target
+                     * path: $(repr(hdr.path))
+                     * hardlink: $(repr(hdr.link))
+                    """)
+                    pop!(target)
+                    continue
+                end
+                target = join(target_parts, '/')
+                type = get(paths, target, nothing)
+                type === nothing && error("""
+                Refusing to extract hardlink with non-existent target
+                 * path: $(repr(hdr.path))
+                 * hardlink: $(repr(hdr.link))
+                """)
+                if !isempty(link_parts)
+                    # prefix of final link target
+                    if type == :symlink
+                        link = paths[target]
+                        link[1] == '/' && error("""
+                        Refusing to extract hardlink with absolute symlink prefix
+                         * path: $(repr(hdr.path))
+                         * hardlink: $(repr(hdr.link))
+                        """)
+                        prepend!(link_parts, split(link, '/'))
+                    elseif type != :directory
+                        error("""
+                        Refusing to extract hardlink with $type prefix
+                         * path: $(repr(hdr.path))
+                         * hardlink: $(repr(hdr.link))
+                        """)
+                    end
+                else
+                    # final link target
+                    type == :file || error("""
+                    Refusing to extract hardlink to a $type
+                     * path: $(repr(hdr.path))
+                     * hardlink: $(repr(hdr.link))
+                    """)
+                    if hdr.link != target
+                        hdr = Header(hdr; link=target)
+                    end
+                end
+            end
+        end
+        # apply callback, checking that it consumes IO correctly
         before = applicable(position, tar) ? position(tar) : 0
         callback(hdr, split(path, '/', keepempty=false))
         applicable(position, tar) || continue

I've decided that this is too complex and not worth it. For a first cut, I will add support for extracting hard links but with the restriction that they can only refer to (already-extracted) file paths. I suspect this will be what actually occurs 99.99% of the time. What the above complex logic allows for is a hard link which refers to an already-extracted file via some already-extracted symlink prefix, which would work but makes it much harder to ensure that something bad isn't happening.

maleadt commented 3 years ago

Thanks, works great. The restriction doesn't pose a problem for the tarballs I'm using.

StefanKarpinski commented 3 years ago

I would be surprised if there's any way to get command-line tar tools to create a tarball that doesn't have this property. I haven't looked at the GNU tar source but presumably it remembers the device & inode of each file it archives with hard link count larger than one, and if later while generating the same archive it encounters another file with the same device & inode, it can emit a hardlink to the earlier archive path instead. That guarantees that the hard link target is a plain file that appears in the tarball before the hard link to it.

StefanKarpinski commented 3 years ago

Finally, this works.