Tarsnap / tarsnap

Command-line client code for Tarsnap.
https://tarsnap.com

Problems selectively extracting hardlinked files #18

Open cperciva opened 9 years ago

cperciva commented 9 years ago

Tarsnap currently stores hardlinked files as "link" archive entries without data. For example, if you have

# echo foo > a
# ln a b
# tarsnap -c -f foo a b

then the archive entry for 'a' will contain "foo", while the archive entry for 'b' will simply say "this is a hardlink to 'a'". Unfortunately, this doesn't interact well with selective extracts:

# tarsnap -x -f foo b
b: Can't create 'b'

This is in fact identical to bsdtar's behaviour: by the time it gets to the archive entry for 'b', it's too late to get the data from the 'a' archive entry (because we're dealing with a streaming archive format, and bsdtar/libarchive are designed around that notion).
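To illustrate the failure mode, here is a toy model of a streaming extract, not tarsnap or libarchive code. The `ARCHIVE` list, the entry dictionaries, and the `extract` function are all hypothetical names made up for this sketch; only `os.link` and the error text from the transcript above are real.

```python
import os
import tempfile

# Hypothetical toy model of a streaming archive: entries are read in order,
# and a hardlink entry carries only the name of its target, not the data.
ARCHIVE = [
    {"path": "a", "type": "file", "data": b"foo\n"},
    {"path": "b", "type": "link", "target": "a"},
]


def extract(archive, dest, wanted):
    """Extract only the entries named in `wanted`; return error strings."""
    errors = []
    for entry in archive:
        if entry["path"] not in wanted:
            continue  # skipped entries are never written to disk
        out = os.path.join(dest, entry["path"])
        if entry["type"] == "file":
            with open(out, "wb") as f:
                f.write(entry["data"])
        else:
            try:
                # Fails with ENOENT if the target was not extracted.
                os.link(os.path.join(dest, entry["target"]), out)
            except FileNotFoundError:
                errors.append(f"{entry['path']}: Can't create '{entry['path']}'")
    return errors


# Selective extract of 'b' alone: the link target 'a' is not on disk.
with tempfile.TemporaryDirectory() as d:
    selective_errors = extract(ARCHIVE, d, {"b"})

# Extracting both entries succeeds, because 'a' is written before 'b'.
with tempfile.TemporaryDirectory() as d:
    full_errors = extract(ARCHIVE, d, {"a", "b"})

print(selective_errors)
print(full_errors)
```

In this model the selective extract reports one error for 'b', while the full extract reports none.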

Tarsnap however should be able to do better. Since we're deduplicating, including the file data in every hardlinked archive entry is really cheap. We should probably do this.

gperciva commented 9 years ago

Check if there's a flag for libarchive to handle this for us. (might only be available in a later version?)

cperciva commented 9 years ago

Leaving a note in case I forget about it later: If we store data in every hardlinked entry, we need to make sure that we're doing the right thing when extracting, namely skipping the data if the file we're hardlinking to has been extracted already.
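A minimal sketch of that extraction rule, assuming a hypothetical archive layout in which every hardlinked entry carries both the file data and the name of its link target. None of these names (`ARCHIVE`, `extract`, the `target` field) come from tarsnap or libarchive; they exist only to show the "link if the target was already extracted, otherwise write the data" decision.

```python
import os
import tempfile

# Hypothetical layout for the proposal: every hardlinked entry stores the
# file data as well as the name of the entry it is a link to.
ARCHIVE = [
    {"path": "a", "data": b"foo\n", "target": None},
    {"path": "b", "data": b"foo\n", "target": "a"},
]


def extract(archive, dest, wanted):
    """Extract the entries in `wanted`, hardlinking where possible."""
    extracted = set()  # paths written during this run
    for entry in archive:
        if entry["path"] not in wanted:
            continue
        out = os.path.join(dest, entry["path"])
        if entry["target"] in extracted:
            # Target already restored: make a real hardlink, skip the data.
            os.link(os.path.join(dest, entry["target"]), out)
        else:
            # Target was not restored: fall back to the entry's own data.
            with open(out, "wb") as f:
                f.write(entry["data"])
        extracted.add(entry["path"])


# Selective extract of 'b' now works, because 'b' has its own copy of the data.
with tempfile.TemporaryDirectory() as d:
    extract(ARCHIVE, d, {"b"})
    with open(os.path.join(d, "b"), "rb") as f:
        only_b = f.read()

# A full extract still produces one file with two names.
with tempfile.TemporaryDirectory() as d:
    extract(ARCHIVE, d, {"a", "b"})
    linked = os.path.samefile(os.path.join(d, "a"), os.path.join(d, "b"))
```

The sketch only tracks files written in the current run; whether a real implementation should also link to a target that already exists on disk from an earlier extract is left open here.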

Jamie-Landeg-Jones commented 9 years ago

Thanks for this. Until you have time to work on this, couldn't you at least provide a more informative error message, seeing as tarsnap knows why the problem has occurred?

e.g. /usr/blah: Unable to restore hard linked file. Please restore all instances of this inode.

When I experienced this recently, I knew the file I was restoring was hard linked, so I suspected (and confirmed) that this was the issue, and restored the file successfully.

If I hadn't known, I'd probably have assumed my backup was corrupted.

cperciva commented 9 years ago

Good point. I can't remember exactly where the hardlink extract failure occurs, but it should be straightforward to adjust the error message. @gperciva, can you track this down?

gperciva commented 9 years ago

Yes and no. Here's a first draft for discussion: https://github.com/Tarsnap/tarsnap/commit/0b84f2a41639497d8888aac8a99a0750bf2ba663, which produces:

td@gin: ~/src/tarsnap/build (warn-hard-links)
$ ./tarsnap -x -f foo b
b: Hard-link target 'a' does not exist.  Can't create 'b'
tarsnap: Error exit delayed from previous errors.

The actual error message is printed from the bottom of restore_entry(); that's the function immediately above create_filesystem_object() in libarchive/archive_write_disk.c so it's easy to see it in github. In particular, that function calls create_filesystem_object(a) multiple times (doing things like trying to create intermediate directories if the initial call fails). In the case of a hardlink to a non-existent target, it calls create_filesystem_object(a) twice before reaching the /* Everything failed; give up here. */ on line 1046.

I hesitated about including archive_clear_error(&a->archive); in create_filesystem_object(), but without that line, it would print the "Hard-link" sentence twice.

NB: the bottom of restore_entry() appends "Can't create '%s'" to the error message. I'm not sold on the format of the combined message (especially without a final period!), but I don't know how much of the original libarchive code you want me to be modifying. The current patch is an attempt at being minimally invasive.

gperciva commented 9 years ago

Also: POSIX link http://pubs.opengroup.org/onlinepubs/9699919799/functions/link.html says

[ENOENT] A component of either path prefix does not exist; the file named by path1 does not exist; or path1 or path2 points to an empty string.

The patch does not check for the "component of either path not existing" case. (restore_entry() tries creating the parent dir of the link, but doesn't check the return value of that function so it could silently fail!)
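As a rough illustration of telling those ENOENT cases apart, here is a sketch that inspects the paths after the link call fails. `explain_link_failure` is a hypothetical helper, not part of the patch or of libarchive, and checking after the fact is inherently racy, so the result is a best guess for an error message and nothing more.

```python
import errno
import os
import tempfile


def explain_link_failure(target, linkpath):
    """Try to link linkpath -> target; on ENOENT, guess which path is missing.

    Returns None on success, or a human-readable explanation on ENOENT.
    """
    try:
        os.link(target, linkpath)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise  # some other failure; not handled in this sketch
        if not os.path.lexists(target):
            return f"Hard-link target '{target}' does not exist"
        parent = os.path.dirname(linkpath) or "."
        if not os.path.isdir(parent):
            return f"Parent directory '{parent}' does not exist"
        return "ENOENT for an unidentified reason"
    return None


with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    with open(a, "w") as f:
        f.write("foo\n")

    # Case 1: the link target itself is missing.
    missing_target = explain_link_failure(os.path.join(d, "nope"),
                                          os.path.join(d, "b"))
    # Case 2: the target exists, but the new link's parent directory does not.
    missing_parent = explain_link_failure(a, os.path.join(d, "sub", "b"))
    # Case 3: both exist, so the link succeeds.
    success = explain_link_failure(a, os.path.join(d, "b"))

print(missing_target)
print(missing_parent)
print(success)
```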

gperciva commented 8 years ago

There are a few comments about attacking the root problem with hardlinked files: http://mail.tarsnap.com/tarsnap-users/msg01150.html