Tarsnap / tarsnap

Command-line client code for Tarsnap.
https://tarsnap.com

tarsnap can be tricked into skipping file using "touch -r" #542

Open safinaskar opened 2 years ago

safinaskar commented 2 years ago

As far as I understand, tarsnap takes file metadata into account when deciding whether to skip a file. Unfortunately, this means that tarsnap can wrongly skip a modified file if its metadata was changed in an unusual way. I was able to change file metadata using touch -r and trick tarsnap into not backing up the new content. Here is the full log (b is a file with randomly generated content):

user@comp:/tmp$ cp -a b c
user@comp:/tmp$ time -p sudo tarsnap -c -f t-2022-05-11-9 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key c
tarsnap: An archive already exists with the name "t-2022-05-11-9"
tarsnap: Error creating new archive
tarsnap: Error exit delayed from previous errors.
real 3,60
user 0,01
sys 0,02
user@comp:/tmp$ time -p sudo tarsnap -c -f t-2022-05-11-10 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key c
                                       Total size  Compressed size
All archives                             92264085         83734204
  (unique data)                          50157877         41490716
This archive                              8395732          8436153
New data                                      516             1108
real 10,53
user 0,16
sys 0,01
user@comp:/tmp$ dd if=/dev/urandom of=c bs=8M count=1
1+0 records in
1+0 records out
8388608 bytes (8,4 MB, 8,0 MiB) copied, 0,274159 s, 30,6 MB/s
user@comp:/tmp$ touch --reference=b c
user@comp:/tmp$ time -p sudo tarsnap -c -f t-2022-05-11-11 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key c
                                       Total size  Compressed size
All archives                            100659817         92170357
  (unique data)                          50158393         41491824
This archive                              8395732          8436153
New data                                      516             1108
real 10,11
user 0,04
sys 0,01
user@comp:/tmp$ mkdir cq
user@comp:/tmp$ cd cq
user@comp:/tmp/cq$ sudo tarsnap -x -f t-2022-05-11-11 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key
user@comp:/tmp/cq$ ls
c
user@comp:/tmp/cq$ md5sum /tmp/cq/c
8a7b571bdcee199bf1a4c5636d8a96cc  /tmp/cq/c
user@comp:/tmp/cq$ md5sum /tmp/cq/c /tmp/c
8a7b571bdcee199bf1a4c5636d8a96cc  /tmp/cq/c
7c95ce7a821a197afc0942d63f9c7407  /tmp/c
user@comp:/tmp/cq$ cd ..
user@comp:/tmp$ time -p sudo tarsnap -c -f t-2022-05-11-12 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key c
[sudo] password for user: 
tarsnap: Cannot start write transaction: Account balance is not positive.
tarsnap: Please add more money to your tarsnap account
tarsnap: Error creating new archive
tarsnap: Error exit delayed from previous errors.
real 8,88
user 0,03
sys 0,00
user@comp:/tmp$ time -p sudo tarsnap -c -f t-2022-05-11-12 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key c
                                       Total size  Compressed size
All archives                            109055549        100606510
  (unique data)                          50158909         41492932
This archive                              8395732          8436153
New data                                      516             1108
real 9,71
user 0,03
sys 0,01
user@comp:/tmp$ mkdir cq2
user@comp:/tmp$ cd cq2
user@comp:/tmp/cq2$ sudo tarsnap -x -f t-2022-05-11-12 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key
user@comp:/tmp/cq2$ md5sum /tmp/cq2/c /tmp/c
8a7b571bdcee199bf1a4c5636d8a96cc  /tmp/cq2/c
7c95ce7a821a197afc0942d63f9c7407  /tmp/c

I use tarsnap 1.0.40.

So it seems that future tarsnap invocations will never back up the new version of c.

This breaks my workflow, because I do sometimes use touch -r. I also think some broken utilities may create files with wrong metadata, and I want my backup software to be absolutely reliable in such cases.

So please always checksum files, or add an option to force checksumming. rsync has such an option, named --checksum.
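
The request above amounts to comparing file content rather than file metadata. The following is a minimal Python sketch of that idea, not tarsnap's or rsync's implementation: file_digest, needs_backup and demo are hypothetical names, and a whole-file SHA-256 digest stands in for whatever a real cache would store. It changes a file, restores its old mtime the way touch -r would, and checks that a content comparison still notices the change.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, bufsize: int = 1 << 16) -> str:
    """SHA-256 of the file's content, read in fixed-size blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def needs_backup(path: str, cached_digest: str) -> bool:
    """Content-based check: ignores inode, size and timestamps entirely."""
    return file_digest(path) != cached_digest


def demo() -> bool:
    """Change a file, restore its mtime, and ask whether it needs a backup."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "c")
        with open(path, "wb") as f:
            f.write(b"old content")
        st = os.stat(path)
        cached = file_digest(path)  # hypothetical cache entry for this path

        # Overwrite in place: same inode, same length, different bytes.
        with open(path, "r+b") as f:
            f.write(b"new content")
        # Restore the old timestamps, as touch -r would.
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))

        return needs_backup(path, cached)


if __name__ == "__main__":
    print(demo())
```

The cost is that every file is read in full on every run, which is the performance trade-off discussed later in the thread.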

gperciva commented 2 years ago

I'm curious as to why you use touch -r. You're deliberately modifying a file, then telling the system to pretend that it hasn't been modified by setting the mtime to the previous value?

Anyway, you're correct that tarsnap will avoid reading a file whose inode number, size, and mtime haven't changed, as noted on https://www.tarsnap.com/efficiency.html.

If you want to disable the cache entirely, you could use the (admittedly non-obviously-named) --verylowmem option.

(This issue arose in another context a few months ago, so I was considering adding some documentation about this.)
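
The skip rule described in this comment can be reproduced outside tarsnap. The following is a minimal Python sketch, not tarsnap's code: CacheKey, cache_key, looks_unchanged and demo are hypothetical names, and the cache is reduced to a single (inode, size, mtime) tuple. It overwrites a file in place with different bytes of the same size, restores the old mtime the way touch -r would, and checks what a metadata-only comparison reports afterwards.

```python
import os
import tempfile
from typing import NamedTuple


class CacheKey(NamedTuple):
    """Hypothetical cache entry holding the metadata named in the thread."""
    inode: int
    size: int
    mtime_ns: int


def cache_key(path: str) -> CacheKey:
    st = os.stat(path)
    return CacheKey(st.st_ino, st.st_size, st.st_mtime_ns)


def looks_unchanged(path: str, cached: CacheKey) -> bool:
    """Metadata-only check: True means the file would not be re-read."""
    return cache_key(path) == cached


def demo() -> tuple[bool, bool]:
    """Return (content_changed, looks_unchanged) after a touch -r style edit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "c")
        old = os.urandom(4096)
        with open(path, "wb") as f:
            f.write(old)
        before = os.stat(path)
        cached = cache_key(path)

        # Overwrite in place so the inode and the size stay the same.
        new = bytes(b ^ 0xFF for b in old)
        with open(path, "r+b") as f:
            f.write(new)
        # Restore the old timestamps, as touch -r would.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

        with open(path, "rb") as f:
            content_changed = f.read() != old
        return content_changed, looks_unchanged(path, cached)


if __name__ == "__main__":
    print(demo())
```

When both values are true, the content differs but the metadata-only check cannot tell, which is the situation shown in the transcript at the top of the issue.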

safinaskar commented 2 years ago

> I'm curious as to why you use touch -r. You're deliberately modifying a file, then telling the system to pretend that it hasn't been modified by setting the mtime to the previous value?

Yes. I often record screen video using ffmpeg -f x11grab and similar tools. Sometimes I want to re-encode a video with better compression, and I intentionally keep the metadata the same using touch -r so that I can still see that the video was created long ago.

> If you want to disable the cache entirely, you could use the (admittedly non-obviously-named) --verylowmem option.

I tried to find such an option. I opened https://www.tarsnap.com/man-tarsnap.1.html , used "find in page" to search for the strings "checksum", "mtime" and "ctime", and found nothing. It would be great if the man page entry for --verylowmem said something like: "this option doesn't look at mtime".

There are programs that manipulate mtime. I know at least one of them, xz, and I think there are many more. xz creates the compressed file with the same mtime as the source file. I can construct a scenario where this tricks tarsnap. Here is the full log (I use tarsnap 1.0.40, Linux kernel 4.19, and an ext4 file system, where it is easy to get the same inode number again):

user@comp:~/exp$ echo a > a
user@comp:~/exp$ xz -0 a
user@comp:~/exp$ stat a.xz 
  File: a.xz
  Size: 60          Blocks: 8          IO Block: 4096   regular file
Device: 81ah/2074d  Inode: 33988611    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/    user)   Gid: ( 1000/    user)
Access: 2022-05-12 23:41:37.429926508 +0300
Modify: 2022-05-12 23:41:37.429926508 +0300
Change: 2022-05-12 23:41:39.305922434 +0300
 Birth: -
user@comp:~/exp$ md5sum a.xz 
0da74eb254586cec9888dd700905795c  a.xz
user@comp:~/exp$ sudo tarsnap -c -f t-2022-05-11-18 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key a.xz
                                       Total size  Compressed size
All archives                            142643279        134351393
  (unique data)                          58562759         49934797
This archive                                 2599             1872
New data                                     2599             1872
user@comp:~/exp$ xz -d a.xz 
user@comp:~/exp$ xz -1 a
user@comp:~/exp$ stat a.xz 
  File: a.xz
  Size: 60          Blocks: 8          IO Block: 4096   regular file
Device: 81ah/2074d  Inode: 33988611    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/    user)   Gid: ( 1000/    user)
Access: 2022-05-12 23:41:44.857910380 +0300
Modify: 2022-05-12 23:41:37.429926508 +0300
Change: 2022-05-12 23:42:48.873771880 +0300
 Birth: -
user@comp:~/exp$ md5sum a.xz 
fa25f1983bf7153c34a11ca1edf964dc  a.xz
user@comp:~/exp$ sudo tarsnap -c -f t-2022-05-11-19 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key a.xz
                                       Total size  Compressed size
All archives                            142645878        134353265
  (unique data)                          58563278         49935908
This archive                                 2599             1872
New data                                      519             1111
user@comp:~/exp$ mkdir extract
user@comp:~/exp$ cd extract/
user@comp:~/exp/extract$ sudo tarsnap -x -f t-2022-05-11-19 --cachedir /usr/local/tarsnap-cache-2 --keyfile /root/tarsnap-2.key
user@comp:~/exp/extract$ md5sum ~/exp/a.xz ~/exp/extract/a.xz
fa25f1983bf7153c34a11ca1edf964dc  /home/user/exp/a.xz
0da74eb254586cec9888dd700905795c  /home/user/exp/extract/a.xz

So I think the current tarsnap behavior is simply wrong: tarsnap can easily be tricked by such innocent tools as xz. From a correctness point of view, the default behavior should be changed to always use checksums. But I think this would degrade performance, so I propose keeping the default behavior as is, adding an option to always use checksums, and telling everyone to use it nearly always. And of course, if the file system has its own checksumming or some other kind of integrity checking, tarsnap should use it.

cperciva commented 2 years ago

I'm tempted to say "don't do that then" -- assuming that "unmodified inode" means "unmodified file" is very common, and as you note it provides a very large performance benefit. In your case of recompressing a video, tarsnap will do the right thing anyway, since in addition to the modification time it also checks if the file size has changed.

But yes, we could add an --always-read-files option. This would actually be faster than relying on --verylowmem since tarsnap can compare data against the chunkification cache and bypass the CPU-intensive chunking step.

safinaskar commented 2 years ago

I found a perfect solution! Just always use ctime instead of mtime. I was not able to find a way to set ctime to some past date. Modern versions of borg-backup use ctime instead of mtime for this purpose; here is their rationale: https://borgbackup.readthedocs.io/en/stable/usage/create.html#borg-create
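
The difference between mtime and ctime can be checked with a short experiment. The following is a minimal Python sketch under stated assumptions: it targets POSIX systems (on Windows, st_ctime has a different meaning), demo is a hypothetical name, and the 0.1 second sleep is there only because some file systems record timestamps with coarse granularity. It edits a file, restores the old mtime as touch -r would, and reports whether mtime was restored and whether ctime moved anyway.

```python
import os
import tempfile
import time


def demo() -> tuple[bool, bool]:
    """Return (mtime_restored, ctime_changed) after a touch -r style edit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "a.xz")
        with open(path, "wb") as f:
            f.write(b"first version")
        before = os.stat(path)

        # Wait so the next change lands on a later timestamp tick.
        time.sleep(0.1)
        with open(path, "r+b") as f:
            f.write(b"other version")  # same length, different bytes
        # os.utime can set atime and mtime, but has no parameter for ctime;
        # the kernel updates ctime itself on every inode change.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
        after = os.stat(path)

        return (after.st_mtime_ns == before.st_mtime_ns,
                after.st_ctime_ns != before.st_ctime_ns)


if __name__ == "__main__":
    print(demo())
```

A cache keyed on ctime would therefore notice this kind of edit. It would also re-read files after metadata-only changes such as chmod or chown, since those update ctime as well.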

skull-squadron commented 1 year ago

Wontfix would be my vote.

UNIX™ file management and backup philosophy would lean to using the filesystem as the ultimate source of truth.

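/* Illustrative only: the types and the name should_back_up are assumptions; hash() is assumed to exist elsewhere. */
#include <stdint.h>
struct file { int nodump; int64_t mtime_in_tai64; int64_t size; uint64_t hash; };
enum verdict { NOPE, OK_FINE_I_WILL_BACK_THIS_UP };
uint64_t hash(const struct file *f); /* content hash of the file */

enum verdict should_back_up(struct file *file_now, const struct file *last) {
  if (file_now->nodump) return NOPE; /* excluded via the nodump flag */
  if (file_now->mtime_in_tai64 <= last->mtime_in_tai64) return NOPE; /* not newer than the last backup */
  if (file_now->size == last->size) {
    file_now->hash = hash(file_now); /* newer mtime, same size: compare content */
    if (file_now->hash == last->hash)
      return NOPE;
  }

  return OK_FINE_I_WILL_BACK_THIS_UP;
}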

Mitigating this category of edge case would require exhaustively scanning every byte of content and/or hooking filesystem change notifications for continuous or periodic background tracking, both of which are expensive in different ways compared to the added value. ROI approaching epsilon.