borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/
Other
10.95k stars 740 forks source link

I am seeing ‘A’ (added) status for an unchanged file!? - And I did read the FAQ #3492

Closed henfri closed 6 years ago

henfri commented 6 years ago

I was wondering, why my backup took so long -although I ran a complete backup yesterday. Thus, I have increased the verbosity, and I see many (all?) files with the "A" flag. I did some reading and quickly found this:

So, I checked one of the files mtime with this little script:

filemtime=`stat -c %Y $1`
currtime=`date +%s`
diff=$(( (currtime - filemtime) / 86400 ))
echo $diffroot
bash mtime.sh /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
4371

So, this file is quite old. Nevertheless, I created a new file touch /srv/DataPool/Dokumente/test.txt

TL;DR Skip to the bottom (marked again by TL;DR)

Still: Many many old files are getting the "A" flag.

The command I run is this:

borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-documents-{now} 
/srv/DataPool/Dokumente /srv/DataPool/Fotos /srv/DataPool/HomeVideo /srv/DataPool/Musik /srv/DataPool/Paperless /root /home /etc 
--exclude-from /etc/borgmatic/excludes --exclude-from /tmp/tmpgiptutgu --exclude-caches --exclude-if-present .nobackup 
--compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats

(line-breaks added for better readability)

borg lists gives me:

borg list borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup  | grep documents
homeserver-documents-2017-10-22T18:00:30 Sun, 2017-10-22 18:00:34 [2145c2bb752c426204eacf858ed7f4183c89de878b5781a91544f8e167b0538e]
homeserver-documents-2017-10-27T20:35:37 Fri, 2017-10-27 20:35:41 [66ca5582e7284d42f20636b463148ca2ef890580b609f62dbcd2fcfb9ff3c0b2]
homeserver-documents-2017-10-28T22:55:45 Sat, 2017-10-28 22:55:49 [7c7f66f60e77bf45acc02f08a9666a74f3d9f561cdc5bb90a10084d9a1678bf4]
homeserver-documents-2017-12-27T12:38:44 Wed, 2017-12-27 12:38:53 [f71a36983813c9fd0c866f3e5fc175a45887315ce77b1eba09a174c66f3048dd]
homeserver-documents-2017-12-27T13:02:24 Wed, 2017-12-27 13:02:33 [1cf7803186b9d851690770aab96abea404871b473841024bff404dc80035c65b]

Next, I do:

borg mount borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup  /mnt/test
cd /mnt/test/homeserver-documents-2017-12-27T13:02:24

and a ls -l of both files:

-rwxrwxrwx 1 henfri users 65616 Jan  8  2006 /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
-rwxrwxrwx 1 henfri users 65616 Jan  8  2006 ./srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

and the mtime of both files:

root@homeserver:/mnt/test/homeserver-documents-2017-12-27T13:02:24# bash ~/mtime.sh /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
4371
root@homeserver:/mnt/test/homeserver-documents-2017-12-27T13:02:24# bash ~/mtime.sh ./srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
4371

Also the md5sum of both files is identical:

40b7f91463627b9fa04bfe453ad5b69a  /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
40b7f91463627b9fa04bfe453ad5b69a  ./srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

What am I missing?

I did now check another folder: borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-etc /etc --exclude-from /etc/borgmatic/excludes --exclude-from /tmp/tmpgiptutgu --exclude-caches --exclude-if-present .nobackup --compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats This backs up only /etc (which is also part of a previous backup). Here, many files are correctly identified as 'U'.

Why does this work properly on /etc/ but not on /srv/DataPool/Dokumente/? Well, there is one speciality... DataPool is a mergerfs Pool.

So, I tested the example file in isolation: borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-test1 /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf --exclude-from /etc/borgmatic/excludes --exclude-caches --exclude-if-present .nobackup --compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats

This time, I got a lot of these message -not sure why:

Fetching and building archive index for 'homeserver-2017-11-05-190701' ...
Merging into master chunks index ...
Reading cached archive chunk index for 'homeserver-2017-07-05-190702' ...
Verified integrity of /root/.cache/borg/f9ddbe8a3610937e192017f636270c8c95254bcd767c9959c18cbfc9c85a6d0a/chunks.archive.d/b5e06201e90d2051a49da70a0e8bc9fa0624df0feae6b90073ec49e3f757f676.compact
Merging into master chunks index ...

But nevertheless, again I got: A /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf So next I try with the underlying filesystem (looking 'below' the mergerfs): (only the path to the file is changed: /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf)

TL;DR just read from here: Summary: when backing up /etc/ files are correctly identified as unchanged. When backing up /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf (or other files on this file system) the files are recognized as 'A')

So now I try: borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-test2 /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf --exclude-from /etc/borgmatic/excludes --exclude-caches --exclude-if-present .nobackup --compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats

No change: A /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

If I repeat the last command (just incrementing the archive name to test3) I still get "A".

So, I suspect it has something to do with the file system: Here the output of the mount command: /dev/sdd on /media/678c601d-8d2e-4be0-9226-d20a6f772f8b type ext4 (rw,noexec,relatime,data=ordered,jqfmt=vfsv0,usrjquota=aquota.user,grpjquota=aquota.group)

What can be the reason?

Greetings, Hendrik

ThomasWaldmann commented 6 years ago

Read the FaQ about this?

On December 28, 2017 11:22:48 AM GMT+01:00, henfri notifications@github.com wrote:

I was wondering, why my backup took so long -although I ran a complete backup yesterday. Thus, I have increased the verbosity, and I see many (all?) files with the "A" flag. I did some reading and quickly found this:

So, I checked one of the files mtime with this little script:

filemtime=`stat -c %Y $1`
currtime=`date +%s`
diff=$(( (currtime - filemtime) / 86400 ))
echo $diffroot
bash mtime.sh /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
4371

So, this file is quite old. Nevertheless, I created a new file touch /srv/DataPool/Dokumente/test.txt

TL;DR Skip to the bottom (marked again by TL;DR)

Still: Many many old files are getting the "A" flag.

The command I run is this:

borg create
borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-documents-{now}

/srv/DataPool/Dokumente /srv/DataPool/Fotos /srv/DataPool/HomeVideo
/srv/DataPool/Musik /srv/DataPool/Paperless /root /home /etc 
--exclude-from /etc/borgmatic/excludes --exclude-from /tmp/tmpgiptutgu
--exclude-caches --exclude-if-present .nobackup 
--compression lz4 --one-file-system --remote-path borg --umask 77
--debug --list --stats

(line-breaks added for better readability)

borg lists gives me:

borg list
borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup
| grep documents
homeserver-documents-2017-10-22T18:00:30 Sun, 2017-10-22 18:00:34
[2145c2bb752c426204eacf858ed7f4183c89de878b5781a91544f8e167b0538e]
homeserver-documents-2017-10-27T20:35:37 Fri, 2017-10-27 20:35:41
[66ca5582e7284d42f20636b463148ca2ef890580b609f62dbcd2fcfb9ff3c0b2]
homeserver-documents-2017-10-28T22:55:45 Sat, 2017-10-28 22:55:49
[7c7f66f60e77bf45acc02f08a9666a74f3d9f561cdc5bb90a10084d9a1678bf4]
homeserver-documents-2017-12-27T12:38:44 Wed, 2017-12-27 12:38:53
[f71a36983813c9fd0c866f3e5fc175a45887315ce77b1eba09a174c66f3048dd]
homeserver-documents-2017-12-27T13:02:24 Wed, 2017-12-27 13:02:33
[1cf7803186b9d851690770aab96abea404871b473841024bff404dc80035c65b]

Next, I do:

borg mount
borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup
/mnt/test
cd /mnt/test/homeserver-documents-2017-12-27T13:02:24

and a ls -l of both files:

-rwxrwxrwx 1 henfri users 65616 Jan  8  2006
/srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
-rwxrwxrwx 1 henfri users 65616 Jan  8  2006
./srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

and the mtime of both files:

root@homeserver:/mnt/test/homeserver-documents-2017-12-27T13:02:24#
bash ~/mtime.sh
/srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
4371
root@homeserver:/mnt/test/homeserver-documents-2017-12-27T13:02:24#
bash ~/mtime.sh
./srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
4371

Also the md5sum of both files is identical:

40b7f91463627b9fa04bfe453ad5b69a 
/srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
40b7f91463627b9fa04bfe453ad5b69a 
./srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

What am I missing?

I did now check another folder: borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-etc /etc --exclude-from /etc/borgmatic/excludes --exclude-from /tmp/tmpgiptutgu --exclude-caches --exclude-if-present .nobackup --compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats This backs up only /etc (which is also part of a previous backup). Here, many files are correctly identified as 'U'.

Why does this work properly on /etc/ but not on /srv/DataPool/Dokumente/? Well, there is one speciality... DataPool is a mergerfs Pool.

So, I tested the example file in isolation: borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-test1 /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf --exclude-from /etc/borgmatic/excludes --exclude-caches --exclude-if-present .nobackup --compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats

This time, I got a lot of these message -not sure why:

Fetching and building archive index for 'homeserver-2017-11-05-190701'
...
Merging into master chunks index ...
Reading cached archive chunk index for 'homeserver-2017-07-05-190702'
...
Verified integrity of
/root/.cache/borg/f9ddbe8a3610937e192017f636270c8c95254bcd767c9959c18cbfc9c85a6d0a/chunks.archive.d/b5e06201e90d2051a49da70a0e8bc9fa0624df0feae6b90073ec49e3f757f676.compact
Merging into master chunks index ...

But nevertheless, again I got: A /srv/DataPool/Dokumente/Hendrik/Dokumente/sioux_city1.pdf So next I try with the underlying filesystem (looking 'below' the mergerfs): (only the path to the file is changed: /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf)

TL;DR just read from here: Summary: when backing up /etc/ files are correctly identified as unchanged. When backing up /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf (or other files on this file system) the files are recognized as 'A')

So now I try: borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-test2 /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf --exclude-from /etc/borgmatic/excludes --exclude-caches --exclude-if-present .nobackup --compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats

No change: A /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

If I repeat the last command (just incrementing the archive name to test3) I still get "A".

So, I suspect it has something to do with the file system: Here the output of the mount command: /dev/sdd on /media/678c601d-8d2e-4be0-9226-d20a6f772f8b type ext4 (rw,noexec,relatime,data=ordered,jqfmt=vfsv0,usrjquota=aquota.user,grpjquota=aquota.group)

What can be the reason?

Greetings, Hendrik

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/borgbackup/borg/issues/3492

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

henfri commented 6 years ago

@ThomasWaldmann Did you read the title of this issue? :-)

To answer your question: yes, I did! And the mtime of the file I use for the test is several years old.

ThomasWaldmann commented 6 years ago

Ehrm, no, obviously I only read the (github email) body.

ThomasWaldmann commented 6 years ago

Please specify the borg version. For mergerfs mount, use use_ino option.

Also, always use the same path for your mount. borg identifies files in the files cache by the hash of their full path. So if you change the mountpoint, the cached information won't be found.

henfri commented 6 years ago

Hello Thomas,

thanks for your help. The Borg-Version is 1.1.2. I have not changed the path. I have not used the use_ino option. I did add that now (fstab) and re-booted. The option is not shown as output of the mount command though (?)

Then I tried my test: borg create borg@openmediavault:/srv/_dev_disk_by-id_ata-ST5000DM000-1FK178_W4J1K7R1/borgbackup1/borgbackup::{hostname}-test2 /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf --exclude-from /etc/borgmatic/excludes --exclude-caches --exclude-if-present .nobackup --compression lz4 --one-file-system --remote-path borg --umask 77 --debug --list --stats

Still and again I get A /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf Do you have a further hint?

Greetings, Hendrik

ThomasWaldmann commented 6 years ago

You need to run 2 backups and look at the output of the second after adding the use_ino.

henfri commented 6 years ago

Hello Thomas,

that's what I did. To make sure, I did try that again. Still I get the "A" for the file that I have added now multiple times.

What's still odd though:

I have not changed the path. I have not used the use_ino option. I did add that now (fstab) and re-booted. The option is not shown as output of the mount command though (?)

Shouldn't mount list that this option was actually used?

Greetings, Hendrik

henfri commented 6 years ago

See also https://github.com/trapexit/mergerfs/issues/460

ThomasWaldmann commented 6 years ago

if the files-cache is expecting stable inodes (this is the default, see --files-cache) and your fs does not have stable inodes, "A" status all the time is expected.

So, why didn't you use use_ino? Or did you use it and is it not working? check with ls -i.

henfri commented 6 years ago

Hello Thomas,

I understand that.

I did not use use_ino in the past, because I was not aware. Now I do use it, but it is not displayed in the options according to https://github.com/trapexit/mergerfs/issues/460

Now, to ensure that it is no issue with mergerfs, I can try to backup the file from the underlying, actual (ext4) filesystem (in fact, I did that). Still the same issue.

ls -i shows in this case:

root@homeserver:/srv/DataPool# ls -i /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
30415538 /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
root@homeserver:/srv/DataPool# ls -i /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
30415538 /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

and in the mergerfs case:

root@homeserver:/srv/DataPool# ls -i /media/22cd0c22-c75b-432a-b92b-73229607a8bd/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
9002281867954 /media/22cd0c22-c75b-432a-b92b-73229607a8bd/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
root@homeserver:/srv/DataPool# ls -i /media/22cd0c22-c75b-432a-b92b-73229607a8bd/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
9002281867954 /media/22cd0c22-c75b-432a-b92b-73229607a8bd/Dokumente/Hendrik/Dokumente/sioux_city1.pdf

So, to me it looks like the inodes are stable, no?

Greetings, Hendrik

ThomasWaldmann commented 6 years ago

I don't think you'll see instable inodes that way, but rather when there was some fs activitity in between AND a umount / mount.

henfri commented 6 years ago

Ok.

Is this an appropriate test?

 # ls -i /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
30415538 /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
 # touch /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/test
 # umount -l /media/678c601d-8d2e-4be0-9226-d20a6f772f8b
 # mount /media/678c601d-8d2e-4be0-9226-d20a6f772f8b
 # ls -i /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
30415538 /media/678c601d-8d2e-4be0-9226-d20a6f772f8b/Dokumente/Hendrik/Dokumente/sioux_city1.pdf
henfri commented 6 years ago

P.S: can we determine the inode that borg saved for this file?

ThomasWaldmann commented 6 years ago

files_cache[H(fullpath)] -> file_entry (contains ctime, inode, size, chunkids)

henfri commented 6 years ago

Hm, I hoped for something I can do. C or C++ is not my world. Python would be. Do you have an alternative to progress on this?

Thanks for your help!

ThomasWaldmann commented 6 years ago

That stuff is used from python.

Progress would be "get your inodes working or ignore them".

henfri commented 6 years ago

Hello Thomas,

I am suprised that you are so sure that it is caused by the inodes. Especially, because the problem appared also on the underlying filesystem (i.e. without mergerfs).

I have now tried ignoring the inodes -on an ext4 drive. Same problem. Thus, I figured it is due to the mtime (size does not change). And in fact: As I was always only backing um this one(!) file, it was always automatically the newest. Backing up one folder, it turned out that everything works as expected:

--files-cache=mtime,size
U /root/testdir/borg-linux64
A /root/testdir/abba
U /root/testdir/borg_mount.sh
U /root/testdir/borgmount.sh
U /root/testdir/certbot.log
d /root/testdir

The newest file is Added, all others are unchanged.

As a conclusion: I had two problems: 1) use_inode was missing in mergerfs 2) my test after use_inode was not a good test, as it was backing up one file only. And if borg backs up one file only, it will always be "A". But not due to inode but due to mtime.

Greetings, Hendrik

ThomasWaldmann commented 6 years ago

If there is an option use_ino, but you do not use it (initially), it was quite clear that this breaks the unchanged file detection. Later you use it, but it seemed to have issues as you have found.

Your very early said you have read the FAQ about "A" status, so not sure why this took so long.

henfri commented 6 years ago

Because we focused on inode, not on mtime.

trapexit commented 6 years ago

I'm not familiar with borg but if its using mtime of directories in order to decide anything you need to be sure to set func.getattr=newest when using mergerfs else the data returned for a stat will be whatever the first drive listed (by default).

henfri commented 6 years ago

Thanks for that advice. I was not aware!

I am not sure I understand the feature fully, though. Could you please elaborate on this?

Remember though that this is just stat. If the file is later open'ed or unlink'ed and the policy is different for those then a completely different file or directory could be acted on.

Greetings, Hendrik

trapexit commented 6 years ago

You have to read the section on policies. Each filesystem function can have independent policies. Generally you want them all the same but some cases its useful. All that line is saying is that... remember you can have different policies for different functions. Just because you set getattr = newest doesn't mean that open or unlink will behave the same unless you use the same policy.

mergerfs is providing a union. You can have multiple files or directories at the same relative path. If getattr = newest then it means when you stat the data for the newest file/dir is returned. But if you have open = ff then you'd open the first file found based on the list of drives.

henfri commented 6 years ago

Hello,

I have added func.getattr=newest. All works like expected now. The backup of ~10Gb now takes 5 minutes over WAN. Brilliant!

Greetings, Hendrik