bit-team / backintime

Back In Time - An easy-to-use backup tool for GNU Linux using rsync in the back
https://backintime.readthedocs.io
GNU General Public License v2.0

Files with permissions r--r--r-- being repeatedly backed up #994

Open colinl opened 5 years ago

colinl commented 5 years ago

Using version 1.2 from the stable PPA on Ubuntu 19.04, backing up to a local drive, files are sometimes written to the snapshot with the wrong permissions. It shows up like this: after adding files and taking a snapshot, a second snapshot sometimes reports those files as modified because of permission differences. I have analysed one of these and see that, for the original snapshot, the permissions recorded in fileinfo.bz2 do not match those of the written file. So I see

$ ls -l 20190504-162212-796/backup/home/colinl/tobackup/gitrepos/bin.git/objects/08/e1c829f014423d7eaa09b2e48643f32eff54b5 
-rw-r--r-- 1 colinl colinl 185 May  4 16:20 20190504-162212-796/backup/home/colinl/tobackup/gitrepos/bin.git/objects/08/e1c829f014423d7eaa09b2e48643f32eff54b5

but

$ cat 20190504-162212-796/fileinfo.bz2 | bzip2 -d | grep e1c829
33060 colinl colinl /home/colinl/tobackup/gitrepos/bin.git/objects/08/e1c829f014423d7eaa09b2e48643f32eff54b5

and I think 33060 (decimal) maps to octal 100444, i.e. r--r--r--, which is what the source file has. After running the second backup, fileinfo still has 33060, but the permissions in that backup are r--r--r-- as they should be. I can make the log for the first backup available if necessary, but I would rather not do it publicly. There is nothing obviously odd in the entry for that file.
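The mode conversion guessed at above can be checked with Python's standard `stat` module. This is only a small sketch; the sole assumption is that the number in fileinfo.bz2 is the decimal `st_mode` value of the file.

```python
import stat

# Decimal st_mode as listed in fileinfo.bz2 for the file above (assumed meaning).
mode = 33060

print(oct(mode))            # 0o100444
print(stat.filemode(mode))  # -r--r--r--
print(stat.S_ISREG(mode))   # True (regular file)
```

So the value recorded in fileinfo.bz2 is indeed a regular file with r--r--r--, and the rw-r--r-- seen with ls in the snapshot does not match it.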

colinl commented 5 years ago

Further information, it appears that it is all the files with permissions r--r--r-- that it is getting wrong, at least having looked at a selection of failing ones they all have those permissions, and I can't see any with those permissions that have not failed. I will add just one file with those permissions and see if it fails reliably... Hmm, no not that, having added just one file it went in ok.. It must be something more subtle.

colinl commented 5 years ago

Yet further information, it is to do with r--r--r-- files. It is necessary to close BiT and re-open it to trigger the problem, and in fact with those files often, but not always, if I open BiT and do a backup it will flip the permissions on those files between rw-r--r-- and r--r--r--. It is not only the first time after adding the file that it gets it wrong.

rcjhawk commented 3 years ago

This is still an issue. Running Linux Mint 20, backintime version 1.2.1. The 5TB ext4-formatted backup disk is completely filling up, apparently because of this bug. The main disk is 2TB and only about 35% full, with most files unchanged for months or years.

For example, this file:

-r--r--r-- 1 me me 116162688 Jan 19 2013 /home/me/pcast/phc_130119.mp3

which has not changed in most of a decade, is backed up like this:

-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20200927-230001-204/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201115-230001-529/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20200801-090001-323/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201031-180001-991/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-200001-242/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-050001-268/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-150001-743/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-130001-788/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201120-220001-257/rest_of_path

where "rest_of_path" = home/me/pcast/phc_130119.mp3

So sometimes it works, but most times it does not.

Are there any workarounds I'm missing?

P.S. The last backup started at 1200. It's now 1354, and maybe 40% finished.

colinl commented 3 years ago

Sadly the maintainer appears to have lost interest, or suffered some awful fate, or something, so unless someone else steps up to the plate we are stuck with what we have. The only workaround that I could find was to change the permissions on all of those files.

rcjhawk commented 3 years ago

I cannot now find the thread where it appeared, but adding these options in .config/backintime/config seems to do the trick

profile1.snapshots.rsync_options.enabled=true
profile1.snapshots.rsync_options.value=--no-perms --no-group --no-owner

The three --no-* options make rsync ignore permissions, group and owner.

By "seems" I mean: works for at least one randomly selected file for the last week.

Disk usage is down from ~ 90% to 49% as the older backups fall off. There are still a few backups that I'd like to keep, so it's going to stay at about 50%, but since it's a 5TB drive backing up a 2TB one it's not that big a deal.

colinl commented 3 years ago

Presumably a side effect of that is that it does not save the permissions, group or owner. That may or may not be a serious issue.

rcjhawk commented 3 years ago

Executables are saved 755, other files 644. It doesn't change the owner/group for me, possibly because I'm saving it to a backup drive directory with the same ownership. It works for me, but YMMV.

ghost commented 3 years ago

I implemented the PR #1086 and dpkg-repack'ed my packages. You find them here. I've been testing the changes today and everything seems to work fine. Thanks to @b3nmore for the patch!

988 #1093

gitraphha commented 3 years ago

Hi there,

I ran into the same problem. While just ignoring permission changes helps, this is not a real solution, as already pointed out. So I'd like to add some more information that might help to determine the underlying cause. (As a side note: at first I thought this behaviour only occurs for a non-root user, but then I saw the exact same behaviour with root, so this has no influence.)

I created a simple test case in dir "test":

ls -l test
total 32
drwxrwxr-x 2 someuser someuser 4096 Oct  1 12:46 ./
drwxrwxr-x 7 someuser someuser 4096 Oct  1 10:32 ../
-rw-rw-r-- 1 someuser someuser    0 Oct  1 12:46 drittens
-r--r--r-- 1 someuser someuser    0 Sep 24 21:58 stop
-r--r--r-- 1 someuser someuser    0 Sep 24 22:05 trigg

After that I never made any changes to test or the files contained within it. Maybe backintime or rsync or something else does, but not me personally.

The first sync works as expected:

========== Take snapshot (profile 1): Fri Oct  1 13:00:54 2021 ==========

[C] <f+++++++++ home/someuser/tmp/test/drittens
[C] <f+++++++++ home/someuser/tmp/test/stop
[C] <f+++++++++ home/someuser/tmp/test/trigg

The next (second) snapshot doesn't see any changes - so of course no snapshot is taken (as specified in the profile's options). The third snapshot then is active again:

========== Take snapshot (profile 1): Fri Oct  1 13:01:19 2021 ==========

[C] cf...p..... home/someuser/tmp/test/stop
[C] cf...p..... home/someuser/tmp/test/trigg

... and so on: fourth snapshot: no changes found. fifth snapshot:

========== Take snapshot (profile 1): Fri Oct  1 13:01:36 2021 ==========

[C] cf...p..... home/someuser/tmp/test/stop
[C] cf...p..... home/someuser/tmp/test/trigg

Hopefully this points in a useful direction... If someone with a better understanding of the code could have a look I'd really appreciate it. Thanks and best regards!

PS: adding to the previous information, this at least explains why a diff in permissions is found every other time:

###########################
# on local machine
###########################

#  snapshot realised
# stat trigg
  File: trigg
  Size: 0               Blocks: 16         IO Block: 4096   regular empty file
Device: 3bh/59d  Inode: 16255628    Links: 1
Access: (0444/-r--r--r--)  Uid: ( 1000/  someuser)   Gid: ( 1000/  someuser)
Access: 2021-10-01 12:45:51.933391548 +0200
Modify: 2021-09-24 22:05:23.054726711 +0200
Change: 2021-10-01 10:32:11.626954901 +0200
 Birth: -

#  snapshot NOT realised
# stat trigg
  File: trigg
  Size: 0               Blocks: 16         IO Block: 4096   regular empty file
Device: 3bh/59d  Inode: 16255628    Links: 1
Access: (0444/-r--r--r--)  Uid: ( 1000/  someuser)   Gid: ( 1000/  someuser)
Access: 2021-10-01 12:45:51.933391548 +0200
Modify: 2021-09-24 22:05:23.054726711 +0200
Change: 2021-10-01 10:32:11.626954901 +0200
 Birth: -

###########################
# on remote machine
# stat is applied to the file in last_snapshot
# caveat: after another snapshot has been taken, you need to "cd" using the now changed link last_snapshot to actually be in the "new" last_snapshot dir
###########################

#  snapshot realised
# stat trigg
  File: ‘trigg’
  Size: 0               Blocks: 24         IO Block: 4096   regular empty file
Device: 12h/18d Inode: 3822333     Links: 1
Access: (0444/-r--r--r--)  Uid: ( 1026/  someuser)   Gid: (  100/   users)
Access: 2021-10-01 16:18:17.000000000 +0200
Modify: 2021-09-24 22:05:23.000000000 +0200
Change: 2021-10-01 16:18:17.878002314 +0200
 Birth: -

#  snapshot NOT realised
# stat trigg
  File: ‘trigg’
  Size: 0               Blocks: 24         IO Block: 4096   regular empty file
Device: 12h/18d Inode: 3822321     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1026/  someuser)   Gid: (  100/   users)
Access: 2021-10-01 16:13:04.599002007 +0200
Modify: 2021-09-24 22:05:23.000000000 +0200
Change: 2021-10-01 16:13:04.599002007 +0200
 Birth: -

emtiu commented 2 years ago

I have encountered the same issue: files that are originally r--r--r-- are somehow getting saved as rw-r--r-- in the snapshot, therefore triggering a new (completely unnecessary) backup copy on every subsequent snapshot.

Not sure why this happens. #1086 or #1128 are probably workarounds, but patching backintime to restore previous behavior is not exactly an elegant solution.

monsta commented 2 years ago

Well, I just found that these read-only files are changed to -rw-r--r-- in all the existing snapshots right after I delete the oldest snapshot. That is what leads to the read-only files being backed up again when I make a new snapshot.

Can anyone confirm this?

monsta commented 2 years ago

It turned out that it's enough to delete any of the snapshots with read-only files - not necessarily the oldest one of them.
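One reason a change can show up in all existing snapshots at once: unchanged files are hard links to the same inode, and permissions belong to the inode, not to the name. The following is a minimal sketch of that filesystem property only; the directory names `snapshot1` and `snapshot2` are made up for illustration and nothing here calls BiT or rsync.

```python
import os
import stat
import tempfile


def demo_hardlink_mode_sharing():
    """Return the mode seen via two hard links after a chmod through one of them."""
    with tempfile.TemporaryDirectory() as root:
        snap1 = os.path.join(root, "snapshot1")  # hypothetical snapshot dirs
        snap2 = os.path.join(root, "snapshot2")
        os.mkdir(snap1)
        os.mkdir(snap2)

        first = os.path.join(snap1, "trigg")
        second = os.path.join(snap2, "trigg")
        with open(first, "w"):
            pass
        os.chmod(first, 0o444)
        os.link(first, second)  # same inode, second name (like a hard-linked snapshot file)

        os.chmod(first, 0o644)  # permission change made via snapshot1 only

        return (
            stat.filemode(os.stat(first).st_mode),
            stat.filemode(os.stat(second).st_mode),
            os.stat(first).st_ino == os.stat(second).st_ino,
        )


print(demo_hardlink_mode_sharing())  # ('-rw-r--r--', '-rw-r--r--', True)
```

Whatever touches the permissions of one hard-linked copy therefore changes every snapshot that links to the same inode.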

emtiu commented 2 years ago

I can't help thinking that this is connected to #988, but I'm not sure.

buhtz commented 1 year ago

Is anyone able to reproduce the problem and provide detailed steps for reproduction, please?

I have observed that behaviour in the wild, too. But I'm not able to reproduce it in a test environment: using files with permissions 444 or 666, everything went fine and behaved as expected.

emtiu commented 1 year ago

In my latest VM testing, reproducing the problem seems rather straightforward:

  1. Include any r--r--r-- file in your backup.
  2. Take a number of snapshots.
  3. The r--r--r-- files will be backed up again every second time you take a snapshot.
  4. Looking into the snapshots, you'll see that the original r--r--r-- files are changed to rw-r--r-- in all older snapshots, and backed up again as r--r--r-- every second time a snapshot is created.
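To make step 4 easier to check, the mode and hard-link count of one file can be listed for every snapshot. This is a rough sketch: the function name `modes_across_snapshots` is made up, and it assumes the `<snapshot>/backup/<original path>` layout visible in the first comment of this thread.

```python
import os
import stat


def modes_across_snapshots(snapshots_root, rel_path):
    """List (snapshot name, mode string, link count) for one backed-up file.

    snapshots_root: directory containing the snapshot folders (assumed layout).
    rel_path: original path of the file; a leading "/" is ignored.
    """
    rel_path = rel_path.lstrip(os.sep)
    rows = []
    for name in sorted(os.listdir(snapshots_root)):
        candidate = os.path.join(snapshots_root, name, "backup", rel_path)
        if not os.path.isfile(candidate):
            continue  # file not present in this snapshot
        info = os.stat(candidate)
        rows.append((name, stat.filemode(info.st_mode), info.st_nlink))
    return rows


# Example call (paths are placeholders):
# for row in modes_across_snapshots("/backup/backintime/host/user/1", "/home/user/tmp/test/trigg"):
#     print(*row)
```

A link count of 1 on a file that never changed suggests that the snapshot holds a fresh copy instead of a hard link to the previous one.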

I haven't further investigated what's happening here, but clearly there's some intermediate step in the snapshotting process (involving the temporary new_snapshot dir probably?) that changes existing r--r--r-- files into rw-r--r--, leading to this behavior.

aryoda commented 1 year ago

but clearly there's some intermediate step in the snapshotting process (involving the temporary new_snapshot dir probably?)

I can test this tomorrow night via debugging; perhaps I can see something obvious causing this.

BTW: Did you use any special rsync or config options to make it reproducible?

emtiu commented 1 year ago

BTW: Did you use any special rsync or config options to make it reproducible?

I think it's all very standard, with a minimal configuration running on the current dev code. I'm attaching the config I used: config.txt

  1. Include any r--r--r-- file in your backup.

Note that I included r--r--r-- files mixed in with rw-rw-r-- and other "standard" permissions, in case this makes a difference.

aryoda commented 1 year ago

Status

I could reproduce the r(w) permission ping-pong only when I delete a snapshot manually via the GUI (and most probably also automatically if configured so).

If I delete snapshots directly in the file system (without using the BiT GUI) the permission of the read-only file does not change so it must be caused by BiT.

Finding

During debugging I could identify this line of code in snapshots.py#remove() that causes the r(w) permission ping-pong when executed:

https://github.com/bit-team/backintime/blob/9b93f0d279f74089d33f5c1a47567266feb72488/common/snapshots.py#L663

The command that ends up in the rsync variable looks like this:

rsync -a --delete -s /tmp/tmpweklerru/ /home/<username>/temp/testBAK_profil1/backintime/<host name>/<username>/1/20230905-222640-779
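For anyone who wants to replay this outside of BiT, here is a small sketch that builds the same argument list and runs it against a fresh empty directory. The function names are made up for illustration, the options are copied from the command above, and BiT's actual code in snapshots.py may differ in details.

```python
import shutil
import subprocess
import tempfile


def rsync_delete_command(empty_dir, snapshot_path):
    """Build the argument list that empties snapshot_path by syncing an empty dir into it."""
    # The trailing slash on the source means "the contents of empty_dir".
    return ["rsync", "-a", "--delete", "-s", empty_dir.rstrip("/") + "/", snapshot_path]


def empty_snapshot_with_rsync(snapshot_path):
    """Run the command with a temporary empty source dir; needs rsync on PATH."""
    if shutil.which("rsync") is None:
        raise RuntimeError("rsync not found on PATH")
    with tempfile.TemporaryDirectory() as empty_dir:
        subprocess.run(rsync_delete_command(empty_dir, snapshot_path), check=True)


print(rsync_delete_command("/tmp/tmpweklerru/", "/path/to/snapshot"))
```

Only run `empty_snapshot_with_rsync` on a throw-away copy of a snapshot, since it deletes everything inside the target directory.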

Next steps

  1. TODO: Which rsync constellation causes the r(w) ping pong?
  2. TODO: Why does BiT not just delete the snapshot folder, but instead create an empty tmp folder and execute the above rsync command to delete a snapshot? Is this perhaps only required for some backup targets (e.g. ssh)?
  3. TODO: Find a fix while minimizing the risk of breaking anything

aryoda commented 1 year ago

I suspect that the -a option of rsync is the culprit:

--archive, -a This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything. Be aware that it does not include preserving ACLs (-A), xattrs (-X), atimes (-U), crtimes (-N), nor the finding and preserving of hardlinks (-H).

The ACLs are not preserved:

Be aware that it does not include preserving ACLs (-A)

I think this could mean that the default umask is used instead...

Since BiT uses rsync's --link-dest option to hard-link unchanged files between snapshots to save space, deleting a snapshot may have the side effect of changing the ACL of the linked file itself and therefore of all links to this file in other snapshots.

Perhaps adding --acls and/or --perms --group --owner may preserve the original permissions...

It looks as if the introduction of the --perms --group --owner arguments in BiT v1.2.1 (see the FAQ entry) introduced this bug, which is consistent with the reports above.

TODO:

mauromol commented 1 year ago

2. TODO: Why does BiT not just delete a snapshot folder but mount an empty tmp folder and executing above rsync command to delete a snapshot? Is this perhaps only required for some backup targets (eg. ssh)?

My 2 cents. I really don't know the rationale behind this, but if I had to bet I would say that, from personal experience, deleting a snapshot by rsyncing it with an empty folder is much faster than doing an rm -rf, especially when there are a lot of (small?) files in the snapshot. So one possibility is that this was done for performance reasons.

emtiu commented 1 year ago

The ACLs are not preserved:

Aren't ACLs and the standard Unix permissions (like 444 = r--r--r--) two different things? :raised_eyebrow: Why would ACLs matter here? Or am I missing something basic?

aryoda commented 1 year ago

Aren't ACLs and the standard Unix permissions (like 444 = r--r--r--) two different things?

Thanks for pointing out that I am riding the wrong horse :face_with_head_bandage: ;-)

The r(w) ping pong was reported with standard Linux rights (maybe also with ACLs if installed and used but I think nobody has reported or tested this so far).

aryoda commented 1 year ago

The problem is that rsync changes the change time ("ctime") of hardlinked files in other snapshots if a hardlink is rsync --deleted in another snapshot (with ext3):

rsync delete updates change time of hardlinked read-only files making it writable again

The ctime is normally filesystem-specific and can normally not be influenced but I am wondering why the 'ctime' is not changed if I use rm instead.

I think I can create a bash script that uses just rsync to demonstrate this unwanted behavior (to ask the rsync community for guidance).

BTW: There is a 5-year-old report about the same behavior leading to duplicated files in the backup (sorry, it's German only):

https://linux.debian.user.german.narkive.com/CwQOnYfL/rsync-andert-bei-ziel-atime-und-ctime-obwohl-das-ziel-ext3-fs-mit-noatime-und-nodiratime-gemountet

aryoda commented 1 year ago

I have attached a bash-only script to make the issue 100 % reproducible:

setup.txt

Just download the file, rename it with mv setup.txt setup.sh then chmod +x setup.sh and run it with ./setup.sh.

@all Could you please provide feedback on whether this script reproduces the issue on your system too?

@buhtz Do you know where we could ask the rsync community for guidance (using this script)?

buhtz commented 1 year ago

Sounds like you really want to fix this before 1.4.0 ? :smile:

I have asked the "rsync community" on their mailing list before: https://lists.samba.org/mailman/listinfo/rsync The maintainer is there but does not read everything. If it is urgent or "very important" it seems OK to contact Wayne Davison directly via mail. I did this in the past in the context of the "rsync argument protection" problem.

aryoda commented 1 year ago

Sounds like you really want to fix this before 1.4.0 ? 😄

I see no way to fix this for 1.4.0 (I will assign a new milestone).

I have tried all possible rsync options related to permissions and time and nothing worked. If rsync is using a different kernel API than rm (I would have to invest time to strace this), it would require a fix on their side.

Currently the only workaround is to stop taking snapshots with --perms.

BTW: This is a severe bug because it causes all files that are read-only for a user to be backed up again and again once BiT (or the user) deletes an old snapshot that contains a hardlink to the file in the latest (most recent) snapshot. Other issues report exactly this unwanted behavior (wasting storage space).

emtiu commented 1 year ago

Now that you've nailed down the root of the problem: Do you think that #988 might have the same root cause?

emtiu commented 1 year ago

What does the ctime change have to do with the change in permissions, though? :thinking:

In your example, I understand how rsync --delete changes the ctime of the stated file.

But I don't understand why its permissions jump from r--r--r-- to rw-r--r--.

aryoda commented 1 year ago

What does the ctime change have to with the change in permissions, though? 🤔

To me it is unclear if rsync changes the permission to rw so that the ctime is changed (eg. to be able to delete hardlinks) or vice versa.

rming a hardlink, on the other hand, does not change the permissions (edit: re-tested -> rm -rf snapshot1 does update the Change time but leaves the permissions at 444 -r--r--r--). So I suppose it is not a kernel or fs module issue but is caused by rsync... Without debugging the rsync code we can only speculate.
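The rm observation can be replayed in isolation: removing one hard link lowers the link count of the shared inode (which is what usually bumps its ctime) but leaves the mode alone. A minimal sketch with made-up file names, using only the Python standard library and no rsync:

```python
import os
import stat
import tempfile


def unlink_one_hardlink():
    """Remove one of two hard links and report what changed on the surviving name."""
    with tempfile.TemporaryDirectory() as root:
        kept = os.path.join(root, "kept")        # hypothetical names
        removed = os.path.join(root, "removed")
        with open(kept, "w"):
            pass
        os.chmod(kept, 0o444)
        os.link(kept, removed)

        before = os.stat(kept)
        os.unlink(removed)  # what rm does for a single file
        after = os.stat(kept)

        return {
            "mode_before": stat.filemode(before.st_mode),
            "mode_after": stat.filemode(after.st_mode),
            "links_before": before.st_nlink,
            "links_after": after.st_nlink,
            # May be False if both stat calls fall within the timestamp granularity.
            "ctime_changed": after.st_ctime_ns != before.st_ctime_ns,
        }


print(unlink_one_hardlink())
```

If rsync --delete ends up with a different mode on the surviving link, the difference has to come from something rsync does in addition to the plain unlink.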

aryoda commented 1 year ago

Do you think that #988 might have the same root cause?

Hard to say without

What is sure: This r/w ping pong does definitely cause unnecessary file copies in each new snapshot

This may explain some of the reports in #988 but others are perhaps only affected in the first snapshot with the new version (since it will do an almost complete new copy of all source files).

Furthermore if the target file system does not support Linux permissions the file permissions may be lost in every snapshot causing a full backup in every snapshot (to be tested!).

aryoda commented 1 year ago

I have done some more scenario tests to find out if other constellations change the permissions of existing snapshot files too but it looks good.

Proposal to fix the bug

If this turns out to be a bug in rsync, we would have to cope with a pre/post bug-fix situation anyhow, so I suggest fixing this (BiT) bug by using rm -f instead of rsync -a --delete -s to delete a snapshot.

Impact

The proposed fix

Alternatives

  1. Do not use --perms anymore

Advantages:

Disadvantages:

Next steps

  1. I will now contact the rsync community to ask for an opinion using attached script that reproduces the problem
  2. Decide on how to fix this

setup.txt

emtiu commented 1 year ago

Do not use --perms anymore

This would most probably also eliminate #988, which is another open mega-bug (that we don't understand as well as this one yet).

It's an attractive solution in my mind, but we have to weigh the consequences carefully.

aryoda commented 1 year ago

This rsync issue is at least 6 years old, and the bug report even contains a patch to fix it:

There is also a proposed workaround using --super but this seems unreliable to me and may have side effects:

 --super
              This tells the receiving side to attempt super-user activities
              even if the receiving rsync wasn't run by the super-user. These
              activities include: preserving users via the --owner option,
              preserving all groups (not just the current user's groups) via
              the --group option, and copying devices via the --devices
              option. This is useful for systems that allow such activities
              without being the super-user, and also for ensuring that you
              will get errors if the receiving side isn't being run as the
              super-user. To turn off super-user activities, the super-user
              can use --no-super.

emtiu commented 1 year ago

Then maybe there's a chance to press this problem with the rsync devs? It's causing us significant "downstream trouble", after all.

emtiu commented 1 year ago

@buhtz knows more than any of us about rsync development, I think :)

buhtz commented 1 year ago

Sorry, but I have to say that currently I don't understand the details of your discussion. I'm not really into the permission problems.

aryoda commented 1 year ago

I have already written a question in the rsync mailing list:

https://lists.samba.org/archive/rsync/2023-September/subject.html

Depending on the feedback I will probably also bump the old open issue. I am a registered user of Bugzilla for rsync now (but I don't want to "cross-post" and flood different systems with the same question unless there is no response).

IMHO even a fix at the rsync side does not guarantee that every distro and system is using the fixed rsync version so BiT has to recognize this anyhow and decide how to create and delete snapshots to avoid this issue...

BTW: rm -f seems to be significantly slower on some remote file systems (possibly Samba and NFS) and is not really a good option.

emtiu commented 1 year ago

BTW: rm -f seems to be significantly slower on some remote file systems (possibly Samba and NFS) and is not really a good option.

Thanks for leading the discussion with the rsync devs :+1:

We need to keep in mind that BackInTime also has to deal with remote locations where we can't ssh in and call rm. In those cases, rsync --delete is the only way to remove a snapshot (unless I'm missing something basic).

aryoda commented 1 year ago

We need to keep in mind that BackInTime also has to deal with remote locations where we can't ssh in and call rm

Do you mean a) "no ssh and no rm possible" or b) "no rm within ssh possible"?

For a) it would require configuring a restricted shell for sshd. I can hardly imagine a real use case requiring write permissions without rm.

Did you find such a setup in the wild?

I was trying but did not succeed so far. Even the secured Hetzner Storage Box provides such a basic thing like rm (but not echo ;-).

I think we

  1. need to support rm -rf
  2. auto-fallback or add a configuration option to use rsync --delete instead (if the user should really use a restricted ssh shell)
  3. should somehow log how long it took to delete an (old) snapshot to give the user a way to recognize (too) long running snapshot deletions

I would still prefer to keep the command dependencies on the remote side small (= best to use only rsync, without rm), but given that a fixed rsync will not be immediately available on every system, I think we have no other option than using rm -rf.
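The three points above could look roughly like this for a local target. This is a hypothetical sketch, not BiT code: `remove_snapshot` and its `fallback` parameter are made-up names, and `shutil.rmtree` only stands in for rm -rf on a local path (a remote target would need the command to be run via ssh instead).

```python
import logging
import shutil
import time

logger = logging.getLogger("snapshot-remove")  # hypothetical logger name


def remove_snapshot(path, fallback=None):
    """Delete a snapshot tree, optionally fall back, and log how long it took.

    fallback: optional callable(path), e.g. an rsync --delete based removal
              for setups where the plain removal is not possible (assumption).
    """
    start = time.perf_counter()
    try:
        shutil.rmtree(path)  # local stand-in for rm -rf
        method = "rmtree"
    except OSError:
        if fallback is None:
            raise
        fallback(path)
        method = "fallback"
    elapsed = time.perf_counter() - start
    logger.info("removed %s via %s in %.2f s", path, method, elapsed)
    return method, elapsed
```

Returning the elapsed time as well as logging it would let a caller warn the user about unusually slow snapshot deletions.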

emtiu commented 1 year ago

Did you find such a setup in the wild?

I was trying but did not succeed so far. Even the secured Hetzner Storage Box provides such a basic thing like rm (but not echo ;-).

No, you're right. I was confused. A "full rsync mode" was once in development, but it never became a supported reality in BiT. We can probably rely on rm being available on a remote host, and I don't know of any particular setup where it would be missing.

I think we

  1. need to support rm -rf
  2. auto-fallback or add a configuration option to use rsync --delete instead (if the user should really use a restricted ssh shell)
  3. should somehow log how long it took to delete an (old) snapshot to give the user a way to recognize (too) long running snapshot deletions

I wonder if it's worth keeping the rsync --delete option around. It's practically useless with --perms. Which brings us back to #988, and a decision about what to do with permissions in general ;)