colinl opened 5 years ago
Further information: it appears that it is all the files with permissions r--r--r-- that it is getting wrong. At least, having looked at a selection of failing ones, they all have those permissions, and I can't see any with those permissions that have not failed. I will add just one file with those permissions and see if it fails reliably... Hmm, no, not that: having added just one file, it went in OK. It must be something more subtle.
Yet further information: it is to do with r--r--r-- files. It is necessary to close BiT and re-open it to trigger the problem. In fact, with those files, often but not always, if I open BiT and do a backup it will flip the permissions on those files between rw-r--r-- and r--r--r--. It is not only the first time after adding the file that it gets it wrong.
This is still an issue. Running Linux Mint 20, backintime version 1.2.1. The 5 TB ext4-formatted backup disk is completely filling up, apparently because of this bug. The main disk is 2 TB and only about 35% full, with most files unchanged for months or years.
For example, this file, which has not changed in most of a decade:
-r--r--r-- 1 me me 116162688 Jan 19 2013 /home/me/pcast/phc_130119.mp3
is backed up like this:
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20200927-230001-204/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201115-230001-529/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20200801-090001-323/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201031-180001-991/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-200001-242/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-050001-268/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-150001-743/rest_of_path
-rw-r--r-- 2 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201122-130001-788/rest_of_path
-rw-r--r-- 1 me me 116162688 Jan 19 2013 /backup/me/backintime/mc/me/1/20201120-220001-257/rest_of_path
where "rest_of_path" = home/me/pcast/phc_130119.mp3
So sometimes it works, but most times it does not.
Are there any workarounds I'm missing?
P.S. The last backup started at 1200. It's now 1354, and maybe 40% finished.
Sadly the maintainer appears to have lost interest, or suffered some awful fate, or something so unless someone else steps up to the plate then we are stuck with what we have. The only workaround that I could find was to change all the permissions on those files.
I cannot now find the thread where it appeared, but adding these options in .config/backintime/config seems to do the trick
profile1.snapshots.rsync_options.enabled=true
profile1.snapshots.rsync_options.value=--no-perms --no-group --no-owner
the triple --no options ignoring permissions, group, and owner.
By "seems" I mean: works for at least one randomly selected file for the last week.
Disk usage is down from ~ 90% to 49% as the older backups fall off. There are still a few backups that I'd like to keep, so it's going to stay at about 50%, but since it's a 5TB drive backing up a 2TB one it's not that big a deal.
Presumably a side effect of that is that it does not save the permissions, group or owner. That may or may not be a serious issue.
Executables are saved 755, other files 644. It doesn't change the owner/group for me, possibly because I'm saving it to a backup drive directory with the same ownership. It works for me, but YMMV.
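One plausible explanation for the 644/755 pattern (my own speculation, not verified against the rsync sources): those are exactly the default modes 666 and 777 filtered through the common umask of 022. A quick illustration with made-up paths:

```shell
# A umask of 022 turns the default file mode 666 into 644 (and 777 into
# 755 for executables) for newly created files.
tmp=$(mktemp -d)
(umask 022 && touch "$tmp/newfile")
stat -c '%a' "$tmp/newfile"   # prints 644
rm -rf "$tmp"
```

So if the permission bits are not carried over from the source, ending up with 644 for plain files and 755 for executables is the expected default on most Linux systems.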
Hi there,
I ran into the same problem. While just ignoring permission changes helps, this is not a real solution, as already pointed out. So I'd like to add some more information that might be of interest in determining the underlying cause. (As a side note: at first I thought this behaviour only occurred for a non-root user, but then I saw the exact same behaviour with root, so this has no influence.)
I created a simple test case in dir "test":
ls -l test
total 32
drwxrwxr-x 2 someuser someuser 4096 Oct 1 12:46 ./
drwxrwxr-x 7 someuser someuser 4096 Oct 1 10:32 ../
-rw-rw-r-- 1 someuser someuser 0 Oct 1 12:46 drittens
-r--r--r-- 1 someuser someuser 0 Sep 24 21:58 stop
-r--r--r-- 1 someuser someuser 0 Sep 24 22:05 trigg
Subsequently I never made any changes to test or the files contained within. Maybe backintime or rsync or something else does, but not me personally.
The first sync works as expected:
========== Take snapshot (profile 1): Fri Oct 1 13:00:54 2021 ==========
[C] <f+++++++++ home/someuser/tmp/test/drittens
[C] <f+++++++++ home/someuser/tmp/test/stop
[C] <f+++++++++ home/someuser/tmp/test/trigg
The next (second) snapshot doesn't see any changes - so of course no snapshot is taken (as specified in the profile's options). The third snapshot then is active again:
========== Take snapshot (profile 1): Fri Oct 1 13:01:19 2021 ==========
[C] cf...p..... home/someuser/tmp/test/stop
[C] cf...p..... home/someuser/tmp/test/trigg
... and so on: fourth snapshot: no changes found. Fifth snapshot:
========== Take snapshot (profile 1): Fri Oct 1 13:01:36 2021 ==========
[C] cf...p..... home/someuser/tmp/test/stop
[C] cf...p..... home/someuser/tmp/test/trigg
Hopefully this can point in an insightful direction... If someone with a better understanding of the code could have a look, I'd really appreciate it. Thanks and best regards!
PS, adding to the previous information - this at least explains why a diff in permissions is found every other time:
###########################
# on local machine
###########################
# snapshot realised
# stat trigg
File: trigg
Size: 0 Blocks: 16 IO Block: 4096 regular empty file
Device: 3bh/59d Inode: 16255628 Links: 1
Access: (0444/-r--r--r--) Uid: ( 1000/ someuser) Gid: ( 1000/ someuser)
Access: 2021-10-01 12:45:51.933391548 +0200
Modify: 2021-09-24 22:05:23.054726711 +0200
Change: 2021-10-01 10:32:11.626954901 +0200
Birth: -
# snapshot NOT realised
# stat trigg
File: trigg
Size: 0 Blocks: 16 IO Block: 4096 regular empty file
Device: 3bh/59d Inode: 16255628 Links: 1
Access: (0444/-r--r--r--) Uid: ( 1000/ someuser) Gid: ( 1000/ someuser)
Access: 2021-10-01 12:45:51.933391548 +0200
Modify: 2021-09-24 22:05:23.054726711 +0200
Change: 2021-10-01 10:32:11.626954901 +0200
Birth: -
###########################
# on remote machine
# stat is applied to the file in last_snapshot
# caveat: after another snapshot has been taken, you need to "cd" using the now changed link last_snapshot to actually be in the "new" last_snapshot dir
###########################
# snapshot realised
# stat trigg
File: ‘trigg’
Size: 0 Blocks: 24 IO Block: 4096 regular empty file
Device: 12h/18d Inode: 3822333 Links: 1
Access: (0444/-r--r--r--) Uid: ( 1026/ someuser) Gid: ( 100/ users)
Access: 2021-10-01 16:18:17.000000000 +0200
Modify: 2021-09-24 22:05:23.000000000 +0200
Change: 2021-10-01 16:18:17.878002314 +0200
Birth: -
# snapshot NOT realised
# stat trigg
File: ‘trigg’
Size: 0 Blocks: 24 IO Block: 4096 regular empty file
Device: 12h/18d Inode: 3822321 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1026/ someuser) Gid: ( 100/ users)
Access: 2021-10-01 16:13:04.599002007 +0200
Modify: 2021-09-24 22:05:23.000000000 +0200
Change: 2021-10-01 16:13:04.599002007 +0200
Birth: -
I have encountered the same issue: files that are originally r--r--r-- are somehow getting saved as rw-r--r-- in the snapshot, therefore triggering a new (completely unnecessary) backup copy on every subsequent snapshot.
Not sure why this happens. #1086 or #1128 are probably workarounds, but patching backintime to restore previous behavior is not exactly an elegant solution.
Well, I just found that these read-only files are changed to -rw-r--r-- in all the existing snapshots right after I delete the oldest snapshot. That is why read-only files get backed up again when I make a new snapshot.
Can anyone confirm this?
It turned out that it's enough to delete any of the snapshots with read-only files - not necessarily the oldest one of them.
I can't help thinking that this is connected to #988, but I'm not sure.
Is anyone able to reproduce the problem and provide detailed steps for reproduction, please?
I observed that behaviour in the wild, too. But I'm not able to reproduce it in a test environment; using files with permissions 444 or 666, everything went fine and behaved as expected.
In my latest VM testing, reproducing the problem seems rather straightforward:
I haven't further investigated what's happening here, but clearly there's some intermediate step in the snapshotting process (involving the temporary new_snapshot dir, probably?) that changes existing r--r--r-- files into rw-r--r--, leading to this behavior.
but clearly there's some intermediate step in the snapshotting process (involving the temporary new_snapshot dir probably?)
I can test this tomorrow night via debugging; perhaps I can see something obvious causing this.
BTW: Did you use any special rsync or config options to make it reproducible?
BTW: Did you use any special rsync or config options to make it reproducible?
I think it's all very standard, with a minimal configuration running on the current dev code. I'm attaching the config I used: config.txt
- Include any r--r--r-- file in your backup.
Note that I included r--r--r-- files mixed in with rw-rw-r-- and other "standard" permissions, in case this makes a difference.
I could reproduce the r(w) permission ping-pong only when I delete a snapshot manually via the GUI (and most probably also automatically if configured so).
If I delete snapshots directly in the file system (without using the BiT GUI) the permission of the read-only file does not change so it must be caused by BiT.
During debugging I could identify this line of code in snapshots.py#remove() that causes the r(w) permission ping-pong when executed. The executed command in the `rsync` variable looks like this:
rsync -a --delete -s /tmp/tmpweklerru/ /home/<username>/temp/testBAK_profil1/backintime/<host name>/<username>/1/20230905-222640-779
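For readers unfamiliar with this trick: syncing an empty source directory with --delete makes rsync remove everything in the destination. A minimal sketch with made-up paths (assumes rsync is installed), following the same pattern as the command quoted above:

```shell
# Delete a directory's contents by syncing an empty dir into it.
tmp=$(mktemp -d)
mkdir "$tmp/empty" "$tmp/snapshot"
touch "$tmp/snapshot/file1" "$tmp/snapshot/file2"
rsync -a --delete -s "$tmp/empty/" "$tmp/snapshot/"
ls -A "$tmp/snapshot"   # prints nothing: the snapshot content is gone
rm -rf "$tmp"
```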
Open questions:
1. Which `rsync` constellation causes the r(w) ping-pong?
2. Why is an `rsync` command used to delete a snapshot? Is this perhaps only required for some backup targets (eg. `ssh`)?

I suspect that the `-a` option of `rsync` is the culprit:
--archive, -a
This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything. Be aware that it does not include preserving ACLs (-A), xattrs (-X), atimes (-U), crtimes (-N), nor the finding and preserving of hardlinks (-H).
The ACLs are not preserved: "Be aware that it does not include preserving ACLs (-A)". I think this could mean that the default `umask` is used instead...
Since BiT is using rsync's `--link-dest` option to hard-link unchanged files between snapshots to save space, deleting a snapshot may have the side effect of changing the ACL of the linked file itself, and therefore for all links to this file in other snapshots.
Perhaps adding `--acls` and/or `--perms --group --owner` may preserve the original permissions...
It looks like the introduction of the `--perms --group --owner` arguments in BiT v1.2.1 (see the FAQ entry) introduced this bug, which is consistent with the above reports.
TODO:
1. **TODO**: Ask in the `rsync` forum?
2. **TODO**: Why does BiT not just delete a snapshot folder but mount an empty tmp folder and execute the above `rsync` command to delete a snapshot? Is this perhaps only required for some backup targets (eg. `ssh`)?
My 2 cents. I really don't know the rationale behind this, but if I could bet, I would say, from personal experience, that deleting a snapshot by rsyncing it with an empty folder is much faster than doing `rm -rf`, especially when there are a lot of (small?) files in the snapshot. So, one possibility is that this was done for performance reasons.
The ACLs are not preserved:
Aren't ACLs and the standard Unix permissions (like 444 = r--r--r--) two different things? :raised_eyebrow: Why would ACLs matter here? Or am I missing something basic?
Aren't ACLs and the standard Unix permissions (like 444 = r--r--r--) two different things?
Thanks for pointing out that I am riding the wrong horse :face_with_head_bandage: ;-)
The r(w) ping pong was reported with standard Linux rights (maybe also with ACLs if installed and used but I think nobody has reported or tested this so far).
The problem is that `rsync` changes the change time ("ctime") of hardlinked files in other snapshots if a hardlink is `rsync --delete`'d in another snapshot (with ext3). The ctime is normally filesystem-specific and can normally not be influenced, but I am wondering why the ctime is not changed if I use `rm` instead.
I think I can create a bash script that uses just `rsync` to demonstrate this unwanted behavior (to ask the `rsync` community for guidance).
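Independent of any script, the cross-snapshot effect itself is easy to see with plain shell commands: hardlinked copies share a single inode, so a mode change made through one name is visible (with an updated ctime) through every other name. This is my own illustration with made-up paths, not the attached setup.sh:

```shell
# Two snapshot dirs holding hardlinks to the same inode: a chmod through
# the link in snap2 changes what stat reports for snap1 too, just like
# BiT snapshots linked via --link-dest.
tmp=$(mktemp -d)
mkdir "$tmp/snap1" "$tmp/snap2"
touch "$tmp/snap1/trigg"
chmod 444 "$tmp/snap1/trigg"
ln "$tmp/snap1/trigg" "$tmp/snap2/trigg"   # "unchanged file" hardlink
chmod 644 "$tmp/snap2/trigg"               # a mode change in one snapshot...
stat -c '%a' "$tmp/snap1/trigg"            # ...prints 644: the other one "changed" too
rm -rf "$tmp"
```

So anything that chmods a file while deleting one snapshot silently rewrites the permissions stored in every other snapshot that links the same inode.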
BTW: There is a 5-year-old report about the same behavior leading to duplicated files in the backup (sorry, it's German only):
I have attached a bash-only script to make the issue 100% reproducible:
Just download the file, rename it with `mv setup.txt setup.sh`, then `chmod +x setup.sh`, and run it with `./setup.sh`.
@all Could you please provide feedback on whether this script reproduces the issue on your system too?
@buhtz Do you know where we could ask the `rsync` community for guidance (using this script)?
Sounds like you really want to fix this before 1.4.0 ? :smile:
I did ask the "rsync community" at their mailing list: https://lists.samba.org/mailman/listinfo/rsync The maintainer is there but does not read everything. If it is urgent or "very important" it seems OK to contact Wayne Davison directly via mail. I did this in the past in the context of the "rsync argument protection" problem.
Sounds like you really want to fix this before 1.4.0 ? 😄
I see no way to fix this for 1.4.0 (I will assign a new milestone).
I have tried all possible `rsync` options related to permissions and time, and nothing worked. If `rsync` is using another kernel API than `rm` (I'd have to invest time to `strace` this), it would require a fix on their side. Currently the only workaround is to give up taking snapshots with `--perms`.
BTW: This is a severe bug because it causes all files that are read-only for a user to be re-backed up again and again once BiT (or the user) deletes an old snapshot that contains a hardlink to the file in the latest (most recent) snapshot. Other issues report exactly this unwanted behavior (wasting storage space).
Now that you've nailed down the root of the problem: Do you think that #988 might have the same root cause?
What does the ctime change have to do with the change in permissions, though? :thinking:
In your example, I understand how `rsync --delete` changes the ctime of the stat'ed file. But I don't understand why its permissions jump from r--r--r-- to rw-r--r--.
What does the ctime change have to do with the change in permissions, though? 🤔
To me it is unclear whether `rsync` changes the permission to rw so that the ctime is changed (eg. to be able to delete hardlinks) or vice versa. `rm`'ing a hardlink, on the other hand, does not change anything (I have to retest this; edit: re-tested -> `rm -rf snapshot1` does update the Change time but leaves the permissions at 444 -r--r--r--). So I suppose it is not a kernel or fs module issue but caused by `rsync`... Without debugging the `rsync` code we can only speculate.
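The rm re-test above matches POSIX semantics: unlink() needs write permission on the containing directory, not on the file itself, so rm can remove a read-only hardlink without first chmod'ing it, and the surviving links keep their mode. A quick check (my own example paths):

```shell
# Removing one hardlink of a read-only file leaves the other link at 444.
tmp=$(mktemp -d)
touch "$tmp/keep"
chmod 444 "$tmp/keep"
ln "$tmp/keep" "$tmp/gone"
rm -f "$tmp/gone"            # -f: skip the prompt for the write-protected file
stat -c '%a' "$tmp/keep"     # prints 444: permissions untouched
rm -rf "$tmp"
```

This is why a chmod-before-delete, if rsync does one, would be purely an rsync implementation detail rather than something the kernel requires.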
Do you think that #988 might have the same root cause?
Hard to say without
What is sure: this r/w ping-pong definitely causes unnecessary file copies in each new snapshot.
This may explain some of the reports in #988 but others are perhaps only affected in the first snapshot with the new version (since it will do an almost complete new copy of all source files).
Furthermore, if the target file system does not support Linux permissions, the file permissions may be lost in every snapshot, causing a full backup in every snapshot (to be tested!).
I have done some more scenario tests to find out if other constellations change the permissions of existing snapshot files too, but it looks good.
If it should be a bug in `rsync`, we would have to cope with a pre/post-bugfix situation anyhow, so I suggest fixing this (BiT) bug by using `rm -f` instead of `rsync -a --delete -s` to delete a snapshot.
The proposed fix:
- Use `rm` to delete snapshots (eg. in a customized `ssh` shell, see a similar problem in #1442)
- Do not use `--perms` anymore

Advantages:
- With `--perms`, every change of permissions causes a full file copy in the snapshot even though only the file permissions have changed; the permissions are also saved per snapshot by BiT in the `fileinfo.bz2` in the snapshot root folder; the same holds true for the `--group` and `--owner` options IMHO, but this is untested

Disadvantages:
- Contact the `rsync` community to ask for an opinion, using the attached script that reproduces the problem

Do not use `--perms` anymore
This would most probably also eliminate #988, which is another open mega-bug (that we don't understand as well as this one yet).
It's an attractive solution in my mind, but we have to weigh the consequences carefully.
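If --perms were dropped, permissions would have to come from BiT's own per-snapshot metadata on restore. A hypothetical sketch of reading such data; the "mode owner group path" line layout used here is an assumption for illustration only, not taken from the BiT sources:

```shell
# Hypothetical fileinfo.bz2-style data: decimal st_mode, owner, group, path.
# (Layout is assumed, not BiT's actual format.)
tmp=$(mktemp -d)
printf '33060 me me /home/me/pcast/phc_130119.mp3\n' | bzip2 > "$tmp/fileinfo.bz2"
# Print each entry's mode in octal next to its path.
bzcat "$tmp/fileinfo.bz2" | awk '{ printf "%o %s\n", $1, $4 }'
# prints: 100444 /home/me/pcast/phc_130119.mp3
rm -rf "$tmp"
```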
This `rsync` issue is at least 6 years old, and the bug report even contains a patch to fix it:
There is also a proposed workaround using `--super`, but this seems unreliable to me and may have side effects:
--super
This tells the receiving side to attempt super-user activities even if the receiving rsync wasn't run by the super-user. These activities include: preserving users via the --owner option, preserving all groups (not just the current user's groups) via the --group option, and copying devices via the --devices option. This is useful for systems that allow such activities without being the super-user, and also for ensuring that you will get errors if the receiving side isn't being run as the super-user. To turn off super-user activities, the super-user can use --no-super.
Then maybe there's a chance to press this problem with the rsync devs? It's causing us significant "downstream trouble", after all.
@buhtz knows more than any of us about rsync development, I think :)
Sorry, but I have to say that currently I don't understand the details of your discussion. I'm not really into the permission problems.
I have already written a question in the rsync mailing list:
https://lists.samba.org/archive/rsync/2023-September/subject.html
Depending on the feedback I will probably also bump the old open issue - I am a registered user of Bugzilla for `rsync` now (but I don't want to "cross-post" and flood different systems with the same question as long as there is no response).
IMHO even a fix on the `rsync` side does not guarantee that every distro and system will be using the fixed `rsync` version, so BiT has to recognize this anyhow and decide how to create and delete snapshots to avoid this issue...
BTW: `rm -f` seems to be significantly slower on some remote file systems (possibly Samba and NFS) and is not really a good option.
BTW: `rm -f` seems to be significantly slower on some remote file systems (possibly Samba and NFS) and is not really a good option.
Thanks for leading the discussion with the rsync devs :+1:
We need to keep in mind that BackInTime also has to deal with remote locations where we can't ssh in and call `rm`. In those cases, `rsync --delete` is the only way to remove a snapshot (unless I'm missing something basic).
We need to keep in mind that BackInTime also has to deal with remote locations where we can't ssh in and call `rm`.
Do you mean
a) "no `ssh` and no `rm` possible" or
b) "no `rm` within `ssh` possible"?
For a) it would require configuring a restricted shell for `sshd`.
I can hardly imagine a real use case requiring write permissions without `rm`. Did you find such a setup in the wild?
I was trying but did not succeed so far. Even the secured Hetzner Storage Box provides such a basic thing as `rm` (but not `echo` ;-).
I think we
- need to support `rm -rf`
- auto-fallback or add a configuration option to use `rsync --delete` instead (if the user should really use a restricted ssh shell)
- should somehow log how long it took to delete an (old) snapshot to give the user a way to recognize (too) long running snapshot deletions

Still I would prefer to keep the command dependencies on the remote side small (= best to use only `rsync` without `rm`), but given the fact that a fixed `rsync` will not be immediately available on each system, we have no other option than using `rm -rf`, I think.
Did you find such a setup in the wild?
I was trying but did not succeed so far. Even the secured Hetzner Storage Box provides such a basic thing as `rm` (but not `echo` ;-).
No, you're right. I was confused. A "full rsync mode" was once in development, but it never became a supported reality in BiT. We can probably rely on `rm` being available on a remote host, and I don't know of any particular setup where it would be missing.
I think we
- need to support `rm -rf`
- auto-fallback or add a configuration option to use `rsync --delete` instead (if the user should really use a restricted ssh shell)
- should somehow log how long it took to delete an (old) snapshot to give the user a way to recognize (too) long running snapshot deletions
I wonder if it's worth keeping the `rsync --delete` option around. It's practically useless with `--perms`. Which brings us back to #988, and a decision about what to do with permissions in general ;)
Using version 1.2 from the stable PPA, on Ubuntu 19.04, backing up to a local drive, sometimes files are written to the snapshot with the wrong permissions. This makes itself known when, after adding files and taking a snapshot, a subsequent snapshot sometimes shows files as modified with permission differences. I have analysed one of these and see that for the original snapshot the permissions in fileinfo.bz2 do not match those in the written file, so I see
but
and I think 33060 maps to 100444, which is r--r--r--, which is what the source file is. After running the second backup, fileinfo still has 33060, but the permissions in that backup are r--r--r-- as they should be. I can make the log for the first backup available if necessary, but I would rather not do it publicly. There is nothing obviously odd in the entry for that file.
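That mapping can be checked directly: 33060 is the decimal value of the octal st_mode 100444, i.e. a regular file (type bits 100000) with permission bits 444 (r--r--r--):

```shell
# Convert the decimal mode stored in fileinfo.bz2 to octal.
printf '%o\n' 33060   # prints 100444
```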