bit-team / backintime

Back In Time - An easy-to-use backup tool for GNU Linux using rsync in the back
https://backintime.readthedocs.io
GNU General Public License v2.0

Can't include specific files from excluded wildcard match (rsync: ordering of --exclude/--include) - Workaround exists #1420

Open insubstudios opened 1 year ago

insubstudios commented 1 year ago

i'm trying to exclude most of my dotfiles and include a few specifically but it's not working.

i added /home/xin/.* to my excludes and to my includes:

the folders are getting backed up but not the single files.

i checked the logs and the rsync command generated by BiT is putting the includes after the excludes. i copy/pasted it, moved the includes up, and ran it manually and it worked as expected.

Original rsync call from BIT logs (causing the problem)

from backintime log (line breaks added for clarity):

[I] rsync --recursive --times --devices --specials --hard-links --human-readable -s --links --perms --executability --group --owner --info=progress2 --no-inc-recursive --delete --delete-excluded -v -i --out-format=BACKINTIME: %i %n%L --link-dest=../../20230405-165753-308/backup --chmod=Du+wx
    --exclude=/media/xin/bigbox-6-A/xin
    --exclude=/home/xin/.local/share/backintime
    --exclude=.local/share/backintime/mnt

    --include=/home/xin/
    --include=/home/
    --include=/home/xin/.fonts/
    --include=/home/xin/.ssh/

    --exclude=.gvfs
    --exclude=.cache/*
    --exclude=.thumbnails*
    --exclude=.local/share/[Tt]rash*
    --exclude=*.backup*
    --exclude=*~
    --exclude=.dropbox*
    --exclude=/proc/*
    --exclude=/sys/*
    --exclude=/dev/*
    --exclude=/run/*
    --exclude=/etc/mtab
    --exclude=/var/cache/apt/archives/*.deb
    --exclude=lost+found/*
    --exclude=/tmp/*
    --exclude=/var/tmp/*
    --exclude=/var/backups/*
    --exclude=.Private
    --exclude=/home/xin/.*

    --include=/home/xin/**
    --include=/home/xin/.fonts/**
    --include=/home/xin/.ssh/**
    --include=/home/xin/.gitignore
    --include=/home/xin/.gitconfig
    --include=/home/xin/.bashrc

    --exclude=*
/ /media/xin/bigbox-6-A/xin/backintime/offish/xin/1/new_snapshot/backup

Expected rsync call with modified include/exclude order (solving the problem)

modified that works as expected:

rsync --recursive --times --devices --specials --hard-links --human-readable -s --links --perms --executability --group --owner --info=progress2 --no-inc-recursive --delete --delete-excluded -v -i --out-format=BACKINTIME: %i %n%L --link-dest=../../20230405-165753-308/backup --chmod=Du+wx
    --exclude=/media/xin/bigbox-6-A/xin
    --exclude=/home/xin/.local/share/backintime
    --exclude=.local/share/backintime/mnt

    --include=/home/xin/
    --include=/home/
    --include=/home/xin/.fonts/
    --include=/home/xin/.ssh/

    --include=/home/xin/.gitignore
    --include=/home/xin/.gitconfig
    --include=/home/xin/.bashrc

    --exclude=.gvfs
    --exclude=.cache/*
    --exclude=.thumbnails*
    --exclude=.local/share/[Tt]rash*
    --exclude=*.backup*
    --exclude=*~
    --exclude=.dropbox*
    --exclude=/proc/*
    --exclude=/sys/*
    --exclude=/dev/*
    --exclude=/run/*
    --exclude=/etc/mtab
    --exclude=/var/cache/apt/archives/*.deb
    --exclude=lost+found/*
    --exclude=/tmp/*
    --exclude=/var/tmp/*
    --exclude=/var/backups/*
    --exclude=.Private
    --exclude=/home/xin/.*

    --include=/home/xin/**
    --include=/home/xin/.fonts/**
    --include=/home/xin/.ssh/**

    --exclude=*
/ /media/xin/bigbox-6-A/xin/backintime/offish/xin/1/new_snapshot/backup

buhtz commented 1 year ago

Please have a look at #1419

There is no concrete solution for your problem there, but it points to further resources to read. The pattern matching rules of rsync are not easy.

EDIT: It took me some time to understand your problem. We have to take a closer look at the related rsync rules. I assume that there is a solution for your case without modifying BIT's behavior in ordering includes/excludes.

EDIT2: This pattern matching with rsync always gives me a headache.

insubstudios commented 1 year ago

Hi, thanks for the response :-)

i found this article helpful too: https://zingbretsen.com/blog/rsync-include-exclude that's how i figured out how to edit the command and have it work when running manually.

i went down a bit of a rabbit hole last night, ran a bunch of rsync commands, and i'm pretty sure i have it figured out. sorry this is a bit of an info dump.

i believe it is a simple internal fix and is half the solution to #561

Expected Behavior:

rsync: include & exclude

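rsync checks each path against the include/exclude rules in the order they appear on the command line, and the first rule that matches decides. So the order of two or more adjacent include clauses doesn't seem to matter, and similarly for two or more adjacent exclude clauses. But the relative order of include and exclude is really important.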
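To make the ordering effect concrete, here is a minimal Python sketch of a first-match-wins filter. It is a simplified model, not rsync itself: `is_kept` is a hypothetical helper, and `fnmatchcase` only approximates rsync's pattern rules (for example, it lets `*` match `/`, which rsync does not, and it ignores directory traversal).

```python
from fnmatch import fnmatchcase


def is_kept(path, rules):
    """Simplified first-match-wins check (hypothetical helper).

    The first rule whose pattern matches decides the outcome;
    a path that matches no rule at all is kept.
    """
    for action, pattern in rules:
        if fnmatchcase(path, pattern):
            return action == "include"
    return True


# Order as in the logged command: the dotfile exclude precedes the file include.
bit_order = [
    ("exclude", "/home/xin/.*"),
    ("include", "/home/xin/.bashrc"),
    ("exclude", "*"),
]

# Order after moving the file include ahead of the excludes.
fixed_order = [
    ("include", "/home/xin/.bashrc"),
    ("exclude", "/home/xin/.*"),
    ("exclude", "*"),
]

print(is_kept("/home/xin/.bashrc", bit_order))
print(is_kept("/home/xin/.bashrc", fixed_order))
```

In this model the first call prints `False` (the exclude is reached first) and the second prints `True`, which mirrors what the two rsync commands above did with the single files.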

backintime

The rsyncSuffix() method builds the include/exclude part of the rsync command run for snapshots. This is grouped into 5 alternating chunks:

  1. exclude backup location and BiT config files
  2. include1: currently: folders and parent folders (should include files)
  3. exclude the exclude list
  4. include2: currently: files and folder children, /path/to/folder/** (should only be folder children)
  5. universal exclude, --exclude=*

Because BiT isn't adding the included files until after the exclude rules, they cannot be added. They need to be added to the first include chunk.
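The five chunks can be sketched roughly as follows. This is an illustration only, not BiT's actual `rsyncSuffix()` code: `build_filter_args`, its parameters, and the `files_first` switch are all hypothetical, and the parent-folder includes are left out for brevity.

```python
def build_filter_args(protected, include_dirs, include_files, excludes, files_first):
    """Hypothetical stand-in for the chunk ordering described above."""
    # 1. exclude backup location and config files
    chunk1 = [f"--exclude={p}" for p in protected]
    # 2. include1: folders (and, with the proposed change, single files)
    include1 = [f"--include={d}/" for d in include_dirs]
    # 4. include2: folder children (and, currently, single files)
    include2 = [f"--include={d}/**" for d in include_dirs]

    file_rules = [f"--include={f}" for f in include_files]
    if files_first:
        include1 += file_rules   # proposed: files go before the exclude list
    else:
        include2 += file_rules   # current: files go after the exclude list

    # 3. the user's exclude list
    chunk3 = [f"--exclude={e}" for e in excludes]
    # 5. universal exclude
    return chunk1 + include1 + chunk3 + include2 + ["--exclude=*"]


sample = dict(
    protected=["/home/xin/.local/share/backintime"],
    include_dirs=["/home/xin/.ssh"],
    include_files=["/home/xin/.bashrc"],
    excludes=["/home/xin/.*"],
)
current = build_filter_args(**sample, files_first=False)
proposed = build_filter_args(**sample, files_first=True)
print(current)
print(proposed)
```

With `files_first=False` the `.bashrc` include lands after `--exclude=/home/xin/.*`; with `files_first=True` it lands before it, which is the ordering that worked when the command was run manually.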

This can easily be done by changing line 2012 from:

                items2.add('--include={}'.format(folder))

to:

                items1.add('--include={}'.format(folder))

buhtz commented 1 year ago

Thanks a lot for analyzing and breaking this down. This will help a lot.

I'm not very familiar with rsync's behavior. And I'm always scared that modifying this will change BIT's default behavior and break someone else's backups.

Because of that I'm very pedantic about a fix. I would like to dive deep into that rabbit hole and understand all the details of how rsync behaves. That is how I can be "sure" that no one else is harmed by such a fix. It will take time I don't currently have. There are three other maintainers in the team, but their time is also limited.

But I am adding the high-priority label and the next-release milestone to this issue.

insubstudios commented 1 year ago

Thank you. Understandable. There's never enough time.

insubstudios commented 1 year ago

Workaround

i found a workaround for now:

add the include parameters into "additional options" under the "Expert Options" tab in settings.

this is inserted in the rsyncPrefix (defined in common/tools.py) so it is inserted before any other include or exclude arguments.

make sure it doesn't match your backup location, ~/.local/share/backintime, or .local/share/backintime/mnt as that will probably cause problems. those are probably excluded first for a reason.

rsyncPrefix is used by snapshots.py for restore, backupConfig, and takeSnapshot and by sshtools.py for checkRemoteCommands. there don't seem to be any problems for restore and backupConfig. haven't tested the ssh stuff because i don't have that as part of my setup.

rsyncSuffix is only used by takeSnapshot.
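
Why the workaround helps can be sketched like this, assuming the final command is the prefix (which carries the "additional options") followed by the include/exclude suffix. `assemble_command` and its parameters are hypothetical and only illustrate the ordering; they are not BiT functions, and the destination path is a placeholder.

```python
import shlex


def assemble_command(base_options, additional_options, filter_suffix, source, dest):
    """Hypothetical illustration: prefix options come before the filter suffix."""
    # The user-supplied "additional options" string is split like a shell would
    # split it and becomes part of the prefix.
    prefix = ["rsync", *base_options, *shlex.split(additional_options)]
    # The generated include/exclude rules follow, then source and destination.
    return [*prefix, *filter_suffix, source, dest]


cmd = assemble_command(
    base_options=["--recursive", "--times"],
    additional_options="--include=/home/xin/.bashrc --include=/home/xin/.gitconfig",
    filter_suffix=["--exclude=/home/xin/.*", "--exclude=*"],
    source="/",
    dest="/path/to/snapshot",  # placeholder, not a real BiT path
)
print(" ".join(cmd))
```

Because the includes typed into "additional options" end up ahead of every generated exclude, they are the first rules to match for those files.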

DerekVeit commented 3 weeks ago

I have recently spent a little time on this subject of resolving the logic of the Include and Exclude configuration. I have what I think is a correct solution with a more easily followed logic, and I'm writing some description and explanation before I push it to a branch of my public repo for consideration. It solves this issue #1420 and #561 and meets all of the expectations identified by @insubstudios above.

I very much agree with the concern about quietly sabotaging someone's backup by subtly changing the logic of how their configuration is applied. As one thing to help with that, the branch (not yet pushed) includes unit tests where we can easily add more test cases of whatever combinations of includes, excludes, and subject files and try them out with both the current and proposed strategy.