borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Negative lookahead in RE exclude patterns? #5740

Closed jirib closed 1 week ago

jirib commented 3 years ago

Are negative lookahead REs not working? If they are not implemented, could that be documented, please?

What I was trying to do was mimic excluding everything and including only specific paths (I know about the patterns file, but it is still described as experimental).

borg 1.1.15
$ borg create --debug --list --dry-run --show-rc --exclude 're:^((?!home\/jiri\/bin).)*$' '::TEST-home-{now:%Y%m%d%H%M%SZ}' /home
using builtin fallback logging configuration
35 self tests completed in 0.25 seconds
Creating archive at "ssh://192.168.1.193:22/./test::TEST-home-20210319220910Z"
Processing files ...
x /home
terminating with success status, rc 0

$ ls -ld /home/jiri/bin
drwxrwxr-x 2 jiri jiri 4096 Mar 19 20:17 /home/jiri/bin

Regex tested at https://regexr.com/5p05a

ThomasWaldmann commented 3 years ago

Your regex matches 'home' btw and everything matching is excluded, so it does not even recurse into home...

The mechanism used for --exclude is a bit older and more limited, btw: it can only exclude (without recursion).

You should try this with --pattern, which has better support for including, excluding (with recursion) and excluding (without recursion).

If you retry, show the result without --debug --dry-run, please.

In general, we use re from python stdlib, so everything implemented there should work.
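A quick way to check the first point is to run the regex from the report through Python's `re` module directly. This is only a sketch: it assumes borg applies the regex with `search` semantics to paths that have no leading slash, and the sample paths are made up for illustration.

```python
import re

# The --exclude regex from the original report ("\/" is just an escaped "/").
pattern = re.compile(r'^((?!home\/jiri\/bin).)*$')

# Assumed sample paths, written without a leading slash.
paths = ["home", "home/jiri", "home/jiri/bin", "home/jiri/bin/yes_backup"]

# True means the regex matches, i.e. the path would be excluded.
results = {path: pattern.search(path) is not None for path in paths}

for path, excluded in results.items():
    print(f"{path}: {'excluded' if excluded else 'kept'}")
```

Because `home` itself matches, the top-level directory is excluded before anything below it is ever looked at.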

jirib commented 3 years ago

$ touch ~/bin/{no_backup,yes_backup}

with pattern

$ borg create --debug --dry-run --list --show-rc --pattern='-re:^((?!/home/jiri/bin/yes_backup).)*$' '::TEST-home-{now:%Y%m%d%H%M%SZ}' /home/jiri/bin 2>&1
using builtin fallback logging configuration
35 self tests completed in 0.25 seconds
Creating archive at "ssh://jiri@192.168.1.193/./backup::TEST-home-20210324005713Z"
Processing files ...
x /home/jiri/bin
x /home/jiri/bin/no_backup
- /home/jiri/bin/yes_backup
terminating with success status, rc 0

with exclude

$ borg create --debug --dry-run --list --show-rc --exclude='re:^((?!/home/jiri/bin/yes_backup).)*$' '::TEST-home-{now:%Y%m%d%H%M%SZ}' /home/jiri/bin 2>&1
using builtin fallback logging configuration
35 self tests completed in 0.30 seconds
Creating archive at "ssh://jiri@192.168.1.193/./backup::TEST-home-20210324005730Z"
Processing files ...
x /home/jiri/bin
terminating with success status, rc 0

ThomasWaldmann commented 1 year ago

I guess the problem here was trying to match the leading slash (which is not there after normalisation). See the paths you get when running borg list repo::archive: these are the paths you need to match.
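The effect of a leading slash inside the lookahead can be illustrated with plain `re`. This is a sketch under the assumption stated in the comment above, namely that the path being matched has no leading slash; it does not call borg and says nothing about which form a given borg version actually matches against.

```python
import re

# Lookahead written with and without the leading slash.
with_slash = re.compile(r'^((?!/home/jiri/bin/yes_backup).)*$')
without_slash = re.compile(r'^((?!home/jiri/bin/yes_backup).)*$')

# Assumed normalised form of the path (no leading slash).
normalised = "home/jiri/bin/yes_backup"

# The lookahead with the slash never fires on the normalised path,
# so the exclude regex matches and the file would be excluded.
slash_matches = with_slash.search(normalised) is not None

# Without the slash the lookahead fires, the regex does not match,
# and the file would be kept.
no_slash_matches = without_slash.search(normalised) is not None

print(slash_matches, no_slash_matches)
```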

hour-keeper commented 1 week ago

@ThomasWaldmann In borg 1.4, negative lookahead assertions still don't seem to work properly. I used '! re:^(?!etc).*$' for testing. Normally it should exclude all files except those in the etc directory; however, running it excludes all files and directories except the root directory defined by the R flag. Using '! re:^(?!) .$' instead, the result is normal and no files are matched. I have tested this expression using the Python interpreter and the re module to make sure it matches properly.

ThomasWaldmann commented 1 week ago

@hour-keeper thanks for the feedback, I'll try to reproduce and write a test case.

Please give the full commands you used, but try to use the MINIMAL commands to demonstrate the issue.

hour-keeper commented 1 week ago

I usually use borg via borgmatic. I can't use my computer for the time being, so I used Termux on my phone to produce a minimised command line and managed to reproduce the problem. The full command line will be available tomorrow.

$ borg init -e repokey backup
$ borg create --debug --dry-run --list --show-rc --pattern='! re:^(?!etc).*$' backup::tesy $PREFIX
using builtin fallback logging configuration
33 self tests completed in 0.06 seconds
Creating archive at "backup::tesy"
Processing files ...
x /data/data/com.termux/files/usr
terminating with success status, rc 0

Using '! re:(?!etc).*$' gives the same result. borg version is 1.4.0.

hour-keeper commented 1 week ago

The following is the full borg command used inside borgmatic:

BORG_PASSPHRASE=*** BORG_EXIT_CODES=*** borg create --patterns-from /tmp/tmp???????? --list --filter AMEx- --dry-run --debug --show-rc ssh://***.repo.borgbase.com/./repo::{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f}

The patterns file passed as a parameter exists only for a very short time, and I have not yet found a way to capture its contents into the clipboard.

ThomasWaldmann commented 1 week ago

https://github.com/borgbackup/borg/issues/5740#issuecomment-2411946146 I don't think that demonstrates an issue.

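Your pattern says:

* `!` == "exclude if ..."

* `^(?!etc).*$` == "... the path does not start with `etc`"

The output shows a path not starting with etc and it is x (excluded):

x /data/data/com.termux/files/usr

So, works as designed?

Also unclear:

* what is `$PREFIX`?

* what files exist below `$PREFIX`?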

ThomasWaldmann commented 1 week ago

Trying to reproduce:

 % find . 
.
./etc
./etc/config
./home
./home/userfile
./tmp
./tmp/tmpfile
% borg init repo -e none
% borg create --list --pattern='! re:^(?!etc).*$' repo::archive . 
x .
% cd ..
% borg create --list --dry-run --pattern='! re:^(?!etc).*$' issue5740/repo::archive issue5740 
x issue5740

So, the problem is that any root directory will be excluded if it does not start with etc. As ! is used, the recursion is aborted.
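The regex side of this can be checked in isolation with Python's `re`. The sketch below assumes the root is matched under the names shown in the listings (`.` and `issue5740`); the remaining paths come from the `find .` output.

```python
import re

regex = re.compile(r"^(?!etc).*$")

# Root names as shown in the listings, plus a few paths from the test tree.
paths = [".", "issue5740", "etc", "etc/config", "home"]

# True means the pattern matches, i.e. the path is a candidate for exclusion.
matches = {path: regex.search(path) is not None for path in paths}

for path, matched in matches.items():
    print(f"{path!r}: {'matches' if matched else 'no match'}")
```

Both root names match, so the root is excluded before `etc` is ever reached.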

% borg create --list --dry-run --pattern='- re:^(?!etc).*$' repo::archive .        
x .
- etc/config
- etc
x home
x home/userfile
x tmp
x tmp/tmpfile
x repo

Here, - is used, so borg continues to recurse and it works as expected (x == excluded, - == not backed up due to dry-run).
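The difference between the two pattern styles can be modelled with a toy directory walk. This is a simplified sketch, not borg's implementation: the `TREE` dict, the `walk` helper and its `recurse_into_excluded` flag are hypothetical names for illustration, the `repo` directory is left out, and the output order differs from borg's listing.

```python
import re

# Toy tree mirroring the `find .` listing: directory -> children.
TREE = {
    ".": ["etc", "home", "tmp"],
    "etc": ["etc/config"],
    "home": ["home/userfile"],
    "tmp": ["tmp/tmpfile"],
}

def walk(path, regex, recurse_into_excluded):
    """Yield (status, path) pairs: 'x' = excluded, '-' = kept."""
    excluded = regex.search(path) is not None
    yield ("x" if excluded else "-", path)
    if excluded and not recurse_into_excluded:
        return  # models '!': do not descend below an excluded directory
    for child in TREE.get(path, []):
        yield from walk(child, regex, recurse_into_excluded)

regex = re.compile(r"^(?!etc).*$")

# '!' style: the root matches, so the walk stops immediately.
bang = list(walk(".", regex, recurse_into_excluded=False))

# '-' style: the root is excluded, but the walk still reaches etc/.
minus = list(walk(".", regex, recurse_into_excluded=True))

print(bang)
print(minus)
```

In this model the `!` style yields only the excluded root, while the `-` style still visits `etc` and `etc/config` and keeps them.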

Without dry-run:

% borg create --list --pattern='- re:^(?!etc).*$' repo::archive2 .
x .
A etc/config
d etc
x home
x home/userfile
x tmp
x tmp/tmpfile
x repo

hour-keeper commented 1 week ago

#5740 (comment) I don't think that demonstrates an issue.

Your pattern says:

* `!` == "exclude if ..."

* `^(?!etc).*$` == "... the path does not start with `etc`"

The output shows a path not starting with etc and it is x (excluded):

x /data/data/com.termux/files/usr

So, works as designed?

Also unclear:

* what is `$PREFIX`?

* what files exist below `$PREFIX`?

I apologise for not stating that $PREFIX has the value /data/data/com.termux/files/usr in Termux. As the name suggests, underneath it is the set of files and directories that a normal Linux system would have, including etc.

That's exactly what I was aiming for. Recursion causes performance issues under Windows WSL Docker, so I wanted to stop recursion outside the target folder in the root directory. I misunderstood how borg actually works. Perhaps the documentation should state that the root directory itself is also matched against the patterns, so that users know to keep it out of the regex match. Hopefully we could also match the root directory using a special flag instead of typing the full root path.

hour-keeper commented 1 week ago

I tried to use '! re:^/$' on my Linux computer to exclude the root directory. This was meant as a simple reverse simulation of the use case where the negative lookahead assertion is used without excluding the root directory. It turns out that the root path is not excluded. Is there a special name for the root path that is used inside the program for matching? I didn't find such a name in the documentation. My pattern file:

R /
! re:^/$

hour-keeper commented 1 week ago

I'm sorry, I missed this in the documentation: the leading / is automatically dropped when paths are normalized. If the root directory is /, what does it look like after normalization, and how can I match it?

hour-keeper commented 1 week ago

I have now circumvented this problem by setting the root directory to '/'. It makes the root directory matchable via '^$'. Thank you for your patience in replying! @ThomasWaldmann
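Why `^$` can match where `^/$` cannot is easy to see with `re`, if one assumes that normalisation simply strips the leading slash. The `normalise` helper below is a hypothetical stand-in for that step, not borg's actual code.

```python
import re

def normalise(path):
    # Hypothetical stand-in for path normalisation: drop leading slashes only.
    return path.lstrip("/")

root = normalise("/")

# With the slash stripped, the root becomes the empty string.
slash_pattern_matches = re.search(r"^/$", root) is not None
empty_pattern_matches = re.search(r"^$", root) is not None

print(repr(root), slash_pattern_matches, empty_pattern_matches)
```

Under that assumption the root path is the empty string, so only the empty-string pattern matches it.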