Open yrro opened 8 years ago
It might be good to think about creating some plug points in the file system scanner code, such that users can roll their own "exclude this, too" code if the base exclusions aren't enough.
Either that, or implement reading the list of paths to process from a file, and let users implement whatever inclusion/exclusion strategy they want entirely out of process.
The downsides of the latter are the increased chance of stale data for users who don't use snapshots, and blowing the FS cache (reading some metadata to generate the file list, but then not reading the rest of the metadata and data in one pass).
--files-from is #841
Is there currently a way to specify paths in an --exclude-from file to be relative to that file's path?
@guedressel don't think so.
Would this be difficult to add? Any pointers on how to get started?
I am also in a situation where I want to back up the home folders of all the users on our storage cluster, and want the users to be in control of what files get backed up.
@ostrokach maybe have a look at the "--pattern" changes in master branch. check if it already does what you need. if not, it maybe could be added. this stuff is new, so it is easier to change than already released functionality.
As the .gitignore issue was merged here, I suppose I could comment here. Generally, you can achieve .gitignore parsing with the following:
Start the pattern set with P sh and R dirname, where dirname is the directory where .gitignore was found.
Read .gitignore and, for each line: if it starts with #, skip it; if it starts with !, replace the ! with +; otherwise prepend the line with -.
Now include that pattern set into the runtime config. You should be fine.
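The steps above can be sketched in a few lines of Python. This is a naive illustration, not a faithful .gitignore implementation: the function name gitignore_to_borg_patterns is made up for this example, skipping blank lines is an added assumption, and anchoring rules, trailing-slash semantics and nested .gitignore files are not handled. Note also that borg evaluates patterns first-match-wins while git lets later lines override earlier ones, so the order of the generated lines may need adjusting.

```python
def gitignore_to_borg_patterns(dirname, gitignore_lines):
    """Translate .gitignore lines into borg pattern-file lines (naive sketch).

    dirname is the directory in which the .gitignore was found.
    """
    # "P sh" selects shell-style patterns, "R <dir>" sets the root path.
    patterns = ["P sh", f"R {dirname}"]
    for raw in gitignore_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            # Comments carry no pattern; skipping blank lines is an assumption.
            continue
        if line.startswith("!"):
            # A negated gitignore entry becomes an include pattern.
            patterns.append("+ " + line[1:])
        else:
            # Everything else becomes an exclude pattern.
            patterns.append("- " + line)
    return patterns


if __name__ == "__main__":
    example = ["# build output", "build/", "!build/keep.txt", "*.class"]
    print("\n".join(gitignore_to_borg_patterns("/home/user/project", example)))
```

The output could be written to a file and handed to borg via --patterns-from.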
I've made something similar in bash script that runs before borg and generates exclude list. But I use --exclude-from, not patterns-from, as borgmatic that I use to configure my backups does not support it.
There really should be a way to specify the filtering via .gitignore ...
How about this: we add two options, --exclusion-marker-file and --exclusion-generator-script. Say I set --exclusion-marker-file=.gitignore and --exclusion-generator-script="bash gitignore.sh". Now every time a folder containing a .gitignore is found, borg will call bash gitignore.sh /path/to/that/.gitignore. The called script will print an exclusion file for that folder to stdout.
I've made something similar in bash script that runs before borg and generates exclude list. But I use --exclude-from, not patterns-from, as borgmatic that I use to configure my backups does not support it.
Note that borgmatic does support specifying patterns_from now.
By the way: I use pathspec for a small script of mine. Works fine! https://pypi.org/project/pathspec/
Regarding my suggestion above: It would change how tagged files work. Instead of saying "hey, I'm tagged, exclude me", dir_is_tagged should somehow generate a list of patterns to be included. This should be done by calling a script (or maybe a Python function that may call a script), to allow for maximum configurability. Either this list of patterns is added to the PatternMatcher, or we make the pattern matching stack-based. The latter could improve performance, but would require more work to implement.
This change is backward-compatible because the old marker files become marker files that ignore everything in that directory. (It would even make the option to keep the marker files themselves obsolete). But this feature could be implemented completely independent of the other as well.
I'd rather not call a script. borg often runs as root and calling external scripts can be a security issue.
Maybe "calling a script" is a bad way to phrase it. What I mean by it is to have the possibility to call an external command that does the job, like with BORG_PASSCOMMAND. The command takes a path as an argument or via stdin and generates a list of exclusions, either to stdout (preferred) or to an external file.
Maybe I misunderstood your suggestion. Calling one specific, admin-configured script is not a problem usually (as the admin is responsible for having safe permissions on that), but if we would discover such scripts on the fs like we do with the exclude tags, that might easily become a security issue.
Say I set --exclusion-marker-file=.gitignore and --exclusion-marker-command=my-gitignore-to-excludes. Now if Borg encounters any file called .gitignore, it will call my-gitignore-to-excludes /path/to/gitignore/that/was/found/.gitignore, which in turn may print something like
/path/to/gitignore/that/was/found/bin/
/path/to/gitignore/that/was/found/build/
/path/to/gitignore/that/was/found/*.class
to standard output; the paths it prints will then be ignored.
(Note that everything here is just an idea and I'm absolutely open on the details of the implementation)
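To make the proposal concrete, here is a rough Python sketch of the idea as a standalone pre-processing step. Nothing in it is borg code: collect_exclusions and its parameters are hypothetical names, and neither --exclusion-marker-file nor --exclusion-marker-command exists in borg. The command is a fixed argument list chosen by whoever configures the backup and is run without a shell, which matches the "one admin-configured command" reading discussed above rather than discovering scripts on the file system.

```python
import os
import subprocess


def collect_exclusions(root, marker_name, command):
    """Walk root and, for every marker file found, run the configured command.

    command is an argv list (hypothetical, admin-configured); the path of the
    marker file is appended as its last argument. Each non-empty line the
    command prints to stdout is treated as one exclusion.
    """
    exclusions = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if marker_name not in filenames:
            continue
        marker_path = os.path.join(dirpath, marker_name)
        # No shell is involved; a non-zero exit status raises CalledProcessError.
        result = subprocess.run(
            command + [marker_path],
            capture_output=True,
            text=True,
            check=True,
        )
        exclusions.extend(
            line for line in result.stdout.splitlines() if line.strip()
        )
    return exclusions
```

The returned lines could be written to a file and passed to borg with --exclude-from, which is essentially the preprocessor workaround mentioned in this thread.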
I am going to abandon this feature for now. Since I do not have the brain power to process borg's core backup code yet to add a new feature, I will hack together a solution using a preprocessor that generates a custom exclusion file by walking the file tree before calling borg.
If someone wants to implement this feature, I will be happy to help as much as I can.
If anyone's interested: I've written a small script to exclude gitignored files:
#!/bin/bash
# Arguments: a path to check
# Output: all ignored files and folders in all git repositories below the input folder, as borg exclude patterns

# Iterate through all directories that contain a .git folder.
# Warning: this produces invalid patterns if a folder is not a valid git repository (grep the output for "fatal" to find them)
find "$1" -name ".git" -print0 | xargs -0 -n1 dirname | while IFS= read -r p
do
    # Remember the last excluded folder to skip redundant subfolder exclusions
    LASTFOLDER="$p/.foldernamethatwonteverexist/"
    # List all files of the current repository and ask git whether they are ignored
    tree -f -i -x --noreport "$p" | git -C "$p" check-ignore --stdin | while IFS= read -r q
    do
        # Skip everything below the last excluded folder; print the final result to stdout
        if [[ $q == "$LASTFOLDER"* ]]; then
            continue
        elif [[ -d $q ]]; then
            # Keep a trailing slash so siblings that merely share a name prefix are not skipped
            LASTFOLDER="$q/"
            echo "pp:$q/"
        else
            echo "pf:$q"
        fi
    done
done
Given a path as argument, it will recursively search for git projects in it. It will then list all files in those git projects and keep only the ones that git reports as ignored. The remaining paths are turned into borg exclude patterns and written to stdout.
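The "skip redundant subfolder exclusions" step of the script can be shown in isolation. The following Python sketch assumes the ignored paths arrive in tree order (a directory before its contents), as they do in the pipeline above; the function name to_borg_patterns and the is_dir callback are made up for this illustration.

```python
def to_borg_patterns(ignored_paths, is_dir):
    """Turn ignored paths into borg pp:/pf: lines.

    ignored_paths must list a directory before anything inside it.
    is_dir is a callable that tells whether a path is a directory
    (os.path.isdir in real use).
    """
    patterns = []
    last_dir = None  # most recently excluded directory, with trailing slash
    for path in ignored_paths:
        if last_dir is not None and path.startswith(last_dir):
            # Already covered by the path-prefix pattern of a parent folder.
            continue
        if is_dir(path):
            last_dir = path.rstrip("/") + "/"
            patterns.append("pp:" + last_dir)  # path-prefix pattern
        else:
            patterns.append("pf:" + path)  # full-path pattern
    return patterns


if __name__ == "__main__":
    dirs = {"/repo/build"}
    paths = ["/repo/build", "/repo/build/a.o", "/repo/build2.log"]
    print(to_borg_patterns(paths, lambda p: p in dirs))
```

Comparing against the directory name plus a trailing slash keeps a sibling such as /repo/build2.log from being swallowed by the exclusion of /repo/build.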
On my Documents folder, it takes only three seconds to run, which is acceptable for me.
Why would I want to back up the dependencies in my git repos? It's pretty gross that this doesn't work. Does nobody from borg use JavaScript or have a node_modules folder?
Sorry to bump this issue, but I came here by searching for something like "borg ignore directories by name", and I do have to handle lots of directories with the same (unique) name (like the mentioned node_modules) that I want to exclude from my backup. I just experimented a little and it wasn't too complicated to exclude all of them recursively... For testing purposes I created a small directory structure as follows:
tree backup_test/
backup_test/
├── node_modules
│   └── test_in_node_modules.txt
├── subdir
│   ├── node_modules
│   │   └── test_in_node_modules.txt
│   ├── subsubdir
│   │   ├── node_modules
│   │   │   └── test_in_node_modules.txt
│   │   └── subsub_test.txt
│   └── sub_test.txt
└── test.txt
By using the create command with the following exclude option I was able to exclude all node_modules directories:
borg create --exclude 'sh:**/node_modules' borgrepo::1 backup_test
Backup lists as follows:
borg list borgrepo::1
drwxr-xr-x user users 0 Sun, 2019-10-20 22:27:24 backup_test
drwxr-xr-x user users 0 Sun, 2019-10-20 22:27:32 backup_test/subdir
drwxr-xr-x user users 0 Sun, 2019-10-20 22:27:46 backup_test/subdir/subsubdir
-rw-r--r-- user users 0 Sun, 2019-10-20 22:27:46 backup_test/subdir/subsubdir/subsub_test.txt
-rw-r--r-- user users 0 Sun, 2019-10-20 22:27:32 backup_test/subdir/sub_test.txt
-rw-r--r-- user users 0 Sun, 2019-10-20 22:27:24 backup_test/test.txt
I'm not sure if I'm missing something, but I think for my primitive use case that seems to be enough. Best regards and thanks to everyone involved in borg's development, it's such an awesome tool and I'm loving it so far! :heart:
borg 1.1.10
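As a sanity check on the listing above, the observed effect of the exclude can be modelled in a few lines of Python. This is only a rough model of the outcome (a path is dropped if any of its components is node_modules), not borg's actual pattern matcher; is_excluded is a name invented for this sketch.

```python
from pathlib import PurePosixPath


def is_excluded(path, dirname="node_modules"):
    """Rough model: drop a path if any of its components equals dirname."""
    return dirname in PurePosixPath(path).parts


# The test tree from the comment above, in traversal order.
paths = [
    "backup_test",
    "backup_test/node_modules",
    "backup_test/node_modules/test_in_node_modules.txt",
    "backup_test/subdir",
    "backup_test/subdir/node_modules",
    "backup_test/subdir/node_modules/test_in_node_modules.txt",
    "backup_test/subdir/subsubdir",
    "backup_test/subdir/subsubdir/node_modules",
    "backup_test/subdir/subsubdir/node_modules/test_in_node_modules.txt",
    "backup_test/subdir/subsubdir/subsub_test.txt",
    "backup_test/subdir/sub_test.txt",
    "backup_test/test.txt",
]

kept = [p for p in paths if not is_excluded(p)]

if __name__ == "__main__":
    print("\n".join(kept))
```

The six paths that remain in kept are the same six entries shown by borg list above.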
I came up with a few lines of bash in my backup script that do just that, if anyone is interested:
# Prefix every non-blank, non-comment .gitignore line with the directory it was found in
find /home/user/Workspace -type f -name ".gitignore" -printf "%h\n" | \
    xargs -I '{}' bash -c "grep -Ev '^(\s*|#.*)$' \"{}/.gitignore\" | awk '{print \"{}/\" \$0}' " \
    > /tmp/exclude-backup
borg create [...] \
    /home/user \
    --exclude-from /tmp/exclude-backup
This creates a file at /tmp/exclude-backup with the concatenated contents of all the .gitignore files, each line prefixed with its directory, and uses it as an exclude list for borg.
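For readers who prefer to avoid the nested quoting, roughly the same exclude list can be built in Python. This is a sketch under the same simplifications as the shell version: build_exclude_list is a made-up name, lines are copied verbatim with only the directory prepended, and negated entries (lines starting with !) end up in the output as ordinary exclude paths rather than as exceptions.

```python
import os


def build_exclude_list(root):
    """Concatenate all .gitignore files below root into one exclude list.

    Blank lines and lines starting with '#' are dropped; every other line is
    prefixed with the directory that contains its .gitignore.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if ".gitignore" not in filenames:
            continue
        with open(os.path.join(dirpath, ".gitignore"), encoding="utf-8") as fh:
            for raw in fh:
                entry = raw.rstrip("\n")
                if not entry.strip() or entry.startswith("#"):
                    continue
                lines.append(dirpath + "/" + entry)
    return lines


if __name__ == "__main__":
    import sys

    # Usage: python build_excludes.py /home/user/Workspace > /tmp/exclude-backup
    print("\n".join(build_exclude_list(sys.argv[1])))
```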
@biocrypto730: It isn't safe to exclude all folders named node_modules, nor to exclude everything matched by .gitignore, because some of them need to be backed up anyway.
1. Anything you install using npm install --global is placed in a global node_modules folder, by default /usr/local/lib/node_modules (POSIX) or %AppData%\npm\lib\node_modules (Windows).
2. If you install a Node application from a single archive file, the archive will probably contain a node_modules folder pre-populated with all of the app's dependencies. (Electron apps usually bundle all of that into a .asar file instead, but that only exists in Electron, not vanilla Node.)
3. If you use Visual Studio Code, Code itself does not have a node_modules folder, but each extension you install does have one. Some extensions also, stupidly, contain a .gitignore file, including Red Hat's XML extension.
4. If you've “installed” a package by checking out its source tree and running it directly from there (as opposed to installing it with npm install --global, make install, or the like), then it will contain build artifacts (node_modules for Node packages, executables for C/C++ packages, and so on) that need to be preserved. This is not common with Node packages (everyone uses npm or yarn nowadays), but C/C++ packages are sometimes used this way.
If Borg has to be specifically told to honor version control ignore files, and the documentation specifically warns not to use that option if you semi-install things as described in item 4, then it's safe to do that. But it's not safe as a default behavior.
It should always be safe to read exclude paths from files within the backup source, if the file is named something like .backupignore or .borgignore. But just because something is a build artifact and/or excluded from version control doesn't mean it should be excluded from backup.
I currently back up my home machine with rsync and make use of its -F option. Combined with the file ~/.rsync-filter, with the following contents:
This has several advantages over Borg's existing features for specifying exclusion paths:
CACHEDIR.TAG files throughout my home directory, because
when I clear out a cache directory by removing it, I don't have to remember to recreate it and its CACHEDIR.TAG file.
/mnt/bsnap/home, rather than /home directly.