Open smferris opened 1 year ago
There are so many ways to deal with exclusions, and more keep being invented, that I am considering a simple exclusion script system with some default scripts or functions. This way users could use the script system to check for xattrs or specific files, dynamically generate file lists, or do other things, and share their scripts with each other.
Not sure if I will go down this path, but just something to think about - I would love to hear your opinion on it.
I'm neutral on the idea of scripting. The flexibility might be nice, but I have to wonder what the costs of it would be in terms of runtime performance, development effort, and maintenance effort. A fast backup is more important to me than maximum flexibility with excludes.
I suspect you can cover most people's exclude needs with a relatively small set of exclude capabilities, so I'm not yet convinced it would be worth the effort to make excludes scriptable, but maybe I'm just unaware of all the different approaches people want. Do you have a list of them collected somewhere? Off the top of my head, I'm only thinking of --exclude and --exclude-if-found, which you've already implemented, the --exclude-xattr I mention here, and something like --exclude-from (borg's name for it) to use patterns contained in a file rather than having to put them all on the command line.
borg-create has some interesting options to flip the problem around and specify which paths are desired, rather than which to exclude: --paths-from-command and --paths-from-stdin. Those let people generate the paths they want with another program and feed them in. That might cover some of the less common exclude needs better than adding a scripting language, since it doesn't limit people to one particular language. It does have the downside of requiring another program to do a filesystem traversal though, and the filesystem cache behavior might be unfortunate if that traversal and bupstash's are focusing on different areas of the filesystem.
It might be even better to have --include-from-{command,stdin} and --exclude-from-{command,stdin} rather than just --paths-from-{command,stdin}, so that you can generate both includes and excludes from another program.
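To make the "generate the paths with another program" idea concrete, here is a minimal sketch of such a generator. It is written in Python purely for illustration; the `.nobackup` marker name and the one-path-per-line output format are assumptions made for this example, not something borg or bupstash is known to require.

```python
import os
import tempfile


def wanted_paths(root, marker=".nobackup"):
    """Yield file paths under root, pruning any directory that contains marker.

    The marker name is hypothetical; it stands in for whatever rule the
    generating program wants to apply.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if marker in filenames:
            dirnames[:] = []  # prune: do not descend into this subtree
            continue
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


# Demonstration on a throwaway tree: only keep/a.txt should be listed.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "keep"))
    os.makedirs(os.path.join(demo_root, "skip"))
    for rel in ("keep/a.txt", "skip/.nobackup", "skip/b.txt"):
        open(os.path.join(demo_root, rel), "w").close()
    for path in wanted_paths(demo_root):
        # One path per line (assumed format for the consuming backup tool).
        print(os.path.relpath(path, demo_root))
```

Note that this program does its own traversal, which is exactly the duplicate-walk cost mentioned above.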
I doubt I'd ever use -from-{command,stdin} functionality myself though, if you've got what I'd consider the core set of options: --exclude, --exclude-if-found, --exclude-xattr, and --exclude-from. Those 4 are all I'd really need in a backup program myself, as long as the patterns are reasonably expressive.
Another (possibly crazy) approach that avoids the duplicate traversal of the -from-{command,stdin} would be a --filter-command option, which specifies a filter program for bupstash to run with pipes for stdin and stdout. bupstash provides pathnames it has found (that aren't excluded by other options) to the filter's stdin, and the filter outputs only the pathnames that bupstash should actually use to stdout. That way there's just one filesystem traversal (bupstash's), and people can use any programming or scripting language to implement the filter. I think I like that idea even better than the -from-{command,stdin} borg has.
The context switching of using a separate program might hurt performance compared to doing filtering inside of bupstash itself though.
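A filter program for the hypothetical --filter-command option could look roughly like the sketch below. Everything about the protocol here is an assumption: the option does not exist, and the one-pathname-per-line framing and the example patterns were invented for illustration.

```python
import fnmatch
import io

# Hypothetical patterns; a real filter could inspect xattrs, sizes, etc.
EXCLUDE_PATTERNS = ["*.tmp", "*/node_modules/*", "*/.cache/*"]


def keep(path, patterns=EXCLUDE_PATTERNS):
    """Return True if the path matches none of the exclude patterns."""
    return not any(fnmatch.fnmatchcase(path, pat) for pat in patterns)


def run_filter(infile, outfile):
    """Copy pathnames from infile to outfile, dropping excluded ones.

    Assumes one pathname per line. Real pathnames may contain newlines,
    so a NUL-delimited framing would be safer in practice.
    """
    for line in infile:
        path = line.rstrip("\n")
        if path and keep(path):
            outfile.write(path + "\n")
            # Flush per path so the parent process is never left waiting.
            outfile.flush()


# In-memory demonstration standing in for the stdin/stdout pipes.
demo_in = io.StringIO("/home/u/notes.txt\n/home/u/build.tmp\n/home/u/.cache/x\n")
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Used as an actual filter, the script would call `run_filter(sys.stdin, sys.stdout)`. The per-path flush keeps the two processes in step, but it is also where the context-switching cost shows up.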
One example that many people want is git excludes, where users often keep extra git exclude information in their ~/.git configuration as well as in individual repositories. In that case, I wondered if a good approach would be to invoke git as the designated user in order to gather the exclude list and somehow communicate it back to the process walking the filesystem.
I have an aversion to special-casing tools like git, even if they are common, and would prefer that a general mechanism be added instead. The problem there seems to be balancing generality against ease of use.
macOS uses an xattr, com.apple.metadata:com_apple_backup_excludeItem, to mark items to exclude from Time Machine backups. A variety of software packages use this to mark their temporary and cache files to get them automatically excluded from backups, both Apple software and 3rd party apps such as Chrome, Skype, etc.
It would be nice to have a bupstash option to exclude a file or directory that has a specified xattr. Manually building an exclude list that covers all the files would be tedious and error prone.
I'll propose --exclude-xattr or --exclude-xattrname (analogous to macOS find -xattrname). Then people using bupstash on macOS could run something like:
and exclude everything that would be excluded by a Time Machine backup.
The value of the xattr can vary on macOS (the value is a binary plist from what I've read), but as far as I know any xattr value will get the file excluded from Time Machine backups, so I think bupstash can just check for the existence of the xattr without caring about the value.
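The existence-only check can be sketched as follows. This is an illustration in Python, not bupstash's implementation; `has_xattr` and its `list_xattrs` parameter are hypothetical names. Python's `os.listxattr` is only available on Linux, so on macOS a third-party xattr binding would have to be passed in instead.

```python
import os

# The Time Machine exclusion attribute named in this issue.
TM_EXCLUDE = "com.apple.metadata:com_apple_backup_excludeItem"


def has_xattr(path, name, list_xattrs=None):
    """Return True if path carries an xattr called name, whatever its value."""
    if list_xattrs is None:
        # os.listxattr exists only on Linux builds of Python.
        list_xattrs = getattr(os, "listxattr", None)
        if list_xattrs is None:
            raise NotImplementedError("no xattr support in this Python build")
    try:
        # Only the attribute names are inspected; the value is never read.
        return name in list_xattrs(path)
    except OSError:
        # Unreadable or unsupported: treat as "not marked" so it is backed up.
        return False


# Demonstration with a stand-in lister instead of a real filesystem.
fake_attrs = {"/Users/u/Library/Caches": [TM_EXCLUDE], "/Users/u/Documents": []}
for p in fake_attrs:
    print(p, has_xattr(p, TM_EXCLUDE, list_xattrs=fake_attrs.__getitem__))
```

Treating errors as "not marked" is a design choice in this sketch: when in doubt, the file is included in the backup rather than silently skipped.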