Sorry, this isn't about the subject of paths, just a thought on @witten's scenario: what bothers me about the idea of creating archives from `zfs` or `btrfs` snapshots is that, IIRC, `borg` uses absolute paths as keys for its files cache (whose entries expire once `BORG_FILES_CACHE_TTL` is hit). So, as I understand it, if you use timestamp-based snapshot directories in your scenario, @witten, you might also see slow archive creation, because no file would be recognized by the cache and every file would always be re-read from disk. Enabling such path replacements might lead to this user experience (if `borg` indeed behaves as I remember). (If that snapshot were always mounted or symlinked(?) at the same directory, this would not occur, I guess.)
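To make the caching concern concrete, here is a toy model in Python, not Borg's real implementation: a files cache keyed by a hash of the absolute path. The `SimpleFilesCache` class, the `run_backup` helper, the SHA-256 hash, and the snapshot directory names are all hypothetical, chosen only to illustrate why a changing snapshot directory would defeat such a cache.

```python
import hashlib


class SimpleFilesCache:
    """Hypothetical files cache keyed by a hash of the absolute path (not Borg's code)."""

    def __init__(self):
        self.entries = {}  # hashed absolute path -> last seen mtime

    @staticmethod
    def key(abs_path: str) -> str:
        return hashlib.sha256(abs_path.encode()).hexdigest()

    def is_unchanged(self, abs_path: str, mtime: int) -> bool:
        # A hit means the file could be skipped instead of re-read from disk.
        return self.entries.get(self.key(abs_path)) == mtime

    def remember(self, abs_path: str, mtime: int) -> None:
        self.entries[self.key(abs_path)] = mtime


def run_backup(cache: SimpleFilesCache, snapshot_dir: str, files: dict) -> int:
    """Simulate one backup run and return how many files would be re-read."""
    reread = 0
    for rel_path, mtime in files.items():
        abs_path = f"{snapshot_dir}/{rel_path}"
        if not cache.is_unchanged(abs_path, mtime):
            reread += 1
        cache.remember(abs_path, mtime)
    return reread


files = {"a.txt": 100, "b.txt": 200}  # relative path -> mtime, unchanged between runs

# Timestamped snapshot directories: each run sees new absolute paths, so every file misses.
ts_cache = SimpleFilesCache()
run_backup(ts_cache, "/my/zfs/mountpoint/.zfs/snapshots/2024-01-01", files)
print(run_backup(ts_cache, "/my/zfs/mountpoint/.zfs/snapshots/2024-01-02", files))  # 2

# Stable snapshot directory name: the second run hits the cache for every file.
stable_cache = SimpleFilesCache()
run_backup(stable_cache, "/my/zfs/mountpoint/.zfs/snapshots/borgmatic", files)
print(run_backup(stable_cache, "/my/zfs/mountpoint/.zfs/snapshots/borgmatic", files))  # 0
```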
Thanks for pointing this out. I can't speak to `btrfs`, but with `zfs`, borgmatic could use a consistent snapshot name without encoding a timestamp in it. In the example above, I wasn't even thinking about using timestamps, but more like a PID or something to prevent collisions. But in theory, borgmatic or a non-borgmatic user could avoid even that when making filesystem snapshots if it interfered with Borg's caching.
Another idea: to me it feels as if `borg create` on snapshot directories may become a more common thing, and "spoofing the same directory" so the files cache works may become weird. The problem could instead be addressed by a flag that changes borg's behaviour:
borg create --treat-sourcepath-as="/my/zfs/mountpoint/" /my/zfs/mountpoint/.zfs/snapshots/borgmatic-1234
That flag would:
- make `borg create` treat the files below the given source path as if they were in the directory specified by `--treat-sourcepath-as`;
- so `borg create` would behave as if it ran on the "live" directory, including use of the files cache.

You might solve a (very special) path rewrite problem that way and enable borg to handle snapshots made by `btrfs`, `zfs`, `rsync`, etc. more efficiently, which I guess is similar to what https://github.com/borgbackup/borg-import appears to address (I've only read about it).
Alternative to the path rewrite problem:
- Let `borg recreate` also rewrite path elements (strip, add, replace), rather than `borg create`, which might cause errors due to ambiguity.

I haven't followed everything in this issue, but borg can use relative paths, so you can do:
cd /long/path/you/want/to/hide
borg create repo::archive .
and it will back up `/long/path/you/want/to/hide` but store paths relative to that directory. E.g. the file `/long/path/you/want/to/hide/subdir/example` will be stored as `subdir/example`. So this is a way to strip leading path components.
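As a rough model of what gets stored, the archived name is the file's path relative to the directory you `cd` into. This sketch uses Python's `posixpath.relpath` to mirror the example; `stored_name` is a hypothetical helper, and this is an illustration rather than how Borg computes the name internally.

```python
import posixpath


def stored_name(cwd: str, file_path: str) -> str:
    """Approximate the name stored when running `borg create repo::archive .` from cwd."""
    return posixpath.relpath(file_path, start=cwd)


print(stored_name("/long/path/you/want/to/hide", "/long/path/you/want/to/hide/subdir/example"))
# subdir/example
```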
I'm not in favour of the double-dot-slash hack. I'd prefer a general `--rewrite-path` argument, but if it gets fancy, it might be complicated to document and use.
The problem with either a command-line flag like `--treat-sourcepath-as`/`--rewrite-path` or the use of relative paths along with an initial `cd` is that it only works if you've got a single path to back up. As soon as you want to ask Borg to back up multiple paths with potentially differing prefixes (or the same prefix that you want rewritten differently for each path), both of these approaches break down. That's the benefit, IMO, of encoding the prefix to be stripped in the path itself: you can vary the prefix for each path as necessary.
Having said that, there may be other / better ways of encoding that information. (Maybe there's an approach with repeated uses of the same command-line flag?) But I do think at least my use case requires the ability for different paths to get different transformations, because a lot more can go into a backup than a single filesystem snapshot. And I imagine that applies to other use cases as well.
The `borg recreate` idea is interesting, but it might just push the same design considerations downstream. You'd still need some way to specify which paths get which transformations, IMO.
I was imagining allowing multiple uses of `--rewrite-path`, so I think that covers your objection. Moreover, instead of only allowing deletion, it allows replacement with a different path, which I expect would be a common case.
But more importantly, encoding transformations within the path itself seems too fragile. For example, `/./` can occur in paths generated by other programs.
What would the syntax of `--rewrite-path` look like? I'm wondering how you'd specify either individual paths or multiple paths to transform, and also how the transformation itself would be given.
> But more importantly, encoding transformations within the path itself seems too fragile. For example, `/./` can occur in paths generated by other programs.
That ship has already sailed, given that the single `/./` hack is already supported in Borg 1.4 and 2.0. But I could maybe see an argument against doubling down on that approach if you didn't like it to begin with. :smile:
> What would the syntax of `--rewrite-path` look like?
--rewrite-path replace/this with/that
--rewrite-path get/rid/of/this/ ''
Give it multiple times to do multiple replacements. It would be a purely string-based replacement process.
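A minimal sketch of the purely string-based replacement described above, assuming the rules are applied in the order they were given. `--rewrite-path` is a proposal, and `rewrite_path` is a hypothetical helper, not Borg code.

```python
from __future__ import annotations


def rewrite_path(path: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (old, new) rule in order as a plain substring replacement."""
    for old, new in rules:
        path = path.replace(old, new)
    return path


# Equivalent of passing --rewrite-path twice, as in the example above.
rules = [
    ("replace/this", "with/that"),
    ("get/rid/of/this/", ""),
]
print(rewrite_path("/srv/replace/this/file", rules))  # /srv/with/that/file
print(rewrite_path("/get/rid/of/this/data", rules))  # /data
```

Because `str.replace` matches anywhere in the string and replaces every occurrence, a rule written for one prefix can also alter an unrelated part of a path.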
Cool, I think that would work for my use case, and I imagine other use cases as well. You'd just have to be careful that you're not matching more source paths than you intend to. To that end, I can imagine needing the ability to match against the start of a path vs. the middle of a path vs. the end of a path... which would suggest regular expression support (e.g. `^` and `$`)... which would complicate things. Maybe general Borg pattern support would be the way to go here. (Incidentally, I feel like this general approach was discussed on another Borg ticket a while back, but I can't find it right now.)
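To show why anchoring matters, the sketch below contrasts an unanchored substring replacement with a regular expression anchored at the start of the path via `^`. Using Python's `re` module is just one possible choice (Borg patterns are the other option floated here), and the `rewrite_prefix` helper and example paths are hypothetical.

```python
import re


def rewrite_prefix(path: str, prefix: str, replacement: str) -> str:
    """Replace prefix only when the path starts with it (anchored with ^)."""
    # re.escape treats the prefix literally; the lambda avoids backslash handling in replacement.
    return re.sub("^" + re.escape(prefix), lambda _m: replacement, path, count=1)


path = "/mnt/snap/home/mnt/snap/notes"

# Unanchored: every occurrence is removed, including the one in the middle.
print(path.replace("/mnt/snap", ""))  # /home/notes

# Anchored: only the leading "/mnt/snap" is removed.
print(rewrite_prefix(path, "/mnt/snap", ""))  # /home/mnt/snap/notes
```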
@ThomasWaldmann, thoughts on the various approaches proposed? `--rewrite-path` vs. "SM"?
Don't you have control over the path the ZFS snapshot is mounted to? I've never used ZFS, so I have no idea, but btrfs and LVM snapshots can be mounted to arbitrary paths. If not, you could still use `mount --bind` to effectively mount the mounted snapshot to some other path.
So, to archive a snapshotted `/my/zfs/mountpoint/some/file` as `my/zfs/mountpoint/some/file` (no absolute paths in borg), you mount the snapshot of `/my/zfs/mountpoint` at `/some/arbitrary/path/my/zfs/mountpoint` and `cd` into `/some/arbitrary/path` before running `borg create repo::archive .`, or use borg's slashdot hack. `/some/arbitrary/path` could be `/my/zfs/mountpoint/.zfs/snapshots/borgmatic-1234`, though I'd rather recommend some path below `/run`, e.g. `/run/borgmatic/mounts-1234`.
mkdir -p /run/borgmatic/mounts-1234/my/zfs/mountpoint
mount <zfs_snapshot of /my/zfs/mountpoint> /run/borgmatic/mounts-1234/my/zfs/mountpoint
cd /run/borgmatic/mounts-1234
borg create repo::archive .
Personally, I don't even like borg's slashdot hack: I consider it "magic" that can lead to unexpected behaviour and basically asks for problems. A `./` easily slips into a path, especially when using one of the many borg wrapper tools. An option like `--rewrite-path` is much cleaner, but I still don't see a great benefit: one should rather create the directory structure borg is meant to back up (preferably with mounted snapshots, or `mount --bind`) and just `cd` there before running `borg create repo::archive .`. Adding complex options like `--rewrite-path` or "magic" like the slashdot hack just complicates things; many users are already overwhelmed by `--pattern`...
Thanks for weighing in. My understanding with ZFS is that the snapshot gets auto-mounted at, say, `/my/zfs/mountpoint/.zfs/snapshots/borgmatic-1234`. But I may be able to implement what you suggest by first unmounting that auto-mounted snapshot and then re-mounting it at `/somewhere/my/zfs/mountpoint`, passing `/somewhere/./my/zfs/mountpoint` to Borg to employ the slashdot hack.
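A small sketch of how the paths in this approach relate, using the per-run mount root suggested earlier (`/run/borgmatic/mounts-1234`). The `snapshot_paths` and `slashdot_stored_path` helpers are hypothetical: the actual mounting would still be done with the system's mount/ZFS tooling, and the stored-name function models only the existing single `/./` behaviour (keep what follows the marker).

```python
import posixpath


def snapshot_paths(mount_root: str, mountpoint: str) -> tuple:
    """Return (mount target, path to pass to borg create) for one snapshotted dataset."""
    relative = mountpoint.lstrip("/")  # e.g. "my/zfs/mountpoint"
    target = posixpath.join(mount_root, relative)  # where the snapshot would be mounted
    source = posixpath.join(mount_root, ".", relative)  # slashdot form: <root>/./<mountpoint>
    return target, source


def slashdot_stored_path(path: str) -> str:
    """Model of the single slashdot hack; paths without "/./" are returned unchanged here."""
    _prefix, sep, rest = path.partition("/./")
    return rest if sep else path


target, source = snapshot_paths("/run/borgmatic/mounts-1234", "/my/zfs/mountpoint")
print(target)  # /run/borgmatic/mounts-1234/my/zfs/mountpoint
print(source)  # /run/borgmatic/mounts-1234/./my/zfs/mountpoint
print(slashdot_stored_path(source))  # my/zfs/mountpoint
```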
The `cd` trick won't work here because other, non-snapshot relative directories can be passed to Borg at the same time. I've also looked at bind mounts, which are certainly possible but more complex, especially given the variance across operating systems.
Anyway, I'll give this a shot!
This general approach appears to work. Turns out I didn't need to unmount the auto-mounted snapshot; I can just mount the snapshot in a second location. I'll close this ticket for now, but of course feel free to reopen it if someone else has a need for any of the other features discussed here.
Thanks all!
Coming late to the party, was busy with other borg stuff.
Just wanted to add that the files cache key was changed:
- from `H(full absolute path)`
- to `H(archived path)`

The change was made so that borg is also able to rebuild the files cache from an archive in the repo.
Also, the files cache filename suffix is now derived from the archive name (unless the user sets `BORG_FILES_CACHE_SUFFIX` to control it), which works nicely with the archive series way of doing things. That also made it possible to lower the default TTL to 2.
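A toy model of why keying on the archived path helps, with entries aging out after a TTL. This is not Borg's implementation: the `AgingFilesCache` class, the SHA-256 key, and the exact aging rule (drop an entry after `ttl` consecutive runs in which it was not seen) are assumptions; only the key change and the default TTL of 2 come from the comment above.

```python
from __future__ import annotations

import hashlib


class AgingFilesCache:
    """Hypothetical files cache keyed by H(archived path), with TTL-based expiry."""

    def __init__(self, ttl: int = 2):
        self.ttl = ttl
        self.entries: dict[str, list[int]] = {}  # key -> [mtime, runs since last seen]

    @staticmethod
    def key(archived_path: str) -> str:
        return hashlib.sha256(archived_path.encode()).hexdigest()

    def backup(self, files: list[tuple[str, str, int]]) -> int:
        """files holds (path read from disk, archived path, mtime); returns files re-read."""
        reread = 0
        seen = set()
        for _disk_path, archived_path, mtime in files:
            k = self.key(archived_path)  # the on-disk (snapshot) path is not part of the key
            seen.add(k)
            entry = self.entries.get(k)
            if entry is None or entry[0] != mtime:
                reread += 1
            self.entries[k] = [mtime, 0]
        for k in list(self.entries):
            if k not in seen:
                self.entries[k][1] += 1
                if self.entries[k][1] >= self.ttl:
                    del self.entries[k]
        return reread


cache = AgingFilesCache()
# Two runs read from differently named snapshots but store the same archived path.
print(cache.backup([("/my/zfs/mountpoint/.zfs/snapshots/borgmatic-1234/a", "my/zfs/mountpoint/a", 100)]))  # 1
print(cache.backup([("/my/zfs/mountpoint/.zfs/snapshots/borgmatic-5678/a", "my/zfs/mountpoint/a", 100)]))  # 0
```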
My use case: I'm looking at implementing ZFS filesystem integration as part of borgmatic so that ZFS users can take advantage of consistent snapshots when storing their files into a Borg archive. (I believe that non-borgmatic Borg/ZFS users also have similar needs.)
That integration would work something like the following when creating a backup archive:
1. Snapshot the ZFS filesystems. For instance, `/my/zfs/mountpoint`'s snapshot might show up at `/my/zfs/mountpoint/.zfs/snapshots/borgmatic-1234`.
2. Run `create` with Borg, passing in the snapshot paths.

The problem with this plan is that the snapshot directory ends up in the Borg archive, e.g. `/my/zfs/mountpoint/.zfs/snapshots/borgmatic-1234`. Ideally, however, the snapshotting would be completely transparent to the user, with the files stored in the Borg archive as if they came from their original ZFS filesystem mountpoints, e.g. just `/my/zfs/mountpoint`.

One way to support this ask would be to implement a feature that was already discussed (but not implemented) here: https://github.com/borgbackup/borg/issues/4685#issuecomment-1927895793 ... specifically the "SM" option. The idea is that borgmatic or the user would pass the following to `borg create`:

`/my/zfs/mountpoint/./.zfs/snapshots/borgmatic-1234/./`

In this case, everything between the two `/./`s would get stripped, leaving just `/my/zfs/mountpoint/` as the path stored in the Borg archive for these snapshotted files. (In the general case, there could also be path components after the second `/./` that would be kept, but this particular example doesn't need any.) The end result for this use case is that snapshot files appear at their "original" locations and can easily be extracted back to them should that be necessary.
In terms of ways this could go wrong: the interaction with the existing single slashdot hack should be considered. I think that could work by having Borg search for two `/./`s in a path. If it finds them, it applies the new "SM" behavior. But if there's only a single `/./`, it runs the existing "SP" logic.

Additionally, if a user unintentionally includes `/./` twice in a path without intending to use this feature, part of their path could inadvertently get stripped out. The same downside applies to the existing "SP" logic.

Why not the "RP" logic? It just seems more complicated, and it doesn't have the nice property of normalizing to a standard (existing) path like the other approaches do.
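A minimal sketch of the proposed dispatch, assuming exactly two `/./` markers select "SM" and one selects the existing "SP" behaviour. Rejecting more than two markers is an assumption of this sketch, as is leaving the leading `/` in place (Borg itself stores paths without it); `archived_path` is a hypothetical helper, not Borg's code.

```python
MARKER = "/./"


def archived_path(path: str) -> str:
    """Compute the stored path under the proposed "SM"/"SP" rules."""
    count = path.count(MARKER)
    if count == 2:
        # "SM": drop everything between the first and second marker.
        head, _stripped, tail = path.split(MARKER)
        return head + "/" + tail
    if count == 1:
        # Existing "SP": keep only what follows the marker.
        return path.split(MARKER)[1]
    if count == 0:
        return path
    raise ValueError(f"ambiguous path with {count} '/./' markers: {path}")


print(archived_path("/my/zfs/mountpoint/./.zfs/snapshots/borgmatic-1234/./"))  # /my/zfs/mountpoint/
print(archived_path("/my/zfs/mountpoint/./.zfs/snapshots/borgmatic-1234/./some/file"))  # /my/zfs/mountpoint/some/file
print(archived_path("/run/borgmatic/mounts-1234/./my/zfs/mountpoint"))  # my/zfs/mountpoint
```

The fragility concern is easy to reproduce with this sketch: a path such as `/data/./x/./y` produced by some other tool would silently be stored as `/data/y`.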