borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

[Feature Request] Add `--base-archive` to command `create`. #5416

Open srkunze opened 3 years ago

srkunze commented 3 years ago

What?

Add --base-archive to command create.

Why?

Allow incremental updates of existing repositories. Use case: #5413

Extracting a complete borg archive can be impossible due to space restrictions (and can take forever). Working around this by partitioning the backup manually (different archives for different source folders) is a problem as well: these hand-crafted partitions can grow to an unmanageable size, and in the end they need management of their own.

A flexible solution (at least to me) is preferred.

How?

Partial extraction is already possible (borg mount, borg extract). Partial archive creation isn't.

The idea is to provide a base archive in the repository onto which to create a new version:

# archive all the pictures, documents and movies
borg create --list /path/to/repo::myfirst ~/pictures/ ~/documents ~/movies
# after a year, backup 2020 pictures
borg create --base-archive myfirst --list /path/to/repo::mysecond ~/pictures/2020-*

Details

Idea in Python pseudo-code:

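processed = set()
# archive everything found under the given source paths
for path in walk(arg.path):
    process(path)
    processed.add(path)
# carry over all remaining items from the base archive
for item in base_archive.iter_items():
    if item.path not in processed:
        new_archive.add_item(item)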
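
A minimal runnable sketch of the same idea, with plain dictionaries standing in for archives. The function name `create_with_base` and the `path -> content` model are assumptions made for illustration only; nothing here is borg's actual API or on-disk format.

```python
def create_with_base(base_archive, fresh_items):
    """Build a new archive listing from freshly processed items plus
    every base-archive item whose path was not processed in this run.

    Both arguments map path -> content. This is a toy model, not borg's API.
    """
    new_archive = {}
    processed = set()
    # First loop: items found while walking the given source paths.
    for path, content in fresh_items.items():
        new_archive[path] = content
        processed.add(path)
    # Second loop: carry over everything else from the base archive.
    for path, content in base_archive.items():
        if path not in processed:
            new_archive[path] = content
    return new_archive


myfirst = {
    "pictures/2019-01/a.jpg": "a1",
    "pictures/2020-01/b.jpg": "b1",
    "documents/cv.txt": "cv1",
}
# Only the 2020 pictures are present locally and get re-read.
walked = {
    "pictures/2020-01/b.jpg": "b2",
    "pictures/2020-02/c.jpg": "c1",
}
mysecond = create_with_base(myfirst, walked)
```

In this model `mysecond` contains the updated and new 2020 pictures, while the 2019 picture and the document are taken over from `myfirst` without being read again.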

Impact

None.

ThomasWaldmann commented 3 years ago

Please check terminology: repository vs. archive vs. source files.

Also rather describe what we have and what you want without referring to how git works, that only makes it more complicated.

Partial create is of course already possible (you can tell it what you want to put into an archive), but it is not what you want.

Missing still:

srkunze commented 3 years ago

Done.

ThomasWaldmann commented 3 years ago

Looks like a differential backup (not a full backup like borg normally does).

As I said, leads to prune and manual handling issues.

Also, borg extract expects an empty target dir to start from; it does not (in all cases) support merging into an existing directory/file structure.

If stuff gets renamed (or deleted), there is additional complexity.

srkunze commented 3 years ago

differential backup

Yeah, I think so. But still, the "checked out" version is not completely available. If I understand "differential" correctly, it means that all changes since the last "full backup" are accumulated. However, neither term really specifies what the snapshot is taken from: a completely restored archive OR an only partially restored one.

prune and manual handling

"Manual" sucks but even more so if you need to handle it really manually. If borg does it for us, I would not call it "manual" but "standardized". ;-)

borg extract expects an empty target dir to start from

Could you explain what exactly you mean by that? I would have expected some sort of database-like entry that just needs to be duplicated for the new archive, and that's it for a file to be "copied over" to the new archive.

If stuff gets renamed (or deleted), there is additional complexity.

Maybe the internals are not 100% clear to me. Can you point that complexity out in the source code for me?

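enkore commented 3 years ago

processed = set()
# archive everything found under the given source paths
for path in walk(arg.path):
    process(path)
    processed.add(path)
# carry over all remaining items from the base archive
for item in base_archive.iter_items():
    if item.path not in processed:
        new_archive.add_item(item)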

Archives are read front-to-back and are basically structured like a tar archive. If you swap those two loops, you get roughly what you want, but keep in mind that this will give you the union of all paths, so /foo/bar/baz in the base archive will be present in the created archive, even if /foo/bar is excluded or deleted.
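
One possible reading of the swapped loops, sketched with ordered lists of `(path, content)` pairs in place of a real archive. The function name `merge_base_first` and the data model are hypothetical, and the sketch assumes the set of walked paths is known before the base archive is streamed.

```python
def merge_base_first(base_items, walked_items):
    """Return the new archive as an ordered list of (path, content).

    base_items and walked_items are lists of (path, content) pairs. This is
    a toy model of an archive read front to back, not borg's item stream.
    """
    walked_paths = {path for path, _ in walked_items}
    new_archive = []
    # Loop 1 (formerly second): copy base items that were not re-read.
    for path, content in base_items:
        if path not in walked_paths:
            new_archive.append((path, content))
    # Loop 2 (formerly first): append the freshly processed items.
    new_archive.extend(walked_items)
    return new_archive


base = [("/foo/bar/baz", "old"), ("/foo/keep", "k1")]
# /foo/bar was deleted on disk, so the walk only sees /foo/keep.
walked = [("/foo/keep", "k2")]
result = merge_base_first(base, walked)
paths = [path for path, _ in result]
```

Here `paths` still lists `/foo/bar/baz`, although `/foo/bar` no longer exists in the source: the result is the union of both path sets, which is the caveat described above.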

trtracer commented 3 years ago

This would also be great for backing up VM disks! I have several VMs with quite large disks, but the changes on these disks over time are not that big. My current solution is to always make a full backup, of which 99% gets deduplicated. But this full backup takes really long.

With something like this, I could make a full backup with borg, later make a diff image of the VM and create an incremental backup with borg. The incremental backup would be very fast. Restore would still require some manual work.

Or is there a better solution ATM?

RonnyPfannschmidt commented 3 years ago

It's already possible to put VM diffs into separate archives; it's just not sensible to integrate the details as of now.

Also, the initial intent of this issue is more like the goal of git-annex and less like backup.

srkunze commented 3 years ago

@trtracer At least for the management of the backup archives, I wrote a helper for borg using OverlayFS: https://github.com/srkunze/borg-stack

The dedup should still take a considerable amount of time, as the changes need to be located.