enzingerm / snapborg

Synchronize snapper snapshots to a borg repository
GNU General Public License v3.0

Easy installation #5

Closed jrabinow closed 3 years ago

jrabinow commented 3 years ago

This PR adds a setup.py and a PKGBUILD for easy installation on Linux systems.

I would like to submit the PKGBUILD to the ArchLinux AUR, but wanted to make sure you were OK with being listed as maintainer. If you'd prefer to avoid that, I'm happy to point the PKGBUILD at my fork. The PKGBUILD technically shouldn't be part of this repo, I just want:

enzingerm commented 3 years ago

Hey, thanks for your contribution. In general I like the idea of having this available in the AUR and I have nothing against being listed as a maintainer. But beforehand I think the use-case for this project has to be pointed out more clearly in the docs. Snapper has its own mechanism for snapshot retention, so why does snapborg calculate the snapshots which should be retained on its own? In my specific use case I like to keep as many snapshots as possible on the server while only a coarse subset of those snapshots should be transferred to the backup target. If this isn't needed, this could be a simple wrapper which just synchronizes each existing snapper snapshot to the backup target. May I ask you to describe the scenario in which you are using snapborg?

Nevertheless I'll have a closer look at the changes tomorrow.

jrabinow commented 3 years ago

My usecase is for my personal laptop, where I want an option to recover from a failing hard drive. The backup target (an external hard drive) is plugged into the laptop via USB a couple times a week.

I initially got thrown off myself by the syncing of only some snapshots instead of all, like you're describing. But then I realized that having backups of the whole set of hourly snapshots is kind of pointless for my usecase. Ideally, I'm looking for something like this:

I haven't modified the script to do this stuff yet, and I wanted to try and keep the PRs focused. Any thoughts?

enzingerm commented 3 years ago

  • latest backup, gets deleted and recreated each time snapborg is run

This could be done by just setting the retention settings (keep_last = 1) correctly. However, the previous backup would not automatically get deleted if you don't call prune after each backup.

  • weekly snapshots timeline

See above

  • possibility to add one-off "named" backups, similar to what snapper has with its "single" type, where the snapshot is manually created and left alone until manually deleted.

You mean transferring the current filesystem state via borg to your external drive without involving snapper?

  • no pruning until the backup target is full, at which point the earliest timeline backup disappears. Given how file deduplication works, this is a hard one to code, so I intend to work around it by buying another hard drive or deleting backups manually.

The deduplicated size of a borg backup can be determined, but I agree that this would be quite complicated to code. For now you could just set a very high number for keep_weekly and check the free space on the disk from time to time.
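A rough sketch of the two suggestions above: retention such as `keep_last = 1` only takes effect when prune is actually run, and free space on the target can be checked separately. The helper names and the repository path are assumptions for illustration, not part of snapborg; the command layout assumes the borg 1.x CLI (`borg prune [options] REPOSITORY`).

```python
import shutil


def prune_command(repo, keep_last=None, keep_weekly=None):
    """Build a `borg prune` argument list from retention settings.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["borg", "prune", "--list"]
    if keep_last is not None:
        cmd += ["--keep-last", str(keep_last)]
    if keep_weekly is not None:
        cmd += ["--keep-weekly", str(keep_weekly)]
    cmd.append(repo)
    return cmd


def free_fraction(path):
    """Return the fraction of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

To apply the retention, the list returned by `prune_command("/mnt/backup/repo", keep_last=1)` (path is a placeholder) would be passed to `subprocess.run` after each backup; `free_fraction` can be pointed at the mount point of the external drive.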

jrabinow commented 3 years ago

You mean transferring the current filesystem state via borg to your external drive without involving snapper?

Not quite but almost. I meant if a snapshot was taken by manually invoking snapper (from the command-line, instead of from pacman hooks or a systemd timer), it's a snapshot of type "single" and without any associated cleanup policy. I'm thinking that in this case, it would make sense to have snapborg transfer that snapshot over to the borg repo, but ignore it when pruning is run, even if the snapshot is deleted on the base system. This might be doable by bypassing snapborg and using borg directly as you suggest, but I'm not sure if the snapborg pruning algorithm handles that case correctly - I also think that bypassing snapborg is a workaround and that ideally from a UX perspective, snapborg should be able to handle that case correctly. Then again, it may not be worth the extra engineering. I'll have to look into it and test things out (as soon as I have some bandwidth, which I'm afraid is rather limited).

Re everything else: 👍 I'll push an update by next weekend. Thanks for taking the time to review and understand my usecase.

enzingerm commented 3 years ago

Actually, snapshots created by the snapper-timeline service are also of type single but they are subject to the configured cleanup policy (usually timeline). As soon as the snapshot has been transferred to the borg repo, what happens on the snapper side (whether or not the snapshot gets cleaned up at some point) is completely irrelevant for the borg backup. I don't know any method to keep one specific borg backup when running prune so either you set the retention settings accordingly or don't run prune at all.

It should be possible to use 2 borg repositories (residing in different directories on the same HDD). The first one for automated snapshots, where prune is run regularly, and the second one for all the manual snapshots which should be kept forever. But I think you would lose deduplication if you do it that way.

Edit: It's actually not possible at the moment due to the reason described in #6 but that could be solved

jrabinow commented 3 years ago

I think that #6 is an issue for the reasons you mention, but I don't think that it's necessarily an issue for the workaround you suggested above. For my usecase, it would be enough to back up each snapshot to a single repository: as long as a snapshot doesn't need to be backed up twice, I think that #6 doesn't need to be of concern here.

Actually I take that back, I was planning on eventually backing up snapshots to 2 separate hard drives, so I was going to run into this issue at some point. Anyways, we're not there just yet.

The only problem with the workaround is having to manually specify which snapshots to back up to the repo. Might as well use borg directly, in which case the metadata and therefore #6 is even less relevant since snapborg wouldn't even be part of the picture.

There's still the lack of deduplication: do we really need 2 borg repos, or can we do better? With a single repo, we have a retention problem: we don't want prune to delete stuff it shouldn't. From what I can see, sadly borg doesn't allow setting metadata on backups, but it does allow applying prune only to specific name patterns (using the --prefix PREFIX or --glob-archives GLOB flag), as well as renaming backups if need be. This would allow running prune without modifying snapborg retention settings, at the cost of defining and implementing a naming convention internal to snapborg. I think this would be useful in any case as a mitigation strategy for user error when setting retention limits, or if the prune algorithm goes rogue one day and deletes a whole bunch of stuff it shouldn't have. I can implement that part in a separate PR if you're ok with that strategy.
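To make the naming-convention idea concrete, here is a minimal sketch. The two prefixes and all function names are hypothetical (snapborg defines no such convention in this thread), and `fnmatchcase` only approximates how borg matches `--glob-archives` patterns. It assumes that manually created snapper snapshots have an empty cleanup field, while timeline snapshots carry `timeline`.

```python
from fnmatch import fnmatchcase

# Hypothetical prefixes; not an existing snapborg convention.
AUTO_PREFIX = "snapborg-auto-"
MANUAL_PREFIX = "snapborg-manual-"


def archive_name(snapshot_number, cleanup):
    """Name an archive by whether the snapshot has a cleanup policy."""
    prefix = AUTO_PREFIX if cleanup else MANUAL_PREFIX
    return f"{prefix}{snapshot_number}"


def is_prunable(name):
    """Approximate check: would the prune glob below consider this archive?"""
    return fnmatchcase(name, AUTO_PREFIX + "*")


def prune_auto_command(repo, keep_weekly):
    """Build a `borg prune` call that only looks at automatic archives."""
    return [
        "borg", "prune",
        "--glob-archives", AUTO_PREFIX + "*",
        "--keep-weekly", str(keep_weekly),
        repo,
    ]
```

With this layout both kinds of archive live in one repository and still deduplicate against each other, while prune never sees the `snapborg-manual-` archives regardless of the retention settings.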

jrabinow commented 3 years ago

Hi @enzingerm sorry it took so long, I addressed your review comments:

I'll eventually get around to the other stuff we discussed, but it's off-topic for this PR, so it may take a while

enzingerm commented 3 years ago

Thanks, I merged it, made a few adaptations and published it to the AUR (https://aur.archlinux.org/packages/snapborg/).

For further discussion we should stick to opening separate Issues/PRs. Generally, I'm very keen on improving this "alpha-state" project so that it actually could be useful for different use-cases and other people :)