digineo / zackup

Backup to ZFS. Inspired by BackupPC
GNU General Public License v3.0

Pruning old snapshots #2

Open ngharo opened 4 years ago

ngharo commented 4 years ago

I modeled a program closely after borg backup's prune command to clean up old zackup snapshots on my server.

I'm considering adding a zackup prune command. It would read the config for daily, weekly, monthly and yearly "keep counts" and prune snapshots accordingly. I want to make sure that this is a feature you all would accept before I start working on the integration.

I can elaborate on more details if interested.

corny commented 4 years ago

Oh yes, please provide some details.

dmke commented 4 years ago

@ngharo, yes, this is a very much missing feature that fell off my TODO list.

I'd like to see the retention policy configured per host, with defaults inherited from the global config (pretty much as the rsync and ssh settings propagate).

I don't know whether the zackup prune command should have CLI flags to configure the retention on the fly... I believe it should execute the truncation/pruning according to the plan laid out by the global/host config. (I foresee accidentally deleting the wrong data when juggling command-line arguments — which is really bad when that data is already the backup...)

When thinking about this feature, I've written down some obstacles somewhere... Let me report back here when I've looked through my notes at work (tomorrow).

ngharo commented 4 years ago

I'd like to see the retention policy configured per host, with defaults inherited from the global config (pretty much as the rsync and ssh settings propagate).

Agreed! That is something I want to do.

The config I envisioned would look like:

ssh:
   ...
rsync:
   ...
retention:
   yearly: 5
   monthly: 6
   weekly: 4
   daily: 7

Each number describes the number of snapshots to keep at each given interval.
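
For illustration only, here's a minimal Go sketch of how such a retention section could be modeled and inherited per host. The type and field names are my own assumptions, not zackup's actual config structures:

```go
// Hypothetical retention config: nil pointers mean "not set here",
// so a host-level section can fall back to the global defaults,
// mirroring how the ssh/rsync settings propagate.
package config

type Retention struct {
	Yearly  *int `yaml:"yearly"`
	Monthly *int `yaml:"monthly"`
	Weekly  *int `yaml:"weekly"`
	Daily   *int `yaml:"daily"`
}

// MergeRetention fills any keep count left unset on the host
// with the corresponding global default.
func MergeRetention(global, host *Retention) *Retention {
	if host == nil {
		return global
	}
	if global == nil {
		return host
	}
	merged := *host
	if merged.Yearly == nil {
		merged.Yearly = global.Yearly
	}
	if merged.Monthly == nil {
		merged.Monthly = global.Monthly
	}
	if merged.Weekly == nil {
		merged.Weekly = global.Weekly
	}
	if merged.Daily == nil {
		merged.Daily = global.Daily
	}
	return &merged
}
```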

I don't know whether the zackup prune command should have CLI flags to configure the retention on the fly... I believe it should execute the truncation/pruning according to the plan laid out by the global/host config. (I foresee accidentally deleting the wrong data when juggling command-line arguments — which is really bad when that data is already the backup...)

Also agree. I think there should be one source of truth, the config. The prune command would be for people not running as a daemon. As a daemon, like BackupPC, it probably would make sense to run the prune automatically when idle, maybe right after backups complete.

Originally, I wanted to port BackupPC's exponential expiration over, but I'm having problems grokking it and I'm fairly new to Go. Even as a user I find it a little confusing, and I'm not sure whether it's worth investing effort into it versus a simplified approach where simple "keep" counts are used (again, modeled after borg backup pruning).

dmke commented 4 years ago

Each number describes the number of snapshots to keep at each given interval.

Ah, that's a nicer definition than mine: I had envisioned some kind of time bucket list in the form of

retention:
- { interval:  "24h", keep:  7 } # 7 daily backups
- { interval:  "48h", keep: 14 } # 14 bi-daily backups
- { interval:   "7d", keep:  4 } # 4 weekly backups
- { interval:  "30d", keep: 12 } # 12 monthly backups
# for the "rest", either:
- { interval: "360d", keep: ∞ } # keep the rest with one-year gaps
# or
- { interval: "360d", keep: 10 } # 10 yearly backups, delete anything older

where interval is fed into a time.ParseDuration equivalent which interprets 1d as 24h, allowing for arbitrary buckets. Having predefined buckets, however, makes both the configuration and implementation much easier.
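
As a rough sketch (not zackup code), such a parser could wrap Go's time.ParseDuration and special-case a trailing "d":

```go
// parseInterval interprets "1d" as "24h"; anything else is handed
// to time.ParseDuration unchanged (which only knows units up to "h").
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

func parseInterval(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil {
			return 0, fmt.Errorf("invalid interval %q: %w", s, err)
		}
		return time.Duration(days) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"24h", "48h", "7d", "30d", "360d"} {
		d, err := parseInterval(s)
		fmt.Println(s, "->", d, err)
	}
}
```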

Sidenote: This also allows upgrading the definition (should this ever be needed), as your config example can be easily re-modeled as

```yaml
retention:
- { interval: "365d", keep: 5 } # == yearly: 5
- { interval:  "30d", keep: 6 } # == monthly: 6
- { interval:   "7d", keep: 4 } # == weekly: 4
- { interval:  "24h", keep: 7 } # == daily: 7
```

it probably would make sense to run the prune automatically when idle, maybe right after backups complete.

I concur. Creating new snapshots and deleting old ones in parallel while an rsync is running sounds like a lot of load on the ZFS ARC, which should be avoided.


Two notes I have found:

  1. How are the retention buckets stacked?

They can either be consecutive (i.e. bucket i+1 starts after bucket i ends), or they can all start simultaneously. The latter is easier to implement, but (using your config from above) leads to the phenomenon that the weekly: 4 bucket is actually only 3 weeks long, because the first week is occupied by the daily: 7 bucket. The former shifts each bucket further back in time (the yearly: 5 bucket would actually cover a time range of more than 5½ years):

[image: bucket-stacking]

(This is just a matter of definition+documentation. There's no right or wrong here.)

  2. How do we handle rotating a snapshot from one bucket to the next?

This is a purely algorithmic problem: matching a list of snapshots (with their creation timestamps) to the bucket list. I've made a drawing matching your configuration (same color scheme as above):

[image: bucket-aging]

I might have overlooked something, but this should also cover the case when backups are created more than once daily (the scale is just smaller).

Rolling from the weekly-bucket into the monthly-bucket applies the same principle.

It should also gracefully handle the case where a backup is missing (which would be represented as a "hole" in the drawing).
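
To make the matching concrete, here is a rough Go sketch under the "all buckets start simultaneously" scheme: each bucket of width interval is divided into keep slots, and the newest snapshot per slot is retained (an empty slot is simply a missing backup). The types and the choice of "newest per slot" are assumptions for illustration, not a final design:

```go
package main

import (
	"fmt"
	"time"
)

type Bucket struct {
	Interval time.Duration // width of one slot
	Keep     int           // number of slots in this bucket
}

type Snapshot struct {
	Name    string
	Created time.Time
}

// plan returns the names of snapshots that fall into some slot of some
// bucket; everything else would be a pruning candidate.
func plan(now time.Time, buckets []Bucket, snaps []Snapshot) map[string]bool {
	keep := make(map[string]bool)
	for _, b := range buckets {
		newest := make(map[int]Snapshot) // newest snapshot seen per slot index
		for _, s := range snaps {
			age := now.Sub(s.Created)
			if age < 0 {
				continue
			}
			slot := int(age / b.Interval)
			if slot >= b.Keep {
				continue // older than this bucket reaches back
			}
			if cur, ok := newest[slot]; !ok || s.Created.After(cur.Created) {
				newest[slot] = s
			}
		}
		for _, s := range newest {
			keep[s.Name] = true
		}
	}
	return keep
}

func main() {
	now := time.Now()
	buckets := []Bucket{
		{Interval: 24 * time.Hour, Keep: 7},      // daily: 7
		{Interval: 7 * 24 * time.Hour, Keep: 4},  // weekly: 4
		{Interval: 30 * 24 * time.Hour, Keep: 6}, // monthly: 6
	}
	snaps := []Snapshot{
		{Name: "a", Created: now.Add(-2 * time.Hour)},
		{Name: "b", Created: now.Add(-26 * time.Hour)},
		{Name: "c", Created: now.Add(-10 * 24 * time.Hour)},
	}
	fmt.Println(plan(now, buckets, snaps))
}
```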

ngharo commented 4 years ago

Wow! Thanks for the feedback.

Let me know what you think of #3 so far. You can see how it doesn't allow arbitrary time durations from the user, and how all buckets start simultaneously. It's really stupid simple (maybe too simple...). It's a straight port of how borg backup does pruning; I thought it was a really clever use of time string formatting.
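
For readers unfamiliar with borg's approach, the "time string formatting" trick roughly works like this (a hedged sketch of the general idea, not the code in #3): format each snapshot's timestamp at the granularity of the rule, and keep the newest snapshot per distinct formatted value until the configured count is reached.

```go
package main

import (
	"fmt"
	"time"
)

// periodKey collapses a timestamp to the granularity of a rule; two
// snapshots belong to the same period exactly when their keys are equal.
func periodKey(period string, t time.Time) string {
	switch period {
	case "daily":
		return t.Format("2006-01-02")
	case "weekly":
		y, w := t.ISOWeek()
		return fmt.Sprintf("%04d-W%02d", y, w)
	case "monthly":
		return t.Format("2006-01")
	default: // yearly
		return t.Format("2006")
	}
}

// keepForPeriod expects snaps sorted newest first and returns the
// timestamps kept by one rule, e.g. ("daily", 7).
func keepForPeriod(period string, count int, snaps []time.Time) []time.Time {
	var kept []time.Time
	seen := map[string]bool{}
	for _, t := range snaps {
		k := periodKey(period, t)
		if seen[k] {
			continue // this day/week/month/year already has a keeper
		}
		seen[k] = true
		kept = append(kept, t)
		if len(kept) == count {
			break
		}
	}
	return kept
}

func main() {
	var snaps []time.Time
	for i := 0; i < 40; i++ { // one snapshot per day, newest first
		snaps = append(snaps, time.Now().AddDate(0, 0, -i))
	}
	fmt.Println("daily keeps: ", len(keepForPeriod("daily", 7, snaps)))
	fmt.Println("weekly keeps:", len(keepForPeriod("weekly", 4, snaps)))
}
```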

dmke commented 4 years ago

@ngharo, how's it coming? Do you need help?

ngharo commented 4 years ago

Hey @dmke. I haven't had a lot of time to sit down and focus on this. Crazy days we're living in. Hope to get back into it soon.

Hope you and yours are doing well

dmke commented 4 years ago

Crazy days indeed. Don't worry too much about this project, it's not important at all.