ngharo opened this issue 4 years ago

I modeled a program closely after the borg backup `prune` command to clean up old zackup snapshots on my server.

I'm considering adding a `zackup prune` command. It would read the config for daily, weekly, monthly and yearly "keep counts" and prune snapshots accordingly. I want to make sure that this is a feature you all would accept before I start working on the integration. I can elaborate on more details if interested.

Oh yes, please provide some details.
@ngharo, yes, this is very much a missing feature that fell off my TODO list.
I'd like to see the retention policy configured per host, with defaults inherited from the global config (pretty much as the rsync and ssh config propagates).
I don't know whether the `zackup prune` command should have CLI flags to configure the retention on the fly... I believe it should execute the truncation/pruning according to the plan laid out by the global/host config. (I foresee accidentally deleting the wrong data when juggling the command line arguments — which is really bad when that data is already the backup...)
When thinking about this feature, I've written down some obstacles somewhere... Let me report back here when I've looked through my notes at work (tomorrow).
> I'd like to see the retention policy configured per host, with defaults inherited from the global config (pretty much as the rsync and ssh config propagates).
Agreed! That is something I want to do.
The config I envisioned would look like:

```yaml
ssh:
  ...
rsync:
  ...
retention:
  yearly: 5
  monthly: 6
  weekly: 4
  daily: 7
```
Each number describes the number of snapshots to keep at each given interval.
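For illustration, a minimal sketch of how that section could map onto a Go struct, assuming standard YAML unmarshalling via `gopkg.in/yaml.v2` (type and field names are my invention, not zackup's actual code):

```go
package config

// Retention mirrors the YAML section above: one "keep count" per
// interval. A zero value would mean "keep nothing at this interval".
type Retention struct {
	Yearly  int `yaml:"yearly"`
	Monthly int `yaml:"monthly"`
	Weekly  int `yaml:"weekly"`
	Daily   int `yaml:"daily"`
}
```

Per-host inheritance could then be an overlay: any field left unset on the host falls back to the global value, field by field.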
> I don't know whether the `zackup prune` command should have CLI flags to configure the retention on the fly... I believe it should execute the truncation/pruning according to the plan laid out by the global/host config. (I foresee accidentally deleting the wrong data when juggling the command line arguments — which is really bad when that data is already the backup...)
Also agreed. I think there should be one source of truth: the config. The `prune` command would be for people not running as a daemon. When running as a daemon, like BackupPC, it probably would make sense to run the prune automatically when idle, maybe right after backups complete.
Originally, I wanted to port BackupPC's exponential expiration over, but I'm having problems grokking it, and I'm fairly new to Go. Even as a user, I find it a little confusing, and I'm not sure it's worth investing effort into versus a simplified approach where simple "keep" counts are used (again, modeled after borg backup's pruning).
> Each number describes the number of snapshots to keep at each given interval.
Ah, that's a nicer definition than mine: I had envisioned some kind of time bucket list in the form of

```yaml
retention:
  - { interval: "24h",  keep: 7 }   # 7 daily backups
  - { interval: "48h",  keep: 14 }  # 14 bi-daily backups
  - { interval: "7d",   keep: 4 }   # 4 weekly backups
  - { interval: "30d",  keep: 12 }  # 12 monthly backups
  # for the "rest", either:
  - { interval: "360d", keep: ∞ }   # keep the rest with one-year gaps
  # or
  - { interval: "360d", keep: 10 }  # 10 yearly backups, delete anything older
```
where `interval` is fed into a `time.Parse` equivalent which interprets `1d` as `24h`, allowing for arbitrary buckets. Having predefined buckets makes both the configuration and implementation much easier.
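A minimal sketch of such a parser, assuming it wraps Go's `time.ParseDuration` (the name `parseInterval` is made up for illustration):

```go
package config

import (
	"strconv"
	"strings"
	"time"
)

// parseInterval behaves like time.ParseDuration, but additionally
// understands a "d" suffix, treating one day as exactly 24 hours.
func parseInterval(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil {
			return 0, err
		}
		return time.Duration(days) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}
```

With that, `parseInterval("7d")` and `parseInterval("168h")` yield the same duration.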
> it probably would make sense to run the prune automatically when idle, maybe right after backups complete.
I concur. Creating new snapshots and deleting old ones in parallel while an rsync is happening sounds like a lot of load on the ZFS ARC, which should be avoided.
Two notes I have found:
The buckets can either be consecutive (i.e. bucket i+1 starts after bucket i ends), or they can all start simultaneously. The latter is easier to implement, but (using your config from above) leads to the phenomenon that the `weekly: 4` bucket is actually only 3 weeks long, because the first week is occupied by the `daily: 7` bucket. The former shifts each bucket further back in time (the `yearly: 5` bucket would actually cover a time range of more than 5½ years).
(This is just a matter of definition+documentation. There's no right or wrong here.)
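To make the difference concrete, a small Go sketch (hypothetical helpers, not zackup code) computing where each bucket begins under either definition:

```go
package prune

import "time"

// bucket mirrors one { interval, keep } entry from the config above.
type bucket struct {
	interval time.Duration
	keep     int
}

// simultaneousStart anchors every bucket at "now": bucket b covers
// [now - interval*keep, now], so earlier buckets overlap later ones.
func simultaneousStart(now time.Time, b bucket) time.Time {
	return now.Add(-b.interval * time.Duration(b.keep))
}

// consecutiveStarts lets bucket i+1 begin where bucket i ends,
// shifting each successive bucket further into the past.
func consecutiveStarts(now time.Time, bs []bucket) []time.Time {
	starts := make([]time.Time, len(bs))
	end := now
	for i, b := range bs {
		starts[i] = end.Add(-b.interval * time.Duration(b.keep))
		end = starts[i]
	}
	return starts
}
```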
This is a purely algorithmic problem: matching a list of snapshots (with creation timestamps) to the bucket list. I've matched a drawing to your configuration (same color scheme as above).
I might have overlooked something, but this should also cover the case when backups are created more than once daily (the scale is just smaller).
Rolling from the weekly-bucket into the monthly-bucket applies the same principle.
It should also gracefully handle the case where a backup is missing (which would be represented as a "hole" in the drawing).
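A rough sketch of that matching, reusing the hypothetical `bucket` type from above and the consecutive definition: walk slot by slot through each bucket and keep the newest snapshot falling into each slot; a missing backup simply leaves its slot empty rather than breaking the scan.

```go
// keepSet expects snapshots sorted newest-first and returns the set of
// creation timestamps that survive pruning.
func keepSet(snaps []time.Time, bs []bucket, now time.Time) map[time.Time]bool {
	keep := make(map[time.Time]bool)
	slotEnd := now
	for _, b := range bs {
		for i := 0; i < b.keep; i++ {
			slotStart := slotEnd.Add(-b.interval)
			// keep the newest snapshot inside (slotStart, slotEnd], if any
			for _, s := range snaps {
				if s.After(slotStart) && !s.After(slotEnd) {
					keep[s] = true
					break
				}
			}
			slotEnd = slotStart
		}
	}
	return keep
}
```

Since `snaps` is sorted newest-first, the first match in a slot is automatically the newest snapshot of that slot.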
Wow! Thanks for the feedback.
Let me know what you think of #3 so far. You can see how it wouldn't allow for arbitrary time durations from the user, and how all buckets start simultaneously. It's really stupid simple (maybe too simple...). It's a straight port of how borg backup does pruning. I thought it was a really clever use of time string formatting.
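For readers unfamiliar with the trick, here's a hedged sketch of the general borg-style idea (not the actual code in #3): format each snapshot's creation time with a period-specific layout string; the first snapshot seen per distinct formatted value is the newest of its period, and up to n such periods are kept.

```go
// pruneSplit marks up to n snapshots to keep. snaps must be sorted
// newest-first; layout is a Go time layout that collapses timestamps
// into periods, e.g. "2006-01-02" for daily, "2006-01" for monthly,
// "2006" for yearly.
func pruneSplit(snaps []time.Time, layout string, n int, keep map[time.Time]bool) {
	seen := make(map[string]bool)
	for _, s := range snaps {
		period := s.Format(layout)
		if seen[period] {
			continue // an older snapshot of an already-kept period
		}
		seen[period] = true
		keep[s] = true
		if len(seen) >= n {
			return
		}
	}
}
```

Weekly buckets need a small detour (e.g. via `time.Time.ISOWeek`), since Go's layout strings have no week-number verb.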
@ngharo, how's it coming? Do you need help?
Hey @dmke. I haven't had a lot of time to sit down and focus on this. Crazy days we're living in. Hope to get back into it soon.
Hope you and yours are doing well.
Crazy days indeed. Don't worry too much about this project, it's not important at all.