borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Tagging archives for "prune"? #846

Open enkore opened 8 years ago

enkore commented 8 years ago

Currently borg prune can only restrict the archives to be pruned by a common prefix. This works for naming schemes where the "prune-relevant" part of archive names is at the front, e.g. system-<hostname>-<date> and userdata-<hostname>-<date>, but doesn't really work for anything else.

Adding tags, i.e. a list of arbitrary strings (excluding "," which would be the tag separator) would help. "prune" and other commands using "--prefix" would get a "--tags" option, and only archives which have all (or any, discuss) tags listed would be affected (and they should be immutable for this reason).
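To make the "all (or any, discuss)" question concrete, here is a minimal sketch of how such a filter could behave. Everything in it is an assumption for illustration: borg has no `--tags` option or `tags` field as of this discussion, and `Archive`, `parse_tags` and `select` are hypothetical names.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Archive:
    # Hypothetical stand-in for an archive record with separate tag metadata.
    name: str
    tags: frozenset = frozenset()


def parse_tags(value):
    # "," is the separator, so a tag can never contain it; empty entries are dropped.
    return frozenset(t.strip() for t in value.split(",") if t.strip())


def select(archives, wanted, match_all=True):
    # match_all=True: the archive must carry every wanted tag.
    # match_all=False: at least one wanted tag is enough.
    if match_all:
        return [a for a in archives if wanted <= a.tags]
    return [a for a in archives if wanted & a.tags]


parser = argparse.ArgumentParser(prog="prune-sketch")
parser.add_argument("--tags", type=parse_tags, default=frozenset())  # hypothetical option
args = parser.parse_args(["--tags", "system,weekly"])

archives = [
    Archive("a1", frozenset({"system", "weekly"})),
    Archive("a2", frozenset({"system"})),
    Archive("a3", frozenset({"userdata"})),
]
print([a.name for a in select(archives, args.tags)])                   # ['a1']
print([a.name for a in select(archives, args.tags, match_all=False)])  # ['a1', 'a2']
```

With "all" semantics an empty tag list matches every archive, while with "any" semantics it matches none, which is one more reason the choice would need to be settled explicitly.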


EDIT: Different approach maybe, no extra metadata fields, backwards-applicable.

The names are already there. Most people probably already have some kind of "tag-gy" names, like those above or yyyy-mm-dd-hostname-part. We could just add something like --tags some,tags (always use , as delimiter here?) and --tag-delim - (what delim as default?). Then in stuff like prune:

tags = set(args.tags.split(','))  # --tags is always comma-separated
for archive in ...:
  # prune only archives whose name contains every requested tag
  if tags <= set(archive.name.split(args.tag_delim)):
    ...  # prune
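A runnable version of the same idea, as a sketch only: it assumes "all tags must match" semantics, a comma-separated `--tags` value and "-" as the name delimiter. `name_tags` and `matches` are hypothetical helpers, not borg functions.

```python
def name_tags(name, delim="-"):
    # Treat each delimiter-separated part of the archive name as a tag.
    return set(name.split(delim))


def matches(name, wanted, delim="-"):
    # True if the archive name contains every requested tag.
    return set(wanted) <= name_tags(name, delim)


names = [
    "system-host1-2016-04-01",
    "userdata-host1-2016-04-01",
    "system-host2-2016-04-01",
]
wanted = "system,host1".split(",")  # the --tags value, always comma-separated
print([n for n in names if matches(n, wanted)])
# ['system-host1-2016-04-01']
```

Note that with "-" as the delimiter the date also splits into the tags "2016", "04" and "01", and a hostname containing "-" would be split as well, so the choice of default delimiter matters.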

ThomasWaldmann commented 8 years ago

Mixing names and tags feels unclean. Tags could be separate archive metadata.

enkore commented 8 years ago

Good point, but I'm not sure it would be unacceptable here (as a design decision). #866 made me think "Hm, what is the archive name really for?". "Recycling" it for tagging isn't really a clean thing to do, but it seems quite practical to me (if it's 100% explicit opt-in). In a way "tags" would just be a different way of looking at the "name" field.

billyc commented 7 years ago

I'd like to bump this feature request for tags/aliases.

After spending so much time in the git universe, I find myself wishing I could apply additional tags to specific borg archives.

Embedding tags in the archive name is currently possible, but it's quite unruly when you want to use multiple tags for an archive. For example, I already use the archive name to embed hostname, timestamp, and one or two other fields. I also want to add additional tags such as "@latest" and "@release-1". This gets messy quickly. Worse, I sometimes want to move a tag such as @latest from one archive to another.

If you're just using borg to back up files (granted, its original mission), there probably isn't a lot of need for tags. But if, like me, you have found borg's deduplication to be massively useful in other situations, like archiving very large files used in a data analysis pipeline :-) then the ability to assign multiple tags to an existing archive becomes really important.

Currently, my work-around is to create the original archive with the naming scheme I've devised, and then to immediately create multiple additional archives with names that begin with "@" -- @latest, @v1.0, @beta2, etc. Each one of those additional archives takes a couple minutes to scan/create, and adds just a few hundred bytes to the repository since the contents are completely identical to the original archive. (Well, as long as the files haven't changed in those couple minutes.)

It would be really nice to eliminate that slowdown by adding tag metadata.

I envision the UI being something like this:

Thanks for considering this!

ChrisDowning commented 7 years ago

Only started trying out borg recently but wanted to +1 the tagging idea. I can see a use-case relevant to backups whereby tags are used to define which of multiple cloud services an archive is backed up to. I imagine (based on other discussions) that the cloud backup would most likely be via a separate tool which picks up on the tags and, for example, handles creating a *.tgz file to be uploaded. (You could even add backup frequency as a separate detectable tag, but that sort of thing would be within the scope of the backup tool rather than borg itself.)

billyc commented 7 years ago

See issue #2300 for a possible tag implementation. It's currently more like git tag than like Gmail labels -- in other words, additional aliases can exist for an archive, but they need to be unique. It might not be hard to merge that idea with what's discussed here -- labels applied to multiple archives.
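The difference between the two models can be sketched as follows. This is an illustration only and not the implementation from #2300; `AliasIndex` and `LabelIndex` are hypothetical names, and the archive names are made up.

```python
class AliasIndex:
    """git-tag style: each alias names exactly one archive."""

    def __init__(self):
        self._target = {}  # alias -> archive name

    def assign(self, alias, archive, force=False):
        # An existing alias is only moved when force=True.
        if alias in self._target and not force:
            raise ValueError(f"alias {alias!r} already exists")
        self._target[alias] = archive

    def resolve(self, alias):
        return self._target[alias]


class LabelIndex:
    """Gmail-label style: one label may be attached to many archives."""

    def __init__(self):
        self._members = {}  # label -> set of archive names

    def add(self, label, archive):
        self._members.setdefault(label, set()).add(archive)

    def archives(self, label):
        return sorted(self._members.get(label, set()))


aliases = AliasIndex()
aliases.assign("@latest", "data-2017-03-01")
aliases.assign("@latest", "data-2017-03-02", force=True)  # move the alias
print(aliases.resolve("@latest"))  # data-2017-03-02

labels = LabelIndex()
labels.add("release", "data-2017-03-01")
labels.add("release", "data-2017-03-02")
print(labels.archives("release"))  # ['data-2017-03-01', 'data-2017-03-02']
```

An alias behaves like a second name and resolves to a single archive, so it suits "@latest"; a label selects a group, which is what a prune filter would need.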

ThomasWaldmann commented 5 days ago

See also #8425.