Tarsnap / tarsnap

Command-line client code for Tarsnap.
https://tarsnap.com

Deleting of archives taking up a lot of server -> client bandwidth #557

Closed: masta79 closed this issue 1 year ago

masta79 commented 1 year ago

I'm cleaning up a large batch of old hourly backups and ran into a problem: deleting each archive takes 10-15 minutes, and although each deletion reduced the required storage by only 1.5 GB, the total server -> client bandwidth for the operation was about 15 GB. Is this expected?

Is there a more efficient way to delete backups in bulk? I could not see a difference between calling tarsnap -d once per archive and listing all of them at once with multiple -f arguments.

cperciva commented 1 year ago

A single tarsnap command deleting multiple archives can be much faster (and use less bandwidth) than separate commands, since it keeps some metadata cached. For optimal performance (aka to make the cache as efficient as possible), sort the list of archives so that archives which share a lot of their contents are deleted consecutively.
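A minimal sketch of the advice above in Python: put every archive into a single `tarsnap -d` invocation with one `-f` per archive, sorted so that related archives sit next to each other. The helper `build_delete_command` and the archive names are hypothetical, and sorting by name is only a proxy for "archives which share a lot of their contents", which assumes the names encode the backup time. The sketch only builds the argument list; it does not run tarsnap.

```python
from typing import Iterable, List


def build_delete_command(archives: Iterable[str]) -> List[str]:
    """Build one `tarsnap -d` command that deletes every archive in `archives`.

    Names are de-duplicated and sorted so that archives with similar names
    (e.g. consecutive hourly backups) are deleted one after another.
    """
    names = sorted(set(archives))
    if not names:
        raise ValueError("no archives to delete")
    cmd = ["tarsnap", "-d"]
    for name in names:
        # One -f per archive; a single tarsnap process handles all of them.
        cmd.extend(["-f", name])
    return cmd


if __name__ == "__main__":
    # Hypothetical archive names, given in arbitrary order.
    old = ["hourly-2023-01-02_03", "hourly-2023-01-01_23", "hourly-2023-01-02_00"]
    print(build_delete_command(old))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. With very many archives, a single command line may run into the operating system's argument-length limit.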

gperciva commented 1 year ago

In the case of hourly backups, "sort the list of archives so that the archives which share a lot of their contents" almost always means "sort them by date and time". And unless you have a very weird naming scheme, that means "sort them alphabetically".
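A small Python illustration of why a "very weird naming scheme" matters: names with zero-padded fields, most significant first, sort alphabetically in chronological order, while unpadded, day-first names do not. Both naming schemes below are hypothetical examples, not Tarsnap conventions.

```python
from datetime import datetime


def archive_name(ts: datetime) -> str:
    """Hypothetical naming scheme: zero-padded, most significant field first."""
    return ts.strftime("hourly-%Y-%m-%d_%H")


def weird_name(ts: datetime) -> str:
    """Hypothetical 'weird' scheme: no zero padding, day before month."""
    return f"hourly-{ts.day}-{ts.month}-{ts.year}_{ts.hour}"


# Three backup times, already in chronological order.
times = [datetime(2023, 1, 9, 23), datetime(2023, 1, 10, 0), datetime(2023, 2, 1, 5)]

good = [archive_name(t) for t in times]
weird = [weird_name(t) for t in times]

# With the zero-padded scheme, alphabetical order equals chronological order.
print(sorted(good) == good)    # True
# Without padding, "hourly-10-1-..." sorts before "hourly-9-1-...".
print(sorted(weird) == weird)  # False
```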

Our official page about deleting multiple archives faster is https://www.tarsnap.com/improve-speed.html#faster-delete but it doesn't contain anything that @cperciva didn't mention. (In fact, it doesn't include the tip about sorting them, so I'll add that.)

masta79 commented 1 year ago

Thank you, I completely missed that part of the documentation, as in my mind "improve speed" was only associated with doing actual backups. I'll run the deletion again and will report back.

Sidenote: It might be beneficial to have a text file input for the list of archives.

gperciva commented 1 year ago

Hi @masta79,

> Sidenote: It might be beneficial to have a text file input for the list of archives.

That's the --archive-names option, added in 1.0.38 (2017).

https://www.tarsnap.com/man-tarsnap.1.html
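A minimal Python sketch of using that option: write the sorted archive names to a text file and build a `tarsnap -d --archive-names FILE` command. It assumes the file holds one archive name per line (check the man page above for the exact format); the helper names are hypothetical, and the sketch does not run tarsnap itself.

```python
import os
import tempfile
from typing import Iterable, List, Optional


def write_archive_list(archives: Iterable[str], directory: Optional[str] = None) -> str:
    """Write sorted, de-duplicated archive names to a new file, one per line."""
    names = sorted(set(archives))
    for name in names:
        # A newline inside a name would break the one-name-per-line format.
        if "\n" in name:
            raise ValueError(f"archive name contains a newline: {name!r}")
    fd, path = tempfile.mkstemp(prefix="tarsnap-delete-", suffix=".txt", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(name + "\n" for name in names)
    return path


def build_delete_from_file_command(list_path: str) -> List[str]:
    """Build a single `tarsnap -d --archive-names FILE` command."""
    return ["tarsnap", "-d", "--archive-names", list_path]


if __name__ == "__main__":
    # Hypothetical names; in practice they might come from `tarsnap --list-archives`.
    path = write_archive_list(["hourly-2023-01-02_00", "hourly-2023-01-01_23"])
    try:
        print(build_delete_from_file_command(path))
        # To actually delete: subprocess.run(build_delete_from_file_command(path), check=True)
    finally:
        os.remove(path)
```

Reading names from a file sidesteps command-line length limits when deleting hundreds of hourly archives at once.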

masta79 commented 1 year ago

Sorry for wasting your time by not reading the documentation properly. Deletion now went through a lot faster; I'll update my scripts to use these options.

/me sees himself out

gperciva commented 1 year ago

Hi @masta79, no problem! Tarsnap has a lot of nice options, but I'm still working on "discoverability", in terms of trying to make sure that people know how to find the info they need, without being overwhelmed by info that's not relevant to them at the moment.

Interactions such as this help to guide me towards writing better docs. :)