borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Feature: "--pre-stats" option for create #4131

Open sophie-h opened 5 years ago

sophie-h commented 5 years ago

I am currently using the output of create --log-json --list --dry-run --filter=- to estimate the size of the backup. To do that, I have to parse many JSON rows and stat each file afterwards. Generating the JSON in borg and parsing it again somewhere else seems like a pretty unnecessary round trip. I was thinking of a --pre-stats option that outputs something like the total number of files and the total size.

I am using this to show the user a progress bar. I know that the progress might be highly non-linear due to deduplication. However, in practice it still seems to provide a much better user experience.
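
For reference, a minimal sketch of the workaround described above (assuming borg's --log-json --list output emits one JSON object per line on stderr, with file_status messages carrying a path field; the exact field names may differ between borg versions):

```python
import json
import os
import subprocess

# Hypothetical repo and source path, for illustration only.
CMD = [
    "borg", "create", "--log-json", "--list", "--dry-run", "--filter=-",
    "repo::estimate", "/path/to/data",
]

def estimate_total_size():
    """Sum the sizes of the files borg would back up (ignores dedup/compression)."""
    total_files = 0
    total_bytes = 0
    proc = subprocess.Popen(CMD, stderr=subprocess.PIPE, text=True)
    for line in proc.stderr:
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # skip non-JSON output lines
        # Assumed message shape: {"type": "file_status", "status": "-", "path": "..."}
        if msg.get("type") == "file_status" and "path" in msg:
            total_files += 1
            try:
                total_bytes += os.lstat(msg["path"]).st_size
            except OSError:
                pass  # file vanished or is unreadable
    proc.wait()
    return total_files, total_bytes

if __name__ == "__main__":
    nfiles, nbytes = estimate_total_size()
    print(f"{nfiles} files, ~{nbytes / 1e9:.2f} GB of input data")
```

A --pre-stats option would essentially collapse this into a single borg invocation that prints the two totals directly.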

ThomasWaldmann commented 5 years ago

How is the desired functionality useful? If you just add up all the file sizes, you consider neither compression nor deduplication. It is also an extremely expensive operation to compute just for showing a progress bar, and on a live filesystem the result could be quite off due to filesystem changes anyway.

So, how about ditching the progress bar idea (like we did) and just showing what is known (== how much was already processed), instead of trying to predict the future?

https://xkcd.com/612/
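
A minimal sketch of that "show what is known" approach, assuming borg emits archive_progress JSON messages on stderr during create (with fields such as original_size, deduplicated_size and nfiles; the names are from memory and may vary by version):

```python
import json
import subprocess

# Hypothetical repo and source path.
CMD = ["borg", "create", "--log-json", "--progress", "repo::{now}", "/path/to/data"]

proc = subprocess.Popen(CMD, stderr=subprocess.PIPE, text=True)
for line in proc.stderr:
    try:
        msg = json.loads(line)
    except ValueError:
        continue
    # Assumed shape: {"type": "archive_progress", "original_size": ..., "nfiles": ..., ...}
    if msg.get("type") == "archive_progress":
        print(
            f"\rprocessed {msg.get('nfiles', 0)} files, "
            f"{msg.get('original_size', 0) / 1e9:.2f} GB read, "
            f"{msg.get('deduplicated_size', 0) / 1e9:.2f} GB added to repo",
            end="",
        )
proc.wait()
print()
```

Nothing here tries to predict a total; it only reports what has already been read and stored.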

janprzy commented 4 years ago

I can see this being useful if you want to make sure you're getting maximum use out of the backup target, i.e. keep the amount of unused space to a minimum. With this option, you would be able to prune just enough old backups to make space for the new one.

I think making the existing --stats option work in conjunction with --dry-run, instead of adding --pre-stats, would also solve this (see #1648).
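
A rough sketch of that "prune to fit" use case. The size estimate itself is the missing piece (hypothetically provided by --pre-stats or a --dry-run --stats run); borg list --json and borg delete are existing commands, but the JSON field names and ordering are from memory:

```python
import json
import shutil
import subprocess

REPO = "/mnt/backup/repo"          # hypothetical local repository path
ESTIMATED_NEW_BYTES = 50 * 10**9   # would come from --pre-stats / --dry-run --stats
SAFETY_MARGIN = 5 * 10**9

def free_bytes(path):
    return shutil.disk_usage(path).free

def archive_names(repo):
    """Archive names from borg list --json (assumed oldest-first ordering)."""
    out = subprocess.run(["borg", "list", "--json", repo],
                         check=True, capture_output=True, text=True).stdout
    return [a["name"] for a in json.loads(out)["archives"]]

# Delete the oldest archives until the estimated new backup (plus a margin) fits.
# Note: deleting an archive only frees the space of its unique chunks.
for name in archive_names(REPO):
    if free_bytes(REPO) >= ESTIMATED_NEW_BYTES + SAFETY_MARGIN:
        break
    subprocess.run(["borg", "delete", REPO + "::" + name], check=True)

subprocess.run(["borg", "create", "--stats", REPO + "::{now}", "/path/to/data"], check=True)
```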

ThomasWaldmann commented 4 years ago

I don't see a way to do a good estimation of the backup size without actually doing the backup (or a similarly heavy / time-consuming operation).

This is mostly due to compression and deduplication.

Of course, we could count the input files and sum up their sizes (as in #1648), but as we always do full backups, this has no meaningful connection to the resulting backup size (== increase in backup storage needs).
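
To illustrate that gap, the per-archive stats borg already reports show how far apart the summed input size and the actual storage increase can be. A minimal sketch, assuming borg info --json exposes original_size and deduplicated_size per archive (field names from memory, may vary by version):

```python
import json
import subprocess

REPO = "/mnt/backup/repo"  # hypothetical repository path

out = subprocess.run(["borg", "info", "--json", "--last", "1", REPO],
                     check=True, capture_output=True, text=True).stdout
stats = json.loads(out)["archives"][0]["stats"]

# original_size: sum of the input file sizes ("full backup" size)
# deduplicated_size: what this archive actually added to the repository
print(f"input size:       {stats['original_size'] / 1e9:.2f} GB")
print(f"storage increase: {stats['deduplicated_size'] / 1e9:.2f} GB")
```

For a mostly unchanged data set, the second number is typically a tiny fraction of the first.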

janprzy commented 4 years ago

In that case I think the only solution would be adding a "--prune" option to the create command, which could dynamically prune old backups while creating a new one. However, I don't know exactly how borg is implemented, so this may very well be far too much work for something that perhaps only a handful of users need/want.
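
Until something like that exists, the closest approximation is probably a wrapper that runs the existing borg prune before borg create; a minimal sketch with hypothetical repository path and retention settings:

```python
import subprocess

REPO = "/mnt/backup/repo"  # hypothetical repository path

# Make room first using the existing retention policy...
subprocess.run(["borg", "prune", "--keep-daily", "7", "--keep-weekly", "4", REPO], check=True)

# ...then create the new archive. A built-in --prune could interleave these steps.
subprocess.run(["borg", "create", "--stats", REPO + "::{now}", "/path/to/data"], check=True)
```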

This seems to be related: #4902