borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

enhance borg2 compact stats #8410

Open ThomasWaldmann opened 5 days ago

ThomasWaldmann commented 5 days ago

When building a ChunkIndex it currently starts from refcount=0 and then sets refcount=MAX_VALUE if a chunk is used.

That's how most of borg2 works now: it doesn't do refcounting anymore, just a boolean "do we have chunk X".

For better deduplication stats in borg compact, we could deviate from that in just borg compact and do precise refcounting without any additional effort.

Before persisting the ChunkIndex, we would then need to set the refcounts back to MAX_VALUE, similar to how we clean up the size values.
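A rough sketch of the idea, not borg's actual code: the index is modelled as a plain dict, and `MAX_VALUE`, `build_index`, `unused_chunks` and `flatten_for_persisting` are hypothetical names chosen for illustration. It also assumes that chunks still at refcount 0 have already been deleted by compact and therefore do not get persisted.

```python
# Toy model only: borg's real ChunkIndex is not a dict and MAX_VALUE here
# is just a stand-in constant for "chunk is present / in use".
MAX_VALUE = 2**32 - 1


def build_index(all_chunk_ids, archives):
    """Start every chunk at refcount 0, then count each reference precisely."""
    index = {chunk_id: 0 for chunk_id in all_chunk_ids}
    for chunk_ids in archives.values():
        for chunk_id in chunk_ids:
            if chunk_id in index:
                index[chunk_id] += 1
    return index


def unused_chunks(index):
    """Chunks no archive references; these are what compact could remove."""
    return sorted(chunk_id for chunk_id, refcount in index.items() if refcount == 0)


def flatten_for_persisting(index):
    """Collapse precise refcounts back to the boolean-style MAX_VALUE marker."""
    return {
        chunk_id: MAX_VALUE
        for chunk_id, refcount in index.items()
        if refcount > 0  # assumption: unused chunks were deleted, so drop them
    }


index = build_index(
    ["a", "b", "c"],
    {"archive1": ["a", "b"], "archive2": ["a"]},
)
print(index)
print(unused_chunks(index))
print(flatten_for_persisting(index))
```

The precise counts exist only while compact runs and can feed the stats; what gets written out looks the same as it does today.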

To consider:

ThomasWaldmann commented 4 days ago

Comment about what's interesting for practical usage: https://github.com/borgbackup/borg/issues/122#issuecomment-125700186

awgcooper commented 3 days ago

This would definitely be useful: https://github.com/borgbackup/borg/issues/122#issuecomment-125915021

Question: if compression and/or obfuscation is enabled, would the size stats be given for the native file, pre-compression etc?

awgcooper commented 3 days ago

Something else, not sure if it relates specifically to this: let's say I have a specific file backed up. I know this because it appears when I list the contents of the most recent archive. Let's say I wanted to eliminate this file from the whole repo: how would I do that? Would I simply delete the first instance of it being backed up, and would that automatically eliminate all the deduplicated references to it? If so, how would I find it? Fusermount?

ThomasWaldmann commented 2 days ago

@awgcooper No, it does not work like that.

But you can use borg recreate to rewrite all the archives that contain the unwanted file (or the directory). Just be very careful with that and first use --dry-run --list to see if it does what you want.
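As a sketch of what such a dry run could look like: the helper below only assembles the command line and does not run it. The function name and the example path are hypothetical, and it assumes the repository is given via the `BORG_REPO` environment variable. Archive selection differs between borg versions and is left out here, so check `borg recreate --help` for your version before running anything.

```python
def recreate_dry_run_command(exclude_pattern):
    """Build (but do not run) a borg recreate dry-run command.

    Assumes the repository comes from the BORG_REPO environment variable.
    --dry-run makes no changes; --list prints the items that were processed.
    """
    return [
        "borg", "recreate",
        "--dry-run",
        "--list",
        "--exclude", exclude_pattern,
    ]


# Hypothetical path of the unwanted file, as stored in the archives.
cmd = recreate_dry_run_command("home/user/secret.txt")
print(" ".join(cmd))
```

Only once the listing shows exactly what you expect would you drop `--dry-run` and run it for real.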