borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

borg recreate questions #8406

Closed SpiritInAShell closed 1 month ago

SpiritInAShell commented 1 month ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Question

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

> borg --version
borg 1.4.0

Describe the problem you're observing.

I'm asking to better understand this topic (and to determine whether the time spent is actually worth it).

I did backups from my desktop machine with zstd,6 (the best balance, because of a) a slow/old CPU, b) targeting a short runtime, c) somewhat low bandwidth to the server).

Now I'm considering (perhaps paranoically) recompressing with zstd,22. (It is a running experiment; from other repositories I know there is some gain in zstd,22 over zstd,6!)

Part A:

When running borg recreate with the goal of filtering files and directories out of an archive, the purpose of the --target option is clear: it creates an additional archive containing only the files that were not filtered out.

But when the only change is --compression=... --recompress=always, does --target achieve anything?
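For reference, the two invocations in question might look like this (hypothetical repo path and archive names; borg 1.4 option syntax):

```shell
# Recompress every chunk of one archive in place
# (without --target, borg recreate replaces the original archive):
borg recreate --compression zstd,22 --recompress always /path/to/repo::archive-1

# With --target, the original archive is kept and a new archive is written:
borg recreate --compression zstd,22 --recompress always \
    --target archive-1-zstd22 /path/to/repo::archive-1
```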

From my understanding, there will always be only one compressed block stored per chunk, not an "old compressed block + new compressed block"?

Part B:

Also, assuming the above (that only one compressed block is stored per chunk) is true:

Let's say each archive of the desktop machine is about 20 GiB, adding 50-500 MiB with every following archive creation:

What does --recompress do when run for every archive (oldest to newest)? My guess:

Since about 20 GiB are identical from archive1 to archive2, 3, 4, ..., every --recompress run compresses the exact same files over and over again?

If that is true, maybe I should just run --recompress on the first and the last archive; I guess that would recompress most of the actual data in just two runs.
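The intuition behind "two runs cover most of the data" can be sketched with sets of chunk IDs (toy numbers, not borg internals): if each new archive only adds a small delta on top of a large shared base, recompressing the first and the last archive already touches the vast majority of unique chunks.

```python
# Toy model: every archive shares a large base; each run adds a small delta.
base = {f"chunk{i}" for i in range(200)}                      # ~20 GiB shared
archives = [base | {f"new{n}-{i}" for i in range(5)}          # small delta per run
            for n in range(10)]

all_chunks = set().union(*archives)
covered = archives[0] | archives[-1]   # recompress only first + last archive

print(len(all_chunks))                 # 250 unique chunks in the repo
print(len(covered))                    # 210 chunks covered by two runs
print(len(covered) / len(all_chunks))  # 0.84
```

Note that in this toy model the intermediate deltas stay untouched, so the two-run shortcut covers "most", not all, of the data.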

Part B2:

If I have understood correctly, I should maybe just wait until borg2 is final, as transferring allows recompressing, and as borg2 rcompact seems to address this subject by recompressing blocks in the repo instead of files.

ThomasWaldmann commented 1 month ago

A) The key into the key/value store is H(plaintext). That means borg can only store one value per plaintext (no matter how it is compressed).
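A minimal sketch of that key/value property (not borg's actual code; zlib stands in for zstd here): since the store is keyed by a hash of the uncompressed data, re-storing the same plaintext with a different compression simply replaces the single existing value.

```python
import hashlib
import zlib

store = {}  # key: H(plaintext) -> value: compressed bytes

def put(plaintext: bytes, level: int) -> None:
    key = hashlib.sha256(plaintext).digest()   # key depends only on the plaintext
    store[key] = zlib.compress(plaintext, level)

data = b"same chunk contents" * 100
put(data, 1)       # fast, weak compression
put(data, 9)       # recompress stronger: same key, old value is replaced
print(len(store))  # 1 -- never "old compressed block + new compressed block"
```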

B) Borg will go over every archive, over every file, over every chunk in each file. But it will usually only recompress a chunk if it detects the need to (which for borg1 means: a different compression algorithm, because it does not remember the level). If you give "always", it will always recompress.
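That decision can be sketched like this (a simplified illustration, not borg's source; the key point is that borg 1 stores the compression algorithm of a chunk but not the level, so "if-different" can only compare algorithms):

```python
def needs_recompress(chunk_algo: str, wanted_algo: str, mode: str) -> bool:
    """Simplified borg1-style per-chunk recompression decision."""
    if mode == "always":
        return True                       # recompress unconditionally
    if mode == "if-different":
        return chunk_algo != wanted_algo  # level is unknown, only algo compares
    return False                          # mode == "never"

# zstd,6 -> zstd,22 is the same algorithm, so only "always" does anything:
print(needs_recompress("zstd", "zstd", "if-different"))  # False
print(needs_recompress("zstd", "zstd", "always"))        # True
print(needs_recompress("lz4", "zstd", "if-different"))   # True
```

This is why the original poster's zstd,6 to zstd,22 experiment requires --recompress=always.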

B2) Yes, recompressing during transfer will be more efficient. borg2 repo-compress will also be more efficient, because it makes only one pass over all chunks in the repo instead of going over all archives and checking the same chunks repeatedly.