Closed shadowrylander closed 5 years ago
Every borg archive is a FULL backup that has all files in the input data set.
So, even if nothing changed, borg will create a stream of file metadata for all the files (and from within this metadata, it will reference all the content chunks [which are already in the repo, if nothing changed]).
Ideally, this whole stream of metadata will fully dedup against the previous stream of metadata (from the previous backup). You'll see a dedup size of only a few kB or so in that ideal case.
But, it can happen rather easily, that some metadata changes and spoils the deduplication.
E.g. if you access the files and the atime changes, then you'll get a slightly different metadata stream that dedups badly. You can try the --noatime option if you do not need to save the atime. The 2nd backup after changing to --noatime might dedup better (if atime really was the culprit).
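To make the effect concrete, here is a minimal Python sketch of the idea, not borg's actual implementation. It assumes a toy metadata stream that is serialized as JSON and chunked in fixed groups of four items; the field names, `CHUNK_ITEMS`, and the helper functions are all hypothetical. It only shows that an unchanged stream adds no new chunks, while a changed atime on every file makes every metadata chunk new.

```python
import hashlib
import json

CHUNK_ITEMS = 4  # hypothetical grouping; borg chunks the serialized byte stream instead


def metadata_chunks(items):
    """Serialize items in groups of CHUNK_ITEMS and return one hash per group."""
    ids = []
    for i in range(0, len(items), CHUNK_ITEMS):
        blob = json.dumps(items[i:i + CHUNK_ITEMS], sort_keys=True).encode()
        ids.append(hashlib.sha256(blob).hexdigest())
    return ids


def new_chunks(repo, chunk_ids):
    """Count the chunks that are not in the repo yet, then store them."""
    fresh = set(chunk_ids) - repo
    repo.update(fresh)
    return len(fresh)


def snapshot(n_files, atime, with_atime=True):
    """Build toy per-file metadata; the field names are illustrative only."""
    items = []
    for i in range(n_files):
        item = {"path": f"source/file{i}", "mtime": 1559000000, "uid": 1000}
        if with_atime:
            item["atime"] = atime
        items.append(item)
    return items


# Metadata stream that includes atime.
repo = set()
first = new_chunks(repo, metadata_chunks(snapshot(16, atime=1559000000)))
unchanged = new_chunks(repo, metadata_chunks(snapshot(16, atime=1559000000)))
touched = new_chunks(repo, metadata_chunks(snapshot(16, atime=1559000999)))
print(first, unchanged, touched)  # 4 0 4

# Same files, but atime is left out of the stream (the idea behind --noatime).
repo_noatime = set()
first_noatime = new_chunks(
    repo_noatime, metadata_chunks(snapshot(16, atime=1559000000, with_atime=False)))
touched_noatime = new_chunks(
    repo_noatime, metadata_chunks(snapshot(16, atime=1559000999, with_atime=False)))
print(first_noatime, touched_noatime)  # 4 0
```

The same reasoning applies to any other archived field that changes between runs, which is why the second run after dropping atime is the one to compare.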
Ah; so the repo increasing from 11 to 29 megabytes after 83 or so runs, with no changes or access to the source, seems about right? I may have miscalculated how many times I ran the backup! 😅😅😅😅
I'd expect less growth if the metadata dedup works.
You can also use borg info repo::archive to check the deduped size; it should be rather tiny if there were no changes.
Apparently not; every time I make a backup, the deduplicated size increases by 3 kilobytes, so the last three backups had "this archive" deduplicated sizes of 358.24, 361.56, & 363.84. Again, no changes or accesses. However, the "all archives" deduplicated size remains a steady 28.97 megabytes throughout the three.
3kB or 300kB?
You know, at this point I'm not entirely sure... I checked the "this archive" deduplicated sizes for the last 5 backups, and the first two grew by around 1 kB, and the last three grew by 300 kB! I legitimately don't know what's going on. Can I send you a pastebin of all the info about the sizes, i.e. is there a way to check all the "this archive" sizes at once, barring JSON manipulation?
1kb sounds good, 300kb not so much.
try borg diff maybe?
Ah, I believe ownership of the files is changing, as I am using a docker container as well; they are switching between the root user in the container whenever I'm backing up via docker, and my user whenever I'm backing up via WSL. Is there a way to ignore the ownership?
No.
Fair enough! Are there any other factors I should keep in mind regarding deduplication, aside from the cache and file modification?
The file metadata ends up in the metadata stream - so having that dedup nicely requires the archived metadata to not change (not possible with ownership changing, atime could be ignored, bsdflags could also be ignored).
The file content data ends up in the content chunks; they'll dedup nicely if the content changes little or not at all. Widespread "sprinkling" of little changes over a huge file can spoil that dedup process.
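The "sprinkling" effect can be illustrated with a small Python sketch. It assumes fixed-size 1 KiB chunks for simplicity, whereas borg uses content-defined chunking, so the numbers are illustrative only; `CHUNK_SIZE`, `chunk_ids`, and `count_new` are hypothetical names. It compares one localized 64-byte edit against 64 single-byte edits spread over the whole file.

```python
import hashlib

CHUNK_SIZE = 1024  # assumption: fixed-size chunks; borg uses content-defined chunking


def chunk_ids(data):
    """Return the set of chunk hashes for a byte string."""
    return {
        hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()
        for i in range(0, len(data), CHUNK_SIZE)
    }


def count_new(old, new):
    """Count chunks of `new` that are not already stored for `old`."""
    return len(chunk_ids(new) - chunk_ids(old))


# A 64 KiB "file" made of 64 chunks with distinct content.
original = b"".join(i.to_bytes(2, "big") * (CHUNK_SIZE // 2) for i in range(64))

# Localized change: 64 contiguous bytes inside the first chunk.
localized = bytearray(original)
localized[0:64] = b"\xff" * 64

# Sprinkled change: 64 single-byte edits, one in every chunk.
sprinkled = bytearray(original)
for i in range(64):
    sprinkled[i * CHUNK_SIZE] ^= 0xFF

new_localized = count_new(original, bytes(localized))
new_sprinkled = count_new(original, bytes(sprinkled))
print(new_localized, new_sprinkled)  # 1 64
```

Both edits touch the same number of bytes, but the sprinkled one invalidates every chunk, so the whole file has to be stored again.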
That's about it I guess.
Perfect! Then I've got all bases covered. Thank you kindly for the help!
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
Question
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
Operating system (distribution) and version.
https://hub.docker.com/r/pschiffe/borg/
Hardware / network configuration, and filesystems used.
How much data is handled by borg?
20 Megabytes; file sizes are in bytes and kilobytes.
Full borg commandline that led to the problem (leave away excludes and passwords)
borg create /s/borg/test/repo::test.2019.05.29.00.20.28 /s/borg/test/source --comment test.2019.05.29.00.20.28 --stats --progress --compression auto,zstd,22 --chunker-params 10,23,16,10
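For readers unfamiliar with the four --chunker-params values, the following Python sketch decodes them. It assumes the field order given in the borg documentation (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE), where the first three are exponents of two; the function name and the dictionary keys are made up for illustration.

```python
def describe_chunker_params(params):
    """Decode a 'min_exp,max_exp,mask_bits,window' string into byte sizes."""
    min_exp, max_exp, mask_bits, window = (int(p) for p in params.split(","))
    return {
        "min_chunk_bytes": 2 ** min_exp,      # smallest chunk the chunker emits
        "max_chunk_bytes": 2 ** max_exp,      # largest chunk the chunker emits
        "target_chunk_bytes": 2 ** mask_bits, # approximate statistical chunk size
        "hash_window_bytes": window,          # rolling hash window size
    }


info = describe_chunker_params("10,23,16,10")
print(info)
# {'min_chunk_bytes': 1024, 'max_chunk_bytes': 8388608,
#  'target_chunk_bytes': 65536, 'hash_window_bytes': 10}
```

Under that reading, the command above asks for chunks between 1 KiB and 8 MiB with a target size of about 64 KiB.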
Describe the problem you're observing.
Every time I run the command, the repo increases in size by a megabyte or so, despite nothing having changed; I am keeping the same chunker parameters, as well as compression, and using the same cache directory. Is there anything I'm missing regarding the deduplication mechanism?
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Yes; the same command repeatedly.
Include any warning/errors/backtraces from the system logs
N/A