Bad data deduplication - Githubissues

IngoMeyer441 commented 1 year ago

Thanks for your very helpful script. 😊 I tried vzborg in multiple runs with the same virtual machine, but Borg's data deduplication doesn't seem to reduce the size of snapshots at all, and I wonder why. In my tests I backed up a NAS VM with a virtual hard disk for NAS data which doesn't change a lot. A compressed snapshot has 30.08 GB and the deduplicated size is still 30.07 GB. I don't really see the reason for this, since vzborg uses uncompressed vma files as input, which should work quite well. Did you observe bad data deduplication as well?

Eeems commented 1 year ago

https://borgbackup.readthedocs.io/en/stable/deployment/image-backup.html#decreasing-the-size-of-image-backups

IngoMeyer441 commented 1 year ago

@Eeems Thanks for the hint. Zeroing unused chunks is a good idea in general, but I don't think it is the cause of my problem. If a deduplicated snapshot has the same size as the compressed snapshot, that means either:

It is the first snapshot of this data or
the data changed completely.

I created snapshots before, so it must be the second point. However, I have no idea why every chunk in the Borg repo has changed. The VM references a 1 TB virtual hard drive with an ext4 file system. AFAIK, there is no intermediate tool involved that could change all the data blocks (like a compression program).

IngoMeyer441 commented 1 year ago

OK, I figured out why deduplication was not working: I used the default Borg chunk size, which is too large for the Proxmox VMA backup format. The format uses 4 MiB blocks with a UUID header field that changes in every backup run, and Borg's default target chunk size is 2 MiB, so chunks cannot be cut properly. I have now switched to the vzborg default value of '9,16,12,4095', which is a much better fit for the backup format (512 bytes minimum chunk length, 64 KiB maximum chunk length, and 4 KiB target chunk length, since VMA backup data is organized in 4 KiB blocks). With these settings my snapshot size drops to 5.45 GB (18 % of the compressed size), which is not optimal but much better.
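
For anyone wondering how the four numbers map to the sizes quoted above, here is a minimal sketch. It assumes the plain four-number form of Borg's chunker parameters (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE), where the first three are powers of two. The function name `describe_chunker_params` is made up for illustration and is not part of Borg or vzborg.

```python
def describe_chunker_params(params: str) -> dict:
    """Translate 'MIN_EXP,MAX_EXP,MASK_BITS,WINDOW' into byte sizes.

    Hypothetical helper for illustration only; it does not handle the
    newer form with a leading algorithm name (e.g. 'buzhash,...').
    """
    min_exp, max_exp, mask_bits, window = (int(p) for p in params.split(","))
    return {
        "min_chunk_bytes": 2 ** min_exp,       # smallest chunk that can be cut
        "max_chunk_bytes": 2 ** max_exp,       # a cut is forced at this size
        "target_chunk_bytes": 2 ** mask_bits,  # statistical average chunk size
        "hash_window_bytes": window,           # rolling hash window size
    }


# Borg's default parameters versus the vzborg default quoted above
borg_default = describe_chunker_params("19,23,21,4095")
vzborg_default = describe_chunker_params("9,16,12,4095")

for name, sizes in (("borg", borg_default), ("vzborg", vzborg_default)):
    print(name, sizes)
```

The Borg default gives a 512 KiB minimum, 8 MiB maximum and 2 MiB target chunk size, while '9,16,12,4095' gives 512 bytes, 64 KiB and 4 KiB.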

Maybe the vzborg default file should contain a hint explaining why a custom chunk size should be used.

More details about the chunk size issue for VMA files can be found here: https://forum.proxmox.com/threads/backup-deduplication.57459/post-265637.

g3492 / vzborg

Bad data deduplication #19