ThomasWaldmann commented 9 years ago

There is some danger that bitrot and storage media defects could lead to backup data loss / repository integrity issues. Deduplicating backup systems are more vulnerable to this than non-deduplicating ones, because a defective chunk affects all backup archives using this chunk.

Currently, there is a lot of error detection (CRCs, hashes, HMACs) going on in borgbackup, but it has no built-in support for error correction (see the FAQ about why). This could maybe be solved using one of these approaches:

use borg to have N (N>1) independent backup repos of your data on different targets (if N-1 targets become corrupt, you still have 1 working repo left. Note that there is no support for creating one non-corrupt repo from 2 corrupt repos, although that might be theoretically possible in some cases).
snapraid
par2
FECpp https://github.com/randombit/fecpp (BSD, C++ - make available via Cython?)
zfec (GPL/TGPPL, Python 2.x only, PR for Python 3.x exists)
RAID (and monitor and scrub the disks), ZFS mirror or RAIDZ* (better not use raid5 or raidz1)
zfs copies=N option (N>1)
specific filesystems
ceph librados
https://github.com/Bulat-Ziganshin/FastECC

If we can find some working approaches, we could add them to the documentation. Help and feedback about this is welcome!
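
The first approach in the list (N independent repos) can be sketched roughly as below. This is only an illustration, assuming the borg 1.x `borg create REPO::ARCHIVE PATH` syntax; the repository locations, the source path and the helper name `build_backup_commands` are hypothetical.

```python
import shlex
from datetime import date


def build_backup_commands(repos, source, archive_prefix="backup"):
    """Return one `borg create` command line per independent repository."""
    archive = f"{archive_prefix}-{date.today().isoformat()}"
    # Each repository gets its own full backup run; nothing is copied
    # between repositories, so corruption in one cannot spread to another.
    return [["borg", "create", f"{repo}::{archive}", source] for repo in repos]


# Hypothetical repository locations on different targets.
REPOS = ["/mnt/usb1/borg-repo", "ssh://backup@example.org/./borg-repo"]

if __name__ == "__main__":
    for cmd in build_backup_commands(REPOS, "/home/user"):
        # Printed rather than executed; pass cmd to subprocess.run to run it.
        print(shlex.join(cmd))
```

Running the backup once per repository, rather than copying one repository to another place, keeps the repositories independent of each other.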

dumblob commented 4 years ago

@tarruda would you have a link to your wrapper? Thanks!

tarruda commented 4 years ago

@dumblob My wrapper has quite a bit more functionality than data recovery (which I still haven't implemented BTW). This is a summary of what my wrapper (~= 350 LOC python script) does:

Mostly configuration driven (I use python's own configparser module).
Allows maintaining multiple repositories and archives.
Integrates with rclone. Personally I use this to store repositories in dropbox/onedrive/gdrive.
Manages repository keys, ssh keys and rclone configuration encryption. I do this because I don't like using secret-tool (gnome keyring) to cache the passphrase, as it becomes easy to read from any process run as the current desktop user. Instead I use a combination of systemd-ask-password and keyctl to cache the passphrase in the root user keyring. I also use this passphrase to encrypt the ssh key (one of my repositories is stored remotely via ssh) and rclone config (it contains credentials for my dropbox account). This setup allows me to type the passphrase once every 24h (the expiry time I set on keyctl) and reuse the same passphrase for ssh/rclone/borg.
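
A configuration-driven layout like the one summarized above could look roughly like this with `configparser`. This is a guess for illustration only, not the wrapper's actual format: the section names, the keys (`path`, `archives`, `rclone_remote`) and the function `load_repos` are all hypothetical.

```python
import configparser

# Hypothetical configuration; section and key names are illustrative only
# and are not taken from the wrapper described above.
EXAMPLE_CONFIG = """
[repo:local]
path = /mnt/usb1/borg-repo
archives = home, photos

[repo:cloud]
path = /var/cache/borg-cloud
rclone_remote = dropbox:borg
archives = home
"""


def load_repos(text):
    """Parse every [repo:NAME] section into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    repos = {}
    for section in parser.sections():
        if not section.startswith("repo:"):
            continue
        name = section.split(":", 1)[1]
        entry = parser[section]
        repos[name] = {
            "path": entry["path"],
            "archives": [a.strip() for a in entry["archives"].split(",")],
            # None when the repository is not synced through rclone.
            "rclone_remote": entry.get("rclone_remote"),
        }
    return repos
```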

I haven't published because I designed it with my own use of borg in mind, which is probably more complex than what most users need. Still, if you think it would be useful (as I said, I haven't implemented @elho's suggestion yet) I can create a project on github and publish it to pypi.

dumblob commented 4 years ago

Most of the points apply to my use case as well. So, if you don't mind, feel free to create a repo (a PyPI record would then be the icing on the cake). I'll consider adjusting (or rewriting) it to work on Windows as well.

tarruda commented 4 years ago

I should be able to do this later today, I will post a link here when it is ready.

tarruda commented 4 years ago

@dumblob https://github.com/tarruda/syborg

I might implement the recovery wrapper suggested by @elho next week.

dumblob commented 3 years ago

@tarruda did you manage to write some FEC code for Borg? I couldn't find it (did you forget to push some downstream branches?).

G2G2G2G commented 2 years ago

https://github.com/rfjakob/cshatag & https://packages.debian.org/stretch/shatag etc. While bitrot isn't really an issue, especially since today's HDDs pretty much prevent it, it's extremely easy to check/watch for and always has been.

dumblob commented 2 years ago

while bitrot isn't really an issue, especially since today's HDDs pretty much prevent it

Even though I strongly disagree with this bold & broad (and, based on the evidence, incorrect) statement, could we at least agree that the severity (i.e. the real cost if it happens) of bitrot is extremely high even if its incidence rate is low?

If so, could we at least push very hard for limiting such damage to as little data as possible (considering the worst case, i.e. when bitrot appears in the most important data, e.g. some central metadata about the structure of our backups)?

enkore commented 2 years ago

https://github.com/rfjakob/cshatag & https://packages.debian.org/stretch/shatag etc

That's only checksumming - Borg has multiple layers of those already.
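
The difference between detection and correction can be shown with a small sketch. It uses plain SHA-256 from Python's `hashlib` as a stand-in; it does not reproduce Borg's actual integrity layers, and the sample data is made up.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the data."""
    return hashlib.sha256(data).hexdigest()


original = bytes(range(256)) * 16   # 4096 bytes of sample data
stored_digest = digest(original)    # recorded when the data was written

damaged = bytearray(original)
damaged[1000] ^= 0x01               # simulate a single flipped bit

# The mismatch shows that something changed, but the digest alone carries
# no information about which bit flipped or how to restore it.
is_intact = digest(bytes(damaged)) == stored_digest
```

A checksum answers "is this still the data I wrote?"; repairing the data needs redundancy from somewhere else, such as parity or a second copy.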

G2G2G2G commented 2 years ago

I'm aware, and it still doesn't detect bitrot, which is the problem. I was showing how insanely easy it has been for decades.

ThomasWaldmann commented 2 years ago

#6584 is much easier / simpler, but might often solve the same problem.

Sepero commented 1 year ago

I'm currently in the process of finding a new backup solution. Lack of parity is the one thing preventing me from setting up Borg immediately. Bup appears to be the only backup program that supports parity.

Recently I had a backup drive fail, leaving me with only 1 backup drive. I'm still in the process of getting the second backup re-established. But in the meantime, I feel uneasy. I have to place full trust and confidence in the resilience of one backup drive. 1 backup is good, but 1 backup with a little parity is way better. As the number of terabytes of backup data increases, the probability of a bad bit or corrupt sector increases.

A file with 65536 corrupted bytes can be restored with 1) a little parity OR 2) another full copy.

I'm inclined to think having a little parity AND another full copy is a pretty good option.
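
To make the "little parity" case concrete, here is a minimal sketch using single XOR parity over four 65536-byte blocks. It is the simplest possible erasure code, chosen for illustration; it is not what Bup, par2 or any other tool mentioned in this thread actually implements, and it assumes a checksum has already identified which block is bad.

```python
import os

BLOCK = 65536  # block size in bytes, matching the figure above


def xor_blocks(blocks):
    """XOR equal-length byte blocks together into one block."""
    size = len(blocks[0])
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(size, "big")


# Four data blocks plus one parity block: 25% storage overhead.
data_blocks = [os.urandom(BLOCK) for _ in range(4)]
parity = xor_blocks(data_blocks)

# Assume block 2 was reported as corrupt and is treated as lost.
lost = 2
survivors = [b for i, b in enumerate(data_blocks) if i != lost]

# XOR of the surviving blocks and the parity block yields the lost block.
recovered = xor_blocks(survivors + [parity])
```

With this scheme one damaged block per group of four can be rebuilt from a quarter of the extra space a full copy would need, but a second damaged block in the same group cannot be rebuilt.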

DavidOliver commented 1 year ago

@Sepero, in case it's of interest, Duplicacy offers erasure coding.

ThomasWaldmann commented 1 year ago

It's not just the backup tool having feature X (like parity, ECC, erasure encoding, etc.), but also whether it really helps in practice.

If you use USB disk(s), you really should have multiple ones and rotate them - then you don't need stuff like ECC because you have N-times redundancy anyway. If you back up to some remote server, just have multiple independent remote servers. Or combine local and remote.

That will help you if one backup medium goes away (dies due to age, dropping it, you lose it / it gets stolen, a crypto trojan encrypts it, server issues, provider gone, ...) - ECC would not help you at all with that.

ECC also does not help you for any case that goes beyond what it was designed for - e.g. if there is just a bit more corruption than it can deal with. This can be a real problem if there is no control over data distribution on the media (like e.g. you can't know where a flash controller will put your data in a flash chip, what's close to each other and what's not).
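
The point about data distribution can be illustrated with a toy model. It assumes parity groups that can each repair one bad block, and it assumes block positions are physical positions on the media; the layouts `sequential` and `interleaved` and the function `recoverable` are hypothetical names for this sketch.

```python
from collections import Counter

GROUPS = 2       # number of parity groups
GROUP_SIZE = 4   # data blocks per group


def sequential(pos):
    """Blocks of one group are stored next to each other."""
    return pos // GROUP_SIZE


def interleaved(pos):
    """Neighbouring blocks belong to different groups."""
    return pos % GROUPS


def recoverable(bad_positions, group_of, tolerance=1):
    """True if no parity group has more bad blocks than it can repair."""
    per_group = Counter(group_of(p) for p in bad_positions)
    return all(count <= tolerance for count in per_group.values())


burst = [2, 3]  # two adjacent damaged blocks

sequential_ok = recoverable(burst, sequential)    # both hit group 0
interleaved_ok = recoverable(burst, interleaved)  # one hit per group
```

In this model the same two-block burst is fatal for the sequential layout and harmless for the interleaved one. That only holds if the software really decides where blocks end up, which is exactly what a flash controller takes away.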

BTW, HDD and SSD controllers usually internally already use ECC codes for whatever they are useful for.

Sepero commented 1 year ago

An additional resource that may be useful for devs

Backblaze Open-sources Reed-Solomon Erasure Coding Source Code

borgbackup / borg

evaluate redundancy / error correction options #225
