Open ThomasWaldmann opened 9 years ago
@tarruda would you have a link to your wrapper? Thanks!
@dumblob My wrapper has quite a bit more functionality than data recovery (which I still haven't implemented BTW). This is a summary of what my wrapper (~= 350 LOC python script) does:
It does not use secret-tool (gnome keyring) to cache the passphrase, as the passphrase then becomes easy to read from any process run as the current desktop user. Instead I use a combination of systemd-ask-password and keyctl to cache the passphrase in the root user keyring. I also use this passphrase to encrypt the ssh key (one of my repositories is stored remotely via ssh) and the rclone config (it contains credentials for my dropbox account). This setup allows me to type the passphrase once every 24h (the expiry time I set on keyctl) and reuse the same passphrase for ssh/rclone/borg.
I haven't published it because I designed it with my own use of borg in mind, which is probably more complex than what most users need. Still, if you think it would be useful (as I said, I haven't implemented @elho's suggestion yet) I can create a project on github and publish it to pypi.
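For readers who want to picture the passphrase caching described above, here is a minimal sketch and not the actual wrapper code. The key name `borg-passphrase`, the prompt text, the use of the `@u` keyring and the helper names `run_cmd` / `get_passphrase` are all assumptions made for illustration. It only relies on the `systemd-ask-password` and `keyctl` (`search`, `pipe`, `padd`, `timeout`) commands.

```python
import subprocess

KEY_NAME = "borg-passphrase"    # hypothetical key description
EXPIRY_SECONDS = 24 * 60 * 60   # 24h expiry, as in the comment above


def run_cmd(args, stdin=None):
    """Run a command and return its stdout without the trailing newline."""
    result = subprocess.run(
        args, input=stdin, capture_output=True, text=True, check=True
    )
    return result.stdout.rstrip("\n")


def get_passphrase(runner=run_cmd):
    """Return the cached passphrase, prompting and caching it if needed."""
    try:
        # `keyctl search` prints the key ID if a matching key exists.
        key_id = runner(["keyctl", "search", "@u", "user", KEY_NAME])
        # `keyctl pipe` writes the raw key payload to stdout.
        return runner(["keyctl", "pipe", key_id])
    except subprocess.CalledProcessError:
        pass  # not cached yet, or the key has expired

    passphrase = runner(["systemd-ask-password", "Backup passphrase:"])
    # `keyctl padd` reads the payload from stdin, keeping it out of argv.
    key_id = runner(["keyctl", "padd", "user", KEY_NAME, "@u"], stdin=passphrase)
    runner(["keyctl", "timeout", key_id, str(EXPIRY_SECONDS)])
    return passphrase
```

When run as root, `@u` refers to root's user keyring. The `runner` parameter exists only so the logic can be exercised without touching a real keyring.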
Most of the points apply to my use case as well. So, if you don't mind, feel free to create a repo (a pypi record would then be the icing on the cake). I'll consider adjusting (or rewriting) it to work on Windows as well.
I should be able to do this later today, I will post a link here when it is ready.
@dumblob https://github.com/tarruda/syborg
I might implement the recovery wrapper suggested by @elho next week.
@tarruda did you manage to write some FEC code for Borg? I couldn't find it (did you forget to push some downstream branches?).
https://github.com/rfjakob/cshatag & https://packages.debian.org/stretch/shatag etc. While bitrot isn't really an issue, especially since today's HDDs pretty much prevent it, it's extremely easy to check/watch for and always has been.
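To illustrate the kind of check these tools perform, here is a rough sketch of the idea, not their implementation. As far as I understand them, they record a checksum and a timestamp per file and flag files whose content changed although the modification time did not. The sketch keeps that record in an in-memory dict rather than in extended attributes, and the function names are made up for this example.

```python
import hashlib
import os


def sha256_of(path, bufsize=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            digest.update(block)
    return digest.hexdigest()


def snapshot(root):
    """Record (mtime_ns, sha256) for every file below root."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            manifest[path] = (os.stat(path).st_mtime_ns, sha256_of(path))
    return manifest


def find_suspects(manifest):
    """Return paths whose content changed while the mtime did not."""
    suspects = []
    for path, (mtime_ns, digest) in manifest.items():
        if not os.path.exists(path):
            continue  # deleted files are a different problem
        unchanged_mtime = os.stat(path).st_mtime_ns == mtime_ns
        if unchanged_mtime and sha256_of(path) != digest:
            suspects.append(path)
    return suspects
```

Note that this only detects damage. It cannot repair anything, which is the point raised in the replies below.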
> While bitrot isn't really an issue, especially since today's HDDs pretty much prevent it
Even if I strongly disagree with this bold & broad statement (which the evidence contradicts), could we at least agree that the severity of bitrot (i.e. the real cost if it happens) is extremely high even if its incidence rate is low?
If so, could we at least push very hard for limiting such damage to as little data as possible (considering the worst case, i.e. when bitrot hits the most important data, e.g. some central metadata about the structure of our backups)?
> https://github.com/rfjakob/cshatag & https://packages.debian.org/stretch/shatag etc
That's only checksumming - Borg has multiple layers of those already.
I'm aware, and it still doesn't detect bitrot, which is the problem. I was showing how insanely easy it has been for decades.
I'm currently in the process of finding a new backup solution. Lack of parity is the one thing preventing me from setting up Borg immediately. Bup appears to be the only backup program that supports parity.
Recently I had a backup drive fail, leaving me with only 1 backup drive. I'm still in the process of getting the second backup re-established. But in the meantime, I feel uneasy. I have to place full trust and confidence in the resilience of one backup drive. 1 backup is good, but 1 backup with a little parity is way better. As the amount of backup data grows into terabytes, the probability of a bad bit or corrupt sector somewhere in it increases.
A file with 65536 corrupted bytes can be restored with either 1) a little parity or 2) another full copy.
I'm inclined to think having a little parity AND another full copy is a pretty good option.
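As a toy illustration of how "a little parity" can restore a corrupted 65536-byte region, here is a sketch using plain XOR parity (RAID-5 style). This is an assumption chosen for simplicity, not what bup or any other tool mentioned here actually does. It can rebuild exactly one damaged block per group, and only if you already know which block is bad (for example from a checksum). The function names are made up for this example.

```python
def xor_blocks(blocks):
    """XOR equally sized byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def split_blocks(data, block_size):
    """Split data into block_size pieces, zero-padding the last one."""
    padded = data + bytes(-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def repair(blocks, parity, bad_index):
    """Rebuild the block at bad_index from the other blocks and the parity."""
    others = [b for i, b in enumerate(blocks) if i != bad_index]
    return xor_blocks(others + [parity])


data = bytes(range(256)) * 1024       # 256 KiB of sample data
blocks = split_blocks(data, 65536)    # four 64 KiB blocks
parity = xor_blocks(blocks)           # one extra 64 KiB block (25% overhead)

blocks[2] = bytes(65536)              # simulate a corrupted 65536-byte region
blocks[2] = repair(blocks, parity, 2)
assert b"".join(blocks) == data       # the original data is back
```

If two blocks of the same group are damaged, this scheme cannot recover either of them. Stronger codes such as Reed-Solomon tolerate more losses at the cost of more parity data.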
@Sepero, in case it's of interest, Duplicacy offers erasure coding.
It's not just the backup tool having feature X (like parity, ECC, erasure encoding, etc.), but also whether it really helps in practice.
If you use USB disk(s), you really should have multiple ones and rotate them - then you don't need stuff like ECC because you have N-times redundancy anyway. If you back up to some remote server, just have multiple independent remote servers. Or combine local and remote.
That will help you if one backup medium goes away (dies due to age, dropping it, you lose it / it gets stolen, a crypto trojan encrypts it, server issues, provider gone, ...) - ECC would not help you at all with that.
ECC also does not help you for any case that goes beyond what it was designed for - e.g. if there is just a bit more corruption than it can deal with. This can be a real problem if there is no control over data distribution on the media (like e.g. you can't know where a flash controller will put your data in a flash chip, what's close to each other and what's not).
BTW, HDD and SSD controllers usually internally already use ECC codes for whatever they are useful for.
An additional resource that may be useful for devs: Backblaze Open-sources Reed-Solomon Erasure Coding Source Code
There is some danger that bitrot and storage media defects could lead to backup data loss / repository integrity issues. Deduplicating backup systems are more vulnerable to this than non-deduplicating ones, because a defective chunk affects all backup archives using this chunk.
Currently, there is a lot of error detection (CRCs, hashes, HMACs) going on in borgbackup, but it has no built-in support for error correction (see the FAQ about why). Maybe it could be solved using one of these approaches:
If we can find some working approaches, we could add them to the documentation. Help and feedback about this is welcome!