Roman2K / scat

Decentralized, trustless backup tool
https://github.com/Roman2K/scat/issues/1
MIT License
92 stars 9 forks

New proc for rebuilding missing data/parity shards from old snapshots on new stores #11

Open Roman2K opened 7 years ago

Roman2K commented 7 years ago

Currently, unparity recovers from errors at restore time (failed integrity check, missing data), but only in read-only mode: the restored data is intact, but the stores still contain bad or missing data.

On a subsequent backup, parity will create missing data and/or parity shards on new stores, but only for the newest data being piped in. Bad or lost shards from previous backups aren't recovered (rebuilt and stored) on new stores.

Add a new proc, parscrub, that rebuilds missing shards like unparity but also writes them to the specified stores, reusing data from previous stores and writing to new stores so as to meet the min and excl requirements.
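To make the proposal concrete, here is a rough sketch of the per-chunk decision such a proc might make. It is not scat code: `plan_scrub` and its output format are hypothetical, and it assumes that any `DATA` shards out of `DATA + PARITY` are enough to rebuild the rest.

```shell
# plan_scrub DATA PARITY "FOUND": decide what to do for one chunk.
#   DATA, PARITY: shard counts, as in "parity 2 1"
#   FOUND: space-separated indices of shards still readable on old stores
# Hypothetical helper, for illustration only.
plan_scrub() {
  local data="$1" parity="$2" found="$3"
  local total=$((data + parity)) have=0 missing="" i=0
  while [ "$i" -lt "$total" ]; do
    case " $found " in
      *" $i "*) have=$((have + 1)) ;;      # shard i is still readable
      *) missing="$missing $i" ;;          # shard i must be rebuilt
    esac
    i=$((i + 1))
  done
  if [ "$have" -lt "$data" ]; then
    echo "unrecoverable"                   # too few shards left to rebuild
  elif [ -z "$missing" ]; then
    echo "ok"                              # nothing to do
  else
    echo "rebuild$missing"                 # shards to rebuild and write to new stores
  fi
}

plan_scrub 2 1 "0 1 2"   # prints: ok
plan_scrub 2 1 "0 2"     # prints: rebuild 1
plan_scrub 2 1 "2"       # prints: unrecoverable
```

Where the rebuilt shards end up would still be governed by the min and excl requirements; that placement step is left out of this sketch.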


From question/request/suggestion by @Technifocal on reddit (comment):

@Technifocal:

Third question, say I lose a store (provider/hard-drive/whatever), how do I reshard/rebalance my data, either to the remaining stores, or by replacing the store? I understand new backups going forward will be correctly balanced, but what about my backlog of backups?

@Roman2K:

When you lose a store, you would do the equivalent of replacing a disk in a RAID array: in the stripe() proc of the backup script, replace the line of the defective store with a new one, and re-run the backup. Chunks will be written to the new store in such a way as to satisfy the min and excl requirements, reusing chunks from old stores.

@Technifocal:

I haven't done any testing yet (Sorry!), but I think you missed the point of my question, I'll try and explain below:

  1. I have a directory, foobar, with two files in it, foo and bar. They are unique, no deduplication can occur.
  2. I backup foobar with parity 2 1, 2 data shards and 1 parity shard, so that I can lose one store without issue
  3. I now delete foo from the directory and add baz, now I have backed up foo and bar, but locally only have bar and baz
  4. I lose one of my three stores, no problem, parity exists
  5. I add a new store, which has no data
  6. I backup foobar, which includes bar and baz, but not foo anymore. Any lost data of bar will be reproduced on the new store, and baz (First time being backed up) will be uploaded.
  7. I lose another store (I'm terribly unlucky/clumsy)

At this point, unless I'm mistaken, I've now lost 2/3 stores for my old original backup (Of foo and bar), and 1/3 stores for my new backup (bar and baz), this surely means that now I can recover bar and baz, but foo is completely lost. Am I mistaken?

@Roman2K:

That's exactly right. You have lost 2/3 stores of the original backup and foo is now unrecoverable.
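The scenario can be checked with a small model. This is only a sketch: the store names A to D and the helper functions are made up, and it assumes one shard per store with any 2 of the 3 shards being enough to rebuild a file.

```shell
DATA_SHARDS=2   # "parity 2 1": any 2 of the 3 shards rebuild a file

# surviving "HELD" "LOST": count the stores in HELD that are not in LOST
surviving() {
  local held="$1" lost="$2" n=0 s
  for s in $held; do
    case " $lost " in
      *" $s "*) ;;               # this store is gone
      *) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# file_status "HELD" "LOST": prints "recoverable" or "lost"
file_status() {
  if [ "$(surviving "$1" "$2")" -ge "$DATA_SHARDS" ]; then
    echo recoverable
  else
    echo lost
  fi
}

LOST="C B"   # steps 4 and 7: store C is lost first, then store B

echo "foo: $(file_status "A B C" "$LOST")"   # never re-backed up after step 4
echo "bar: $(file_status "A B D" "$LOST")"   # missing shard recreated on D in step 6
echo "baz: $(file_status "A B D" "$LOST")"   # first backed up in step 6
```

This prints `foo: lost`, `bar: recoverable` and `baz: recoverable`: foo is the only file whose shards were never rewritten to the replacement store.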

@Technifocal:

My question was: is there any way, after losing the first store, to retroactively go back and repair and reupload old backups without requiring the files locally? I understand this would use a lot of I/O (either net I/O in the case of a cloud provider (downloading, repairing, uploading), or disk I/O in the case of local disks (reading, processing, writing)), but I feel like it'd increase the longevity of backups, unless I am missing something.

@Roman2K:

There currently isn't a way to do the recovery retroactively, short of re-running the backup, which isn't possible once the original data is lost. But all the components are there; they just need to be assembled into a new proc, for which I propose the name "parscrub". I totally agree this is needed to increase the longevity of old snapshots. You would need to run a new scrub script on the index file of each snapshot whose longevity you want to ensure, on a regular basis.
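A regular run over every snapshot might look something like the sketch below. Since parscrub does not exist yet, `scrub_index` is a hypothetical placeholder for whatever scat invocation ends up doing the scrubbing, and `scrub_all` is likewise made up for illustration.

```shell
# scrub_index FILE: placeholder for the real scrub of one snapshot.
# Hypothetical: it would feed the index file to a scat scrub script.
scrub_index() {
  echo "would scrub $1"
}

# scrub_all DIR: run scrub_index on every *.index file in DIR.
# Keeps going after a failure, reports it on stderr, and returns
# non-zero if any snapshot could not be scrubbed.
scrub_all() {
  local dir="$1" failed=0 f
  for f in "$dir"/*.index; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if ! scrub_index "$f"; then
      echo "scrub failed: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage, e.g. from cron:
#   scrub_all /path/to/index/files
```

Continuing past a failed snapshot means one unrecoverable index does not prevent the remaining snapshots from being repaired.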

@Technifocal:

I had an idea while I was sitting on the tube, would running a restore piped directly back into a backup solve this issue?

Something like:

for i in *.index; do
   scat -stats "unindex unparity unmake unstuff uncompress unencrypt unmagic unupload undone" < "${i}" | scat -stats "index parity make stuff compress encrypt magic upload done"
done

That would go back and download all the old backups and reupload them with the new parity shards, and shouldn't(?) upload anything that wasn't corrupted, because it was already all uploaded. Is that correct?

If so, very elegant, just went a bit over my head.

@Roman2K:

That would work too, though it's a bit convoluted: it's a shame to have to join and then split right back again 🙈 But yes, the end result would be the same as a parscrub in a single scat run, unless I'm missing something.