Roman2K / scat

Decentralized, trustless backup tool
https://github.com/Roman2K/scat/issues/1
MIT License

Treat decryption failure as a read error? #13

Open Roman2K opened 7 years ago

Roman2K commented 7 years ago

Question by @lavalamp originally posted in the defunct GitLab repo (old issue):

I noticed while reading https://github.com/klauspost/reedsolomon:

The final (and important) part is to be able to reconstruct missing shards. For this to work, you need to know which parts of your data is missing. The encoder does not know which parts are invalid, so if data corruption is a likely scenario, you need to implement a hash check for each shard. If a byte has changed in your set, and you don't know which it is, there is no way to reconstruct the data set.
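For reference, that is exactly how klauspost/reedsolomon handles erasures: you hash each shard yourself, and any shard that fails its hash gets set to nil so Reconstruct can repair it. A minimal Go sketch (shard counts and data are arbitrary; this isn't scat's code):

    package main

    import (
        "bytes"
        "crypto/sha256"
        "fmt"

        "github.com/klauspost/reedsolomon"
    )

    func main() {
        const msg = "some chunk of backup data to protect"

        enc, err := reedsolomon.New(3, 2) // 3 data + 2 parity shards (arbitrary)
        if err != nil {
            panic(err)
        }

        shards, err := enc.Split([]byte(msg))
        if err != nil {
            panic(err)
        }
        if err := enc.Encode(shards); err != nil { // fill the 2 parity shards
            panic(err)
        }

        // Hash each shard at backup time: the encoder alone cannot tell
        // which shard was silently corrupted.
        sums := make([][32]byte, len(shards))
        for i, s := range shards {
            sums[i] = sha256.Sum256(s)
        }

        shards[1][0] ^= 0x01 // simulate a flipped bit on one remote

        // At restore time, demote any shard failing its hash to "missing".
        for i, s := range shards {
            if sha256.Sum256(s) != sums[i] {
                shards[i] = nil // nil marks an erasure for Reconstruct
            }
        }
        if err := enc.Reconstruct(shards); err != nil {
            panic(err)
        }

        var buf bytes.Buffer
        if err := enc.Join(&buf, shards, len(msg)); err != nil {
            panic(err)
        }
        fmt.Println(buf.String()) // original data, despite the corruption
    }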

So I thought I'd give it a try and deliberately changed a bit in a test backup. Unfortunately, this resulted in the restore not working (removing the file entirely allowed the restore to succeed, as expected). It seems like the problem is that if the decrypt step fails, the entire restore is aborted. I guess ideally, the decryption failure ought to be treated the same as if the remote shard was missing. Maybe there's a way to fix my restore script?

    uindex | backlog 8 {
      backlog 4 multireader(
        a=cp(/path/to/a)
        b=cp(/path/to/b)
        c=cp(/path/to/c)
      ) |
      cmd gpg --args --to --decode |
      uchecksum |
      group 3 |
      uparity 2 1 |
      cmd unxz
    } |
    join -
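For anyone wanting to reproduce the experiment, flipping a single bit in one stored chunk is enough; a quick Go sketch (the chunk path is hypothetical, substitute any file under one of the cp dirs):

    package main

    import "os"

    func main() {
        // Hypothetical path: pick any chunk file under /path/to/a in practice.
        f, err := os.OpenFile("/path/to/a/some-chunk", os.O_RDWR, 0)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        b := make([]byte, 1)
        if _, err := f.ReadAt(b, 0); err != nil {
            panic(err)
        }
        b[0] ^= 0x01 // flip the lowest bit of the first byte
        if _, err := f.WriteAt(b, 0); err != nil {
            panic(err)
        }
    }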

(sorry I keep filing issues, I think the concept is pretty cool and I'm attempting to use scat to back up my own files...)

Roman2K commented 7 years ago

My response:

Ah, yes... So, in fact, uparity supports both missing files and integrity check failures.

However, in this case, with gpg running before uchecksum, you don't get an integrity check failure: gpg fails first, on the invalid encrypted data.

Indeed, during backup we have to checksum before gpg (and accordingly, in the restore script, run uchecksum after gpg), because gpg generates different encrypted data every time for identical input data.
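That is easy to verify: encrypting the same input twice yields different ciphertexts, because gpg picks a random session key and salt on each run. A quick check in Go (assumes a gpg 2.x binary on PATH; passphrase and input are arbitrary):

    package main

    import (
        "bytes"
        "fmt"
        "os/exec"
        "strings"
    )

    // encrypt pipes plaintext through symmetric gpg and returns the ciphertext.
    func encrypt(plain string) []byte {
        cmd := exec.Command("gpg", "--batch", "--pinentry-mode", "loopback",
            "--passphrase", "secret", "--symmetric", "-o", "-")
        cmd.Stdin = strings.NewReader(plain)
        out, err := cmd.Output()
        if err != nil {
            panic(err)
        }
        return out
    }

    func main() {
        a := encrypt("identical input data")
        b := encrypt("identical input data")
        // Prints false: the random session key/salt make every run differ,
        // so checksums of the encrypted chunks can't identify the content.
        fmt.Println("ciphertexts equal:", bytes.Equal(a, b))
    }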

The consequence is that changing data on remotes results in a failed decryption rather than a failed integrity check (which is what I wanted initially, before realizing gpg generates different data every time).

So that's a very good point. I'm not sure what to do; maybe any error should be considered potentially recoverable by uparity, not just integrity check failures or missing data 🤔 I'll leave the ticket open until I decide.
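To make the idea concrete, here is roughly what treating a decryption failure as a read error could look like. decrypt is a hypothetical stand-in for the per-chunk gpg step, and none of this is scat's actual code:

    package restore

    import (
        "errors"

        "github.com/klauspost/reedsolomon"
    )

    // decrypt stands in for the per-chunk gpg step (hypothetical).
    func decrypt(ciphertext []byte) ([]byte, error) {
        if ciphertext == nil {
            return nil, errors.New("missing shard")
        }
        // ... run gpg and propagate its error on corrupt input ...
        return ciphertext, nil
    }

    // restoreGroup decrypts each shard of a parity group. Instead of aborting
    // on the first bad shard, it demotes decryption failures to erasures and
    // lets Reconstruct repair them, just like missing files.
    func restoreGroup(enc reedsolomon.Encoder, encrypted [][]byte) ([][]byte, error) {
        shards := make([][]byte, len(encrypted))
        for i, c := range encrypted {
            plain, err := decrypt(c)
            if err != nil {
                shards[i] = nil // treat decryption failure like a read error
                continue
            }
            shards[i] = plain
        }
        return shards, enc.Reconstruct(shards)
    }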