Open AndrolGenhald opened 1 year ago
Simply reading a backup with 'get' will validate all hmacs - a better system should probably be added in the future.
Also, your intuition regarding keys is correct - the garbage data can corrupt future backups, but not past ones if you reuse keys.
That'll work, but it does make it more complicated to validate the entire repository since bupstash get requires the query to match a single item. It probably shouldn't be too hard to write a script that runs bupstash list and then runs bupstash get with each id though.
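For anyone wanting to try the list-then-get approach, here is a rough sketch of such a script. It assumes that bupstash list prints one item per line containing an id="..." field with a hex id, and that bupstash get id=<id> streams the item to stdout and exits non-zero on a failed check; both are assumptions to confirm against your installed version. The names parse_ids and verify_all are made up for this example.

```python
import re
import subprocess

# Assumption: each line of `bupstash list` output contains id="<hex id>".
ID_PATTERN = re.compile(r'\bid="([0-9a-f]+)"')


def parse_ids(list_output):
    """Extract item ids from the text printed by `bupstash list`."""
    return ID_PATTERN.findall(list_output)


def verify_all(run=subprocess.run):
    """Run `bupstash get` on every listed id and return the ids that failed.

    `run` is injectable so the logic can be exercised without a repository.
    """
    listing = run(
        ["bupstash", "list"], check=True, capture_output=True, text=True
    )
    failed = []
    for item_id in parse_ids(listing.stdout):
        # Discard the restored data; only the exit status matters here.
        result = run(
            ["bupstash", "get", f"id={item_id}"], stdout=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failed.append(item_id)
    return failed
```

Calling verify_all() with the usual repository and key environment variables set should return an empty list when every item reads back cleanly. Note that this reads each item in full, so shared data gets checked once per item.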
I realized the other day that using bupstash get on every id returned from bupstash list is actually going to verify most of the data many times. If I have a several TiB backup that's updated daily, doing a bupstash get on every id is going to verify several TiB for every single id.
I could just use the most recent id, but if I verify weekly or monthly I risk not verifying files that were added and then removed since the last verification. I'd like to have a way to verify all of the content in the repository without having to deal with duplicate data.
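To illustrate what "without duplicate data" would mean, here is a toy sketch that checks each distinct chunk once across all items. The data model is hypothetical: items are represented as plain lists of chunk addresses and verify_chunk is a caller-supplied check. It is not how bupstash stores or walks its data.

```python
def verify_unique_chunks(items, verify_chunk):
    """Call verify_chunk once per distinct chunk address across all items.

    items: dict mapping an item id to its list of chunk addresses
           (hypothetical layout, for illustration only).
    Returns the number of chunks that were actually checked.
    """
    seen = set()
    checked = 0
    for chunk_addresses in items.values():
        for address in chunk_addresses:
            if address in seen:
                continue  # already verified via an earlier item
            seen.add(address)
            verify_chunk(address)
            checked += 1
    return checked
```

With two daily items that share most of their chunks, the shared chunks are checked once instead of twice, while a chunk that only exists in the older item is still covered.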
I might try to put together a PR if I have time in the next month or two. This seems like a good learning opportunity since it should work similarly to existing functionality, and it's not too difficult conceptually.
I feel like it might be slightly trickier than it seems to do efficiently. There are two things you need to verify.
bupstash list-contents) match the true file contents; this is also hard without sending the data to the client.
One idea is to use 'bupstash sync' to efficiently transfer new objects to the client for verification.
Another of my ideas is for the server to track if each id has been verified by the master key by having the client send a signed id + verification timestamp to the server to save. In this case you would still need to verify every item, but the server can remember which have been verified for you.
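A minimal sketch of what such a verification record might look like, using only the Python standard library. This swaps in an HMAC-SHA256 tag for the "signed" record, so only a holder of the key can check it and the server would just store it. The record layout and the function names are invented for illustration and are not part of bupstash.

```python
import hashlib
import hmac
import json


def make_verification_record(key, item_id, verified_at):
    """Build a record saying item_id was verified at verified_at (unix time)."""
    payload = json.dumps(
        {"id": item_id, "verified_at": verified_at}, sort_keys=True
    )
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def check_verification_record(key, record):
    """Return True if the record's tag matches its payload under key."""
    expected = hmac.new(
        key, record["payload"].encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, record["tag"])
```

The client would upload the record after a successful check and later fetch it back to see which ids still need verifying. A server that edits the id or timestamp cannot produce a matching tag without the key.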
Yes for "bupstash scrub"
I saw that the docs mention "A client can retrieve and verify a datastream by checking hmacs", but I wasn't able to find a way to do this, and grepping the source for "verify" didn't turn up anything useful. Does this already exist and I just missed it, or is it planned to be implemented in the future?
I would like to have multiple servers back up to the same repository and deduplicate data across servers (which I believe means I need to use the same sub-key for each server rather than generate separate sub-keys). Since this means a malicious server could send garbage data and it could corrupt the backups for the other servers, I would like to periodically verify the backup.