borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Public key encryption support #672

Closed KenMacD closed 7 years ago

KenMacD commented 8 years ago

Storing the key used to encrypt backups on the server used to create the backups is not ideal. It's impossible to tell when it's been stolen, and stealing the key once would provide access to all past and future backup data.

Instead, it would be nice if a new symmetric key were generated for each archive and then encrypted using the public key. That way the private key could be kept safely offline until a restore was required.

Duplicity does something similar in using gpg to protect the files.

jungle-boogie commented 8 years ago

How would automated backups work?

KenMacD commented 8 years ago

I would hope the same as they currently do, just without the requirement to have the private key material on the server. The public key can exist on all the servers, but with it the data could not be decrypted.

I'm not currently sure what data needs decrypting during future backups, so I'm not sure if this is possible, but it seems like as long as the metadata is available the actual data may not need to be.

ThomasWaldmann commented 8 years ago

The problem here is that the code (as is) uses the repository mostly as a key/value store. The repo manifest (has the list of all archives) is stored at key=0. When a new archive is created, the manifest is read, the new archive entry is added and the manifest is written back. That does not work if the encryption is not reversible for the client.
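A minimal sketch of the read-modify-write cycle described above, assuming a toy in-memory key/value store. The names `MANIFEST_ID`, `SealedBlob`, `Client` and `create_archive` are hypothetical and are not Borg's API, and no real cryptography is performed: "encryption" is only modelled as a wrapper that a client without the private key cannot open.

```python
import json
from dataclasses import dataclass

MANIFEST_ID = bytes(32)  # hypothetical: the all-zero key that holds the manifest


class CannotDecrypt(Exception):
    """Raised when a client without the private key tries to read a blob."""


@dataclass(frozen=True)
class SealedBlob:
    """Stand-in for ciphertext; NOT real encryption, it only models who may read."""
    payload: bytes


class Client:
    def __init__(self, has_private_key: bool):
        self.has_private_key = has_private_key

    def seal(self, data: bytes) -> SealedBlob:
        # Anyone holding the public key can encrypt.
        return SealedBlob(data)

    def open(self, blob: SealedBlob) -> bytes:
        # Only the private-key holder can decrypt.
        if not self.has_private_key:
            raise CannotDecrypt("private key is offline")
        return blob.payload


def create_archive(repo: dict, client: Client, name: str) -> None:
    """Read the manifest, add one archive entry, write the manifest back."""
    archives = json.loads(client.open(repo[MANIFEST_ID]))  # the read step
    archives.append(name)
    repo[MANIFEST_ID] = client.seal(json.dumps(archives).encode())


repo = {MANIFEST_ID: SealedBlob(b"[]")}
create_archive(repo, Client(has_private_key=True), "monday")

try:
    create_archive(repo, Client(has_private_key=False), "tuesday")
    backup_only_result = "ok"
except CannotDecrypt as exc:
    backup_only_result = f"failed: {exc}"
print(backup_only_result)
```

The second call fails at the read step, before anything is written, which is the point being made: a client that can only encrypt cannot update a manifest that is stored as a single encrypted value.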

KenMacD commented 8 years ago

Would there be any way to keep the manifest/metadata using a symmetric key, but encrypt the data with a public key?

ThomasWaldmann commented 8 years ago

of course one could always specialcase the manifest.

but that was just an example, there are also other chunks that need to get read (and not just written), e.g. for delete / prune. this might be a feature rather than a problem though because if you do not fully trust the client, you maybe do not want that prune / delete works from there.

compaction is also a place where stuff gets read and re-written, it has to be checked - maybe no decryption is needed in this case.

KenMacD commented 8 years ago

Thanks for the info. So if I'm understanding this correctly, it may not be impossible, but it is certainly not easy or likely to happen any time soon. Is that about right? If so, feel free to close this as wontfix.

RonnyPfannschmidt commented 8 years ago

How about having an "extend" operation that adds to the manifest? Then knowing the existing content would no longer be needed to add new archives.
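One way to read this suggestion, sketched under stated assumptions: the manifest becomes a set of independently sealed entries, so adding an archive is a pure write. `ENTRY_PREFIX`, `extend_manifest` and `list_archives` are hypothetical names, and `seal`/`unseal` are identity placeholders for public-key encryption and private-key decryption, so nothing is actually protected here.

```python
import json
import os

ENTRY_PREFIX = b"manifest-entry/"  # hypothetical key namespace


def extend_manifest(repo: dict, seal, name: str) -> bytes:
    """Append one archive entry without reading any existing entry."""
    key = ENTRY_PREFIX + os.urandom(16).hex().encode()
    repo[key] = seal(json.dumps({"name": name}).encode())
    return key


def list_archives(repo: dict, unseal) -> list:
    """Reassemble the archive list; this is the only step that decrypts."""
    names = []
    for key, blob in repo.items():
        if key.startswith(ENTRY_PREFIX):
            names.append(json.loads(unseal(blob))["name"])
    return sorted(names)


# Placeholders for public-key encryption and private-key decryption.
# They are identity functions, so this demo provides no confidentiality.
def seal(data: bytes) -> bytes:
    return data


def unseal(blob: bytes) -> bytes:
    return blob


repo = {}
extend_manifest(repo, seal, "monday")
extend_manifest(repo, seal, "tuesday")
archive_names = list_archives(repo, unseal)
print(archive_names)
```

Only `list_archives` needs the private key. Operations such as delete or prune would still have to read existing entries, so this sketch covers archive creation only.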

thomsh commented 8 years ago

KenMacD is right: public key encryption should be the rule, and you can rely on GnuPG for it. I can't imagine a backup job with a static passphrase on the server. :( The other features are great!

enkore commented 8 years ago

Crypto roadmap -> #1044. Currently PK is not on it, but it is technically possible with the draft DEK spec.

ghost commented 8 years ago

We really need this feature! Passwords are not a serious way to protect critical data...

enkore commented 8 years ago

Stapling #120

rugk commented 8 years ago

As stated in https://github.com/borgbackup/borg/issues/1786, gpg could also be used for signing these backups if this were implemented.

In any case I think that a kind of hybrid encryption would be nice as it combines the advantages of both encryption methods.

This would be good as:

In any case I'd argue:

¹ Note that current symmetric encryption is considered secure against post-quantum attacks.

enkore commented 8 years ago

Signing archives and using asymmetric key derivation to encrypt archives are imho different topics.

Signing is relatively simple to bolt on externally (or internally) by signing the tip of the hash tree, i.e. the archive ID. This is pretty much what git and other software do (try git cat-file -p <signed commit>). It is rather expensive to actually verify the archive, though, as opposed to just verifying that the signature is correct for the archive ID.
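The difference between the cheap and the expensive check can be sketched as follows. This is an illustration only: the two-level hash tree, the function names and the use of plain SHA-256 are assumptions, and `sign` uses an HMAC purely as a stand-in for a real public-key signature such as one produced by gpg.

```python
import hashlib
import hmac


def chunk_id(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def archive_id(chunk_ids: list) -> bytes:
    """Tip of a two-level hash tree: a hash over the ordered chunk IDs."""
    return hashlib.sha256(b"".join(chunk_ids)).digest()


def sign(key: bytes, message: bytes) -> bytes:
    # Stand-in for a public-key signature; an HMAC is NOT a signature.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_signature_only(key: bytes, signed_id: bytes, signature: bytes) -> bool:
    """Cheap: touches 32 bytes and says nothing about the stored chunks."""
    return hmac.compare_digest(sign(key, signed_id), signature)


def verify_full(key: bytes, signed_id: bytes, signature: bytes, chunks: list) -> bool:
    """Expensive: re-hashes every chunk and rebuilds the tree tip."""
    rebuilt = archive_id([chunk_id(c) for c in chunks])
    return (verify_signature_only(key, signed_id, signature)
            and hmac.compare_digest(rebuilt, signed_id))


key = b"demo signing key"
chunks = [b"first chunk", b"second chunk"]
tip = archive_id([chunk_id(c) for c in chunks])
signature = sign(key, tip)

tampered = [b"first chunk", b"SECOND chunk"]
cheap_ok = verify_signature_only(key, tip, signature)
full_ok = verify_full(key, tip, signature, chunks)
tampered_ok = verify_full(key, tip, signature, tampered)
print(cheap_ok, full_ok, tampered_ok)
```

The signature check alone still passes when a chunk has been altered, because it never looks at the data. Detecting the tampering requires reading and re-hashing every chunk, which is the expensive part.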

rugk commented 8 years ago

Okay, then split signing from encryption, but I'd still find it awesome if we got the hybrid encryption method explained above.

ghost commented 8 years ago

It is better to use different keys for signing and encryption, for several reasons: http://security.stackexchange.com/questions/1806/why-should-one-not-use-the-same-asymmetric-key-for-encryption-as-they-do-for-sig

I agree, it is better to split these tasks (but gpg can be used for both of them)...

About gpg encryption: Just try to take this solution and adapt it to borg backup, so something like PASSPHRASE="passphrase_for_GPG" borg --encrypt-key 4F8A7D0C

Here you can read a good article about Duplicity with GPG encryption: https://www.digitalocean.com/community/tutorials/how-to-use-duplicity-with-gpg-to-securely-automate-backups-on-ubuntu. Let's just try to repeat this solution, it's good. gpg is a good standard for this type of task; we can adapt it and not reinvent the wheel.

jody-frankowski commented 8 years ago

@lorddaedra A gpg passphrase would only be needed when you would need to decrypt (i.e. use the private key) some kind of metadata. Otherwise gpg can encrypt data fine with only the public key.

IMHO a good solution would be to keep some kind of metadata file locally that borg can use to know what it should back up or not, encrypt the actual backup data and the metadata file asymmetrically, and then send all that to the remote. That way the whole process could be automatic and only require the private key when restoring.

rugk commented 8 years ago

@lorddaedra Actually the way you propose using gpg is the one I don't want, because then backups will be encrypted asymmetrically. So I recommend not using (or requiring) a passphrase for a gpg key, and using the unlocked key pair for encryption. For the reasons I outlined above, symmetric encryption is better and easier to implement here. The password you enter should still be used for the symmetric secret you have.

I only want to encrypt the symmetrical secret asymmetrically, so that (faster and from a post-quantum perspective more secure) symmetric encryption is used.

enkore commented 8 years ago

Assuming use of gpg, both schemes are hybrid encryption, but with your idea the session key would be controlled by Borg, allowing alternate access - like you said, a backup password or something similar, whereas just using gpg would mean that the session key is buried somewhere in the ubercomplex PGP format.

(FTR, if public key crypto is done in Borg, it will always be a hybrid scheme, because doing public key cryptography without hybrid encryption practically always means that you've built an insecure system, aka "craptography".)

ghost commented 8 years ago

some thoughts...

In my opinion, we can't choose one best solution that will work great for all cases, so it is better to try to make all components customizable. [If we use a password somewhere, I would ideally like to choose between bcrypt, Argon2 or something else / if we use HMAC, I would like to choose between SHA-256 and SHA-512 / if we use AES, I would like to choose between AES 128 and AES 256, etc.]

It's also a good idea to look at other projects and check how they solve the same problems. For example, I suggest looking at https://www.tarsnap.com/crypto.html, a project very similar to borg...

(And an article with some crypto information from the author of that project: http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html. It may be a little outdated; read the last comments too, about GCM vs CTR + HMAC.) Also an interesting page: http://security.stackexchange.com/questions/63132/when-to-use-hmac-alongside-aes

We can use https://cryptography.io/en/latest/ to encrypt with AES 256 GCM (which internally uses CTR afaik) or AES 256 CTR + HMAC ...

enkore commented 8 years ago

(Please try to keep discussion approx on topic - borg has an existing framework for symmetric/secret key encryption, and there are other tickets, linked via #1044, for discussion about that - thanks :)

There's nothing wrong in what Colin Percival says; in fact these were all very good recommendations for the time, and mostly still are today (the very specific RSA-based algorithms he recommends are as far as we know secure, but can be tricky to implement correctly). Today I would not recommend anything RSA for new crypto systems; there are better alternatives that have none of the pitfalls and are designed to enable secure implementations by default. (Well and SHA-3 turned out to be infeasible for general purpose applications, and downright bad for password derivation -- but that was impossible to predict in 2009.)

Whether there will be actually any public key crypto code to write in Borg remains to be seen however; if gpg would handle it, then we wouldn't have anything to do with that.

intelfx commented 7 years ago

@enkore

> Well and SHA-3 turned out to be infeasible for general purpose applications, and downright bad for password derivation

Could you please elaborate? Never heard of this.

rugk commented 7 years ago

> downright bad for password derivation

Please don't. I don't think SHA-3 is meant to be used for passwords… Currently there is also no need to switch. SHA-256 has turned out to be quite strong and everyone relies on it, so there is no concern yet.

enkore commented 7 years ago

SHA-3 is optimized for hardware performance (even from a numbers perspective -- it's a lot (on some archs drastically) slower than SHA-2 in software), which is the exact opposite of what we want for Borg* and "an even stronger opposite" for password derivation.

* unless hardware-acceleration becomes widespread, which took Intel like 12 years for AES.
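The software speed difference can be measured locally with a rough micro-benchmark like the one below. It only uses Python's standard `hashlib` and `timeit`; the 1 MiB buffer and the repeat count are arbitrary choices, and the result depends heavily on the CPU and on how the local Python and OpenSSL were built, so no particular numbers are claimed here.

```python
import hashlib
import timeit

data = b"\x00" * (1 << 20)  # 1 MiB of input


def seconds_per_hash(name: str, repeats: int = 20) -> float:
    """Average wall-clock time to hash `data` once with the named algorithm."""
    hasher = getattr(hashlib, name)
    return timeit.timeit(lambda: hasher(data).digest(), number=repeats) / repeats


timings = {name: seconds_per_hash(name) for name in ("sha256", "sha3_256")}
for name, secs in timings.items():
    print(f"{name}: {len(data) / secs / 1e6:.0f} MB/s")
```

On machines with SHA-2 instruction set extensions the gap may be much larger than on machines without them.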

enkore commented 7 years ago

This seems unlikely to happen any time soon. There are no plans to implement this and various hard, technical issues preventing implementation at this time (see above). Therefore:

wontfix: No time table or plans to implement. Does not mean that something is rejected for eternity, since arguments pro/contra something can change with time and context.

(from https://github.com/borgbackup/borg/wiki/Project-management-FAQ-RAQ)

jcgruenhage commented 7 years ago

Sad to hear this. I'll see whether I can come up with a way to implement this, though, since I'd love for a host to be able to back up its stuff without being able to read backups (possibly of other hosts).

capi commented 7 years ago

@jcgruenhage Use separate repositories with different SSH keys for the various hosts.

If multiple hosts back up to the same repository, each host needs to be able to read parts of the content of the archives, and therefore other hosts' data, for de-duplication. Checking hashes alone is not enough for de-duplication; see the SHA-1 collision with the two same-sized PDF files and what it caused in e.g. Subversion.

jcgruenhage commented 7 years ago

@capi But that means I can't use deduplication across hosts either. As for the SHA-1 collision, it is still highly improbable for that to happen "on accident", and to avoid it, one could "just" use a more recent hashing algorithm (BLAKE2b-512 for example, if you are that worried about collisions), given that SHA-1 is about 22 years old.

I am rather certain borg relies on the hashes right now too, since otherwise borg would need to download most of the content from the remote host for each new archive, and large archives with few changes don't take as long as they would if that were the case.
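The hash-based deduplication described here can be sketched like this. It is an illustration of the idea, not Borg's code: `ID_KEY`, `Repo` and `backup` are hypothetical names, the chunks are given as a fixed list rather than produced by a chunker, and keyed BLAKE2b is used simply because it is available in Python's standard library.

```python
import hashlib

ID_KEY = b"hypothetical per-repository id key"


def chunk_id(data: bytes) -> bytes:
    """Keyed hash of the chunk contents, used as its storage key."""
    return hashlib.blake2b(data, key=ID_KEY, digest_size=32).digest()


class Repo:
    """Toy remote store that counts how many bytes were actually sent."""

    def __init__(self):
        self.chunks = {}
        self.bytes_uploaded = 0

    def has(self, cid: bytes) -> bool:
        return cid in self.chunks

    def put(self, cid: bytes, data: bytes) -> None:
        self.chunks[cid] = data
        self.bytes_uploaded += len(data)


def backup(repo: Repo, chunks: list) -> list:
    """Store only chunks whose ID is unknown; return the archive's ID list."""
    ids = []
    for data in chunks:
        cid = chunk_id(data)
        if not repo.has(cid):  # compares IDs only, never downloads content
            repo.put(cid, data)
        ids.append(cid)
    return ids


repo = Repo()
monday = backup(repo, [b"a" * 1000, b"b" * 1000])
uploaded_after_monday = repo.bytes_uploaded
tuesday = backup(repo, [b"a" * 1000, b"c" * 1000])
uploaded_after_tuesday = repo.bytes_uploaded
print(uploaded_after_monday, uploaded_after_tuesday)
```

The second backup uploads only the one changed chunk, and it decides that by ID lookup alone. That decision is exactly what a hash collision would break, and it is also why hosts sharing a repository must share the ID key in this scheme.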