gilbertchen / duplicacy

A new generation cloud backup tool
https://duplicacy.com
Other
5.13k stars 335 forks source link

Security weakness: PBKDF2 invocation uses static salt reducing effective security #137

Open shonjir opened 7 years ago

shonjir commented 7 years ago

I was looking into Duplicacy as an alternative to Crashplan and wanted to know more about its data security features, and noticed the following issue.

The PBKDF2 key generation function uses a hard-coded static salt and hard-coded number of rounds. This dramatically reduces the keyspace of the generated encryption keys. The generated key is also used directly as data-encryption-key meaning that neither the password or key derivation parameters cannot be changed once the dataset is instantiated.

https://github.com/gilbertchen/duplicacy/blob/7f04a791119759e903946375a1463ca3acab3110/src/duplicacy_config.go#L25

https://github.com/gilbertchen/duplicacy/blob/22ddc04698077fc297ae7408991405fc4aac0fad/src/duplicacy_utils.go#L118

PBKDF2 is an HMAC based key derivation function. It derives its strength from the number of rounds and through the use of a random salt to prevent rainbow table style attacks.

It is essential to use a randomly derived salt of at least 8 bytes for this function to establish sufficient levels of security for the generated keys.

Best practice for this function is to securely generate a new random salt along with a new password (e.g., when the dataset is created). The key derivation function, salt, number of rounds, and the hash function identifier should be stored with the dataset so that the generated key may be reproduced regardless of the current key generation method. Making the number of rounds configurable (above a minimum default security level) would further increase the security of the implementation.

Because this key is used directly as a data encryption key the use of hard-coded salt and hard-coded rounds means that these values cannot be changed in the future without invalidating existing backup-sets.

The key derivation function should be used instead as a key-encryption-key to protect a randomly generated data-encryption-key generated when the dataset is created. This would also the user to change the dataset passwords as well as allow later implementation of alternative protection methods for the data-encryption-key such as PKCS, and allow leveraging cryptographic key stores, etc.

gilbertchen commented 7 years ago

I agree this is a security flaw. A static salt makes it possible for the potential hackers to build a universal rainbow table to attack a large number of duplicacy users at once if they can, for instance, gain access to a cloud storage.

So the V2 of the master key encryption scheme should be like this:

Do you see any issue with these? Any suggestions?

gilbertchen commented 7 years ago

For anyone reading this thread who is concerned, as long as you use a long (say, >12) and not easily guessable password, you should be fine. The V2 scheme doesn't give extra protection if you have a good password, unless the number of rounds is increased.

robbat2 commented 7 years ago

@gilbertchen There is still added security because a static salt increases vulnerability to dictionary attacks.

shonjir commented 7 years ago

This will work, but be aware that because you're still using it to derive the actual data encryption key the values cannot be changed after the backup set is created - a compromise or flaw in the scheme means that the backup set must be recreated.

This is why I had recommended generating a sufficiently random data encryption key and using the passphrase as a key to encrypt the data key. This would allow upgrading the key protection at a later time through a decrypt/encrypt process without requiring re-creation of a backup set.

robbat2 commented 7 years ago

@shonjir other apps take the approach of a random data key (DEK) for every chunk, and storing it (encrypted) in parallel to the chunk. Rotating passphrase only means updating that separate data, not rewriting the chunk.

shonjir commented 7 years ago

That would work also assuming that the re-encryption of the chunk keys is an atomic operation, otherwise key rotation could result in repository corruption if the operation were interrupted for some reason.

robbat2 commented 7 years ago

@shonjir Amanda uses the approach of having the DEK as a header on the chunks, but that means modifying the entire chunk, which is very expensive (or even impossible in some storage types).

If it's a local filesystem, you can write to a new file and atomically rename. However, for other storage types, esp. those that are only eventually consistent, that's really problematic, even with atomic move, because you can't be sure which version of the metadata going to show up.

Let's take the encryption of chunks to a separate issue.

gilbertchen commented 7 years ago

@shonjir The key generated by pbkdf2 is used only to encrypt/decrypt the config file. The 4 keys used to encrypt/decrypt chunks and snapshots are randomly generated and stored in config. Therefore, it is possible to change the pbkdf2 passphrase without recreating all the backups (which is implemented in the password command.

gilbertchen commented 6 years ago

PR #244 submitted to fix this issue. Please review the changes if you can.

uup4 commented 6 years ago

Just discovered this SW and started reading more about it here. So far it looks like something I would use. I also would want to use the encryption feature if I'm storing my backups in the cloud.

Not sure if it's too late to piggyback on the implementation for this issue with this idea. The idea is from VeraCrypt, where I think the number of kdf iterations can be a user input, known only to the user. I think this would add another line of defense for the files in the cloud.

The PR #244 states the new format for the config file is "duplicacy\001" + salt + #iterations + encrypted content. At a very, very high level, I think this idea could work as follows: a new switch could be created when doing the init with encryption to specify not to store the number of iterations in the config file and the number of iterations used for the backup and the password would be required input from the user anytime they want to do a backup when the option to not store iterations in the config file is used.

uup4 commented 6 years ago

@gilbertchen, I also wanted to mention this, maybe you already have it mentioned somewhere that I didn't see. The CLI version is apparently free for personal use from what I read, but maybe you ought to have a link for people that plan only to use the freeware version to still make a donation, if they so chose.

gilbertchen commented 6 years ago

That would be a nice feature to have. However, I'll make it a separate feature request instead of including it in PR #244.

As for donations, most revenues are expected to come from the sales of the GUI version, so I feel it is unnecessary to accept donations from the CLI version, or even a bit unethical since Duplicacy is not a traditional open source project. Donations of code are more needed than donations of money -- many contributors have already kindly spent their time fixing bugs and implementing new features, which I really appreciate.

uup4 commented 6 years ago

Regarding donations, fair enough.....Thank you.

Regarding not putting the user's true iterations value in the config file, I agree, it makes better sense as an independent feature request. Until this becomes a proper feature, after looking at the PR #244 code and if I build my own duplicacy CLI executable, I think, if I didn't miss anything, I can get what I want by making only one small change in one file by doing the following (please confirm if I'm not mistaken):

In file duplicacy/src/duplicacy_utils.go simply change iterations in the pbkdf2.Key() function call of the below snippet to a hardcoded integer value that I want to use. The config file will be created in the storage as normal with a value for iterations, but it won't be the true value I'm using since I have a hardcoded value anytime pbkdf2.Key() gets called. Also the value for iterations in logs would not be accurate either, but as long as everything else still works, that's okay for now.

// GenerateKeyFromPassword generates a key from the password. func GenerateKeyFromPassword(password string, salt []byte, iterations int) []byte { return pbkdf2.Key([]byte(password), salt, iterations, 32, sha256.New) }

gilbertchen commented 6 years ago

@uup4 right, that should work.

sergeevabc commented 6 years ago

@gilbertchen, have you thought of moving from pbkdf2 to argon2id?

gene1wood commented 5 years ago

@gilbertchen I think your #244 PR fixes this issue. It's been a year that this issue's remained open and I don't think @shonjir is going to come back and review the change. Probably best to close this issue out as I assumed this weakness still existed (seeing this open issue) until I noticed in the code the iterations were configurable.

sergeevabc commented 5 years ago

@gene1wood, but what about moving from pbkdf2 to argon2id?

gene1wood commented 5 years ago

@sergeevabc I'd suggest opening an issue on that specific topic as I wouldn't be surprised if it gets lost in this issue (since this issues title is Security weakness: PBKDF2 invocation uses static salt reducing effective security which talks about the salt, not about key derivation)