P-H-C / phc-winner-argon2

The password hash Argon2, winner of PHC
Other
4.79k stars 407 forks source link

Password encryption #222

Open jedisct1 opened 7 years ago

jedisct1 commented 7 years ago

Encrypting hashed passwords is a good practice. The idea behind this being that encryption keys and hashed passwords are typically stored in different locations (application vs database), so that compromised databases remain useless against offline attacks.

Instead of delegating the encryption task to application developers, it might be a good idea to formalize encryption in the Argon2 specification (and leverage BLAKE2 that we already have), if only for interoperability purposes.

josephlr commented 7 years ago

@jedisct1 There is a secret parameter in the Argon2 hash that should be used for the purposes you describe. See the docs about the secret parameter.

jedisct1 commented 7 years ago

Unfortunately, and unlike encryption, this doesn't allow key updates.

ryaneliseislalom commented 7 years ago

If this approach is taken, the next logical question is "how do we store the private keys used by the encryption?". Left without a discrete scope, Argon2 could progress into a re-implementation of something like (Hashicorp Vault)[https://www.vaultproject.io/]. I think the goal is to let Argon2 "do one thing, and do it well", as the saying goes. Argon2 is an algorithm, not a secrets management solution.

jedisct1 commented 7 years ago

How keys are stored is beyond the scope of an encryption scheme.

This could be anywhere, even hard-coded in the application, but no matter where these keys are, the ability to replace them (so, reencrypt the hashed passwords) if they ever get leaked is what really matters.

ryaneliseislalom commented 7 years ago

Key rotation is an excellent example of where a secrets management solution would fit, like in this example: https://www.vaultproject.io/guides/rekeying-and-rotating.html I think it would be extremely useful if Argon2 was so efficient and easy to integrate that it got adopted by more secrets management solutions.

solardiz commented 7 years ago

FWIW, I am working on format-preserving encryption for typical password hash strings - initially I meant this as yescrypt's builtin hash encryption feature, but it just happens easily extensible to cover a full range of *crypt's at once. Their salts, too - so that the encrypted salts+hashes can't be used to mount timing attacks w/o having the key. Coming from yescrypt, it will be variable block size (up to 512 bits, separately for salt and hash) Luby-Rackoff building upon SHA-256 (only), since that's what we readily have in there. I'll probably make it a separate source file for easy reuse separately from yescrypt. Sounds like a good solution? Or is it still preferable to have an Argon2-specific builtin feature (like the same thing using BLAKE2b and with less generality)?

jedisct1 commented 7 years ago

A generic system that can support any hash string would be awesome. But having it depend on SHA-256... not so much. Does your system also encrypt the function identifier? If not, would it be possible to define it so that password hashing schemes can use the same construction, but specify the cipher?

solardiz commented 7 years ago

Does your system also encrypt the function identifier?

No, at least not as currently planned. Only the salt and hash would be encrypted, but the hash type id and parameters would stay plaintext. They vary across hash types (and even within hash types) too much to have a generic system encrypt them in a format-preserving manner.

would it be possible to define it so that password hashing schemes can use the same construction, but specify the cipher?

I don't see how this would easily fit typical C APIs, where we separately have a context struct and 3 functions Init/Update/Final. Suggestions welcome. In other languages or with other APIs where the entire hash type's data type and implementation are more closely packed together and referred to by one identifier (e.g., a class), probably yes.

In C, I think it's more realistic to have a separate compile-time version with BLAKE2b for use along with Argon2, if that's the preference. Since this is reversible encryption, even if we change our mind later it won't require us sticking to the same choice forever - we could re-encrypt to something else at a later time.

Oh, you said "specify the cipher", not the hash. So you mean to use my string parsing and base64 de/encoding, but replace the entire block cipher? This is probably easier (can define any new API of our own for it), but it makes little sense to me, because having a variable block size cipher is essential to this whole construction, and where else would you commonly have a suitable cipher? Variable block size is needed due to many hash types, including Argon2, accepting variable size salts. (Also bcrypt is weird with its 184-bit hashes and different encoding.) If we exclude the salts and only encrypt the hashes, this would be easier (insist on exactly 256-bit hashes, then) - but then there are precomputation and timing attacks.

My1 commented 6 years ago

well while encryption would be intresting, there is one problem, depending on how the encryption works. if one would encrypt the whole password hash string or basically anything which is some sort of structure (for example salt$password) then someone could try the encryption key seperately from the password, which greatly lowers security compared to adding the secret directly in the hash.

let's simplify this a bit.

let's say we have 2 boxes with 4 different marbles each, one being for the secret value, another for the password.

encrypting the thing is basically taking out the "secret" marble first, being told whether or not it's the correct one and if correct, procceed to taking out the "password" marble.

the total amount of tries this takes is (4 -1) (the amount of wrong secrets) + 4 (tries on the correct secret) = 7. in case 4/4 (the last try on both marbles) we get the following outcomes:

1-> wrong, try again
we know that the secret is wrong -> trying the passwords is useless
2-> wrong, try again
we know that the secret is wrong -> trying the passwords is useless
3-> wrong, try again
we know that the secret is wrong -> trying the passwords is useless
4/1 -> wrong, try again
4/2 -> wrong, try again
4/3 -> wrong, try again
4/4 -> correct

now for incoperating the key into the hash. this time we have to take a marble out of each box first and THEN we are told whether it's correct or not.

each "secret" marble can coupled with 4 "password" marbles, and since we dont know whether the "secret" marble is correct, we have to pull out a "password" marble too.

this creates 4 (amount of secret marbles) * 4 (amount of password marbles) = 16 tries max.

outcomes:

1/1
1/2
1/3
1/4
all passwords are wrong for secret -> try with next secret
2/1
2/2
2/3
2/4
all passwords are wrong for secret -> try with next secret
3/1
3/2
3/3
3/4
all passwords are wrong for secret -> try with next secret
4/1
4/2
4/3
4/4 -> correct.

continiung this calculation to our thing at hand. for secret we are just taking 2^128 (128 bits) because that's usual key length for many crypto operations. for the password we are taking 94^8 (about 52,4 bits) because that's 8 totally random print-ascii chars, which is already a pretty high assumption for most passwords, and it's a bit higher than 4 words on diceware (51,6 bits or 7776^4).

so let's get this to real numbers: 94^8 equals 6.095.689.385.410.816 (6,1x10^15) 2^128 is 340.282.366.920.938.463.463.374.607.431.768.211.456 (3,4x10^38)

first approach (we know whether the secret is correct immediately)

(2^128-1) + 94^8 = 340.282.366.920.938.463.463.380.703.121.153.622.271 (3,4x10^38)

this is barely any change from the amount of secrets, lol. in comparison to just using the password it's an about 5,6x10^22 times the amount of maxmum tries.

for the second approach (your only know the result after trying both you get:

2^128 * 94^8 = 2.074.255.612.082.433.166.820.895.186.606.541.301.079.463.884.117.508.096

that's 2,1x10^54 or 180 bits

the amount of max tries is obviously about 3,4x10^38 times as much as just the password.

also one important thing. by coupling the secret with the password hash, finding the secret becomes as expensive as argon2 itself, obviously, because a run of AES is a pretty fast thing, but running argon for like 0,2 seconds on like 256MB of RAM (possibly more, depending on the defender's processing resources) is gonna be a LOT more cost to an attacker.

some comparison. The bitcoin network operates (from the latest stats I could find) at about 16,6 Million Terahashes per second, tera being 10^9 and 16,6 million being 16,6^x10^6 we have 16,x10^15 hashes per second total.

I know that hashes and crypto are different things but I cant think of any better statistics I could use for comparison.

if an attacker gets a huge af botnet (or has way too much money to burn) and gets that amount of decryption attempts per second that still would be around 650 trillion years at maximum just for the secret, but as time goes on it's gonna get faster.

sorry for the wall of text.

solardiz commented 6 years ago

Yes, encryption does have this (minor?) drawback.

I'd expect the difference in attack cost to be smaller than the "wall of text" seems to estimate. In many cases, the attacker can set a known password on their own account, so they would not have to also probe passwords when probing keys. In other cases, the passwords and keys would need to be probed over a "gradient" with estimated probability for both of them being correct decreasing between tries - not with a mere "nested loop", unless that happens to be optimal (it may be if all keys are equally likely, then the outer loop would be over passwords - from most likely to least likely).

And yes, it matters "how the encryption works". In particular, is decrypted data indistinguishable from a valid salt and a valid hash? For cryptographically random salts and Argon2, and for my current (unreleased) code, they should be (although for now I'm using this with yescrypt hashes, and haven't actually tested it with Argon2 yet). For e.g. not-so-random salts or for bcrypt, there might be a way to reject some keys early (right after attempted decryption) because bcrypt uses a different order of characters in its base64 alphabet than most other crypt(3) flavors do. So without base64 alphabet translation, for bcrypt there would be a giveaway that encryption is used (otherwise-always-zero bits set) and that decryption key was wrong (ditto). I've been thinking about this the other day, in context of whether it's in fact worth it to support not-whole-byte sizes of plaintext data in that universal crypt(3)'ish hash encryption layer.

And indeed we should recommend that a strong key be used, so that this does not matter. We may also recommend that the actual key be derived from whatever is specified in a config file, etc. using the slow hash/KDF itself (this is how I intend the relevant yescrypt APIs to be used), perhaps at higher cost settings than are used for individual hash calculations (this key derivation would then be done on microservice startup or such - in fact, that's precisely how I implemented that in one of those).

My1 commented 6 years ago

"In many cases, the attacker can set a known password on their own account, so they would not have to also probe passwords when probing keys"

well in that cases using the secret as part of the hash algorithm gets even more useful, at the very least if the encryption has some way of verification it worked, and since everyone always says one should use Auth'ed encryption when they use it, there sure are ways to check that. because in that case, we have a blazingly fast encryption that most things on this planet can do really quick in hardware compared to a hashing algo that's designed to be as slow as the defender wants it to be.

so you cannot go around and try *-illions of keys per splitsecond anymore

"In particular, is decrypted data indistinguishable from a valid salt and a valid hash?"

exactly, that's what I meant with "anything that has a structure" because you can check for the structure. for example you know a complete bcrypt hash string starts with $2, but as said above if for example using auth'ed encryption you wouldnt even need structure, as the auth part is enough., and even without. as soon as we need padding, the padding could be checked (sonehow the defender also needs to know what is part of the padding and what isnt).

if encryption is used one really should use the bytestring (important, as e.g. using the base64 would mean a structure to exclude a lot of the results that fail) of the hash, the hash should have a length that doesnt need padding and try to encrypt that and do the same for the salt, otherwise the attacker could take the given salt and his known password hash it and then try decrypting until you get the hash you wanted. the encrypted text then obviously can be stored as base64 or whatever, but the operations need to happen on the raw bytestrings.

regarding deriving the key from stuff stored somewhere using a strong hash and a "microservice", sounds intresting, this would lead to quite a barrier for some things as for example webhosts usually only run PHP and PHP is pretty stateless except for maybe some session stuff you can pull and other data, btu I didnt hear of the ability to let a script run when the server starts, stay in background and pass that secret whenever needed. in my case, I let PHP generate a random thing as part of a setup file and include that in the config.

solardiz commented 6 years ago

My1, we're in agreement on this.

It's a case where non-authenticated encryption of just the underlying binary data may be preferable over (maybe authenticated) encryption of the encoded strings. This is what I intend to do.

The padding, if present, should stay the same in original and encrypted hashes - e.g., in the 43 chars representing a 256-bit hash, the extra 2 bits should be always zero in both original and encrypted form of the hash. To me, that's a reason to support only whole byte data sizes in the cipher and in the decoder/encoder. This way, the 43 chars would decode to 32 bytes, be en/decrypted, and then encoded to 43 chars again with the en/decryption's use/success undetectable until the password is known. Not decoded to 258 bits, en/decrypted, and then encoded to 43 chars again where the normally always-zero bits would no longer necessarily be zero, and where an attacker's trial decryptions would allow for early reject of 3/4 of candidate keys. This is rather unfortunate as it limits transparent support of certain other crypt(3) hashes (mostly for their salts, but also for bcrypt's hashes without base64 alphabet translation), but I guess we have to accept this limitation for the reasons you give.

OTOH, if someone encrypts the encoded strings anyway, then there's no further security drawback from using authenticated encryption, and in fact it'd prevent some other attacks (such as on parameter parsers through injected malicious "encrypted hashes"). With proper keying, that should be fine. It's just not something I intend to do.

As to "secret as part of the hash algorithm", this also can vary: if it's e.g. HMAC applied as the last step, then the slow hashing can be done just once for the known password (or just once per password, if not known) and then only the fast HMAC repeated per key test. Moreover, if the secret is applied after some computation on the password is already done, precomputation attacks and cache-timing attacks become more relevant. (Salt encryption mitigates those.)

I also agree that using a KDF to derive the actual encryption key or the secret input is harder and less likely to be done in a typical PHP app. Generating a sufficiently large cryptographically random value at app install time and storing it may be the way to go, as long as the user (this install's admin) is informed that they need to back up this value or file and not only the database.

My1 commented 6 years ago

well true as you said the b64 padding wont matter for bindata en/decryptions. also if one is hilarious enough to crypt the whole b64 string then the attacker could probably kick far more keys out than just 3/4. and not just on the basis of the last 2 bytes, but for the simple reason that if even one character of the decrypted thing isnt part of the b64 set (or rather the modified b64 in crypt, for that matter, because normal b64 also comes with extra padding in form of a few = characters at the end, making it even worse if it had been used.)

and yeah if someone already does that they can go straight to AEAD and thereby have a valid way to see that a) the hash isnt broken, 2) it wasnt tampered with.

thanks for the info about HMAC, I kinda would probably have brought it up as an example but apparently it wouldnt be a good one. normally I would expect a key, pepper, secret whatever you may call it be implemented in a similar manner as a salt before hashing functions knew how to deal with it, except for the obvious fact it's kept as far away from the database as possible.

(basically meaning you combine everything first and throw everything into the hash. I for example use the pepper, plus a self-made user-secret (basically a second salt) throw that into SHA384(raw) then base91 because bcrypt iirc didnt like zero bytes too much and has a length limit of iirc fifty-something characters and then the whole thing into bcrypt (which also adds its own salt), and later argon2 will take it, instead of bcrypt)

by combining first and then hashing you have obviously the whole hash spreading over everything involving the secret and any other fun you might add.

regarding your last paragraph, depending on how many users there are and how critical the thing is I would be actually against backing this thing up and instead axing the password hashes and forcing the users to reset. Basically what I call the "smart card approach", ALL crypto relevant data will be generated on the machine and WILL NOT leave the machine. if the machine gets problems, same as with a smartcard, make new crypto data, put everything together again and finish.

that's also a reason why I like the key inside the hash. and dont wanna re-crypt, because I can just reset if needed. and if for example a part of the secret runs deterministic instead of just static and I wanna change that part it would get annoying having both active somehow, so resetting is easier in that case.

if encrypting important data you dont wanna lose that's definitely not a good one but I think for authentication where the concept of admins exist and stuff can be reset or reconnected with relative ease that isnt a problem. obviously on adminless systems like a blockchain that approach wont work either but this obviously isnt the case here.

depending on what the website is about, the slight annoyance of having a PW reset is far better than the problem of the key leaking because someone screwed up the backup security (it isnt like this has never happened before.) and that's especially the case for encryption. because for example if one tries to hash a whole bunch of possible passwords with a metric f-ton of different salts ([e.g. crap RNG on the target] otherwise dont ask me how) doing the same for all the different possible secrets is a few scales higher, so if at all they would probably need to recalculate the hash table again for the secret. in case of crypto they just decrypt and run the hashes over the table, not that that would be possible any time soon but I have no idea what quantum and stuff may bring, and better safe than sorry.