Consider adding algorithm primers

cbzehner commented 4 years ago

I'm writing a server boilerplate that I hope to eventually use in production, right now it's just for fun. One of the things I'd like to do is store user accounts (and password digests!) in my database.

I noticed y'all have three algorithms listed but no guidance about how to choose one for the naive 🙋‍♂️ . Would it be reasonable to provide a primer about each algorithm and what settings it might be appropriate it?

cbzehner commented 4 years ago

I was redirected to this repo when reading https://github.com/RustCrypto/hashes, so I appreciate y'all saving me from the mistake of using Sha3 for creating password digests!

newpavlov commented 4 years ago

You mean primers in the root README, correct? I don't think it will be reasonable to duplicate examples from crates documentation, but I guess a note which would recommend to use argon2 or scrypt could be a good addition.

After introduction of a trait for password hashing functions we also could add examples similar to RustCrypto/hashes.

cbzehner commented 4 years ago

That's exactly what I mean. The crates have great documentation on how to use the code, but assume knowledge of the algorithms in question.

It would be helpful to provide some guidance about:

potential tradeoffs between these algorithms
edge cases to look out for
what situations they should be applied

polarathene commented 3 years ago

~~I'll try be terse on common answers you'll find around online, it should be enough to get a fair idea.~~ Turned out to be much longer response than I thought :(

If you have any more questions/concerns I'll try respond to those too. Note I'm not a security professional or cryptographer, but I have done my own deep dive on the topic with plenty of notes. Hopefully I can save you and others some time/confusion.

Attackers - Types of hardware they use for computation

In terms of security, an attacker will generally favor hardware that can greatly parallelize the computation to achieve a faster success rate. There's a few options for this, but they have their own tradeoffs in cost to performance and how the choice of a Password Hashing Function affects their viability.

ASICs while being the most efficient and low cost per unit to produce, have a high upfront design cost prior to mass production (usually within the millions IIRC). This makes it a non-option for many, they're also AFAIK tailored hardware tuned to a specific purpose, they don't run software, their function is in pure hardware implementation.
FPGAs are the next closest thing to an ASIC, but more flexible. Per unit cost is higher but they are far more accessible of an option to an attacker. They have some hardware resources to use, but mostly it's like a lego blocks equivalent of logic. Again you don't run typical software and programing languages on these, they're slower than an ASIC but you can more affordable iterate and optimize your computation logic or switch it out to attack a different algorithm entirely.
GPUs provide very affordable and accessible compute power. Many little compute cores that are great for parallelizing computation with. Software is more compatible / interopable with this hardware, but for the most part they have their own compute languages as writing software for this topology differs from how traditional CPU logic works. There is popular open-source software that inexperienced attackers ("hackers") can utilize such as Hashcat, which has optimized GPU compute code to perform the computations and provide a CLI to input parameters and provide feedback/results.
CPUs are what your servers will generally be running these hashing functions on, it's much slower in comparison but often has more resources available to it, especially memory such as it's internal caches and system RAM. An attacker with many systems under their control may utilize this, but it's otherwise not as efficient.

How do we defend against the attacker?

For a defender (you) the main concern is how much time it would typically take an attacker to on average guess the original password successfully. If the time is too long, the cost to perform it may be too high for an attacker to justify. Most users are not being directly targeted by such an attack, the attacker typically has access to a data breach of many user hashes and will prioritize their resources for low hanging fruit (weak passwords).

The best the defender can do in this case is discourage the attacker by increasing the cost of computation. That's where choosing the right Password Hashing Function and parameters matters. It's unfortunately not as easy as advising everyone to use the same configuration as you need to balance security and usability (extending the time to compute affects response time as well as your own available resources under load).

potential tradeoffs between these algorithms

Some are easier to attack, some are easy to configure but poor at balancing response time vs security

### pbkdf2-hmac-sha256 PBKDF2 is usually paired with HMAC-SHA256, but isn't limited to that. From what I understand, you set high amount of iterations (1Password and Bitwarden default to 100k iterations), and besides some additional overheads each iteration is roughly the cost of computing two SHA-256 hashes. Password managers tend to use this as a KDF for encryption keys derived from your master password. It's useful for client-side browser extensions which can use the native Web Crypto API browsers offer with support for PBKDF2, which your options lack and would thus incur higher performance overheads (maybe less of a concern today with WASM). On a GPU the memory requirements are too low AFAIK, that the attackers performance fairs much better than other options like bcrypt, requiring higher iteration counts (and thus computation time) or weaker security strength by comparison. It is NIST approved and valid for FIPS compliance I think, which the others may lack. Otherwise it's not likely a good choice for password storage vs your other choices. ### bcrypt **bcrypt only ever uses 4KiB of memory**. This makes it an easier target to parallelize an attack against as the memory requirement is fixed and low. - ScatteredSecrets has a few blog articles about how they use FPGA clusters to efficiently attack bcrypt hashes at a far better rate than equivalent cost of GPU hardware. This is with old FPGAs (roughly a decade old IIRC), and is still more efficient than current gen GPUs. - GPUs including the current gen we have today aren't able to fully utilize their compute power as GPUs compute cores are grouped into smaller clusters with shared local memory (L1 cache), and the clusters themselves share a slightly bigger L2 cache. That local memory is not sufficient to provide all compute cores in a cluster with 4KiB local memory each, thus they need to reach out to global device memory of the GPU (where the bulk of the vRAM is) and this imposes quite a bit of access latency slowing down their efficiency. There are also bottlenecks related to how many cores can access that memory at a time. Thus bcrypt performance suffers. *Eventually future GPUs will have enough local memory for 4KiB per core and become far more effective.* bcrypt has been available since 1999, and is one of the most common choices you'll see in production. Dropbox, Auth0 and other popular services have blogged about how they utilize it. It's often the one you'll come across in frameworks or guides. It has a simple configuration, you have a `Work Factor` value that is a [base-2 logarithm for iteration count](https://security.stackexchange.com/questions/85966/how-does-phps-password-hash-bcrypt-cost-factor-translate-into-cracking-comput), that is it starts at 2 and doubles the amount of work as the WF increments (eg `2^8 = 256, 2^9 = 512, 2^10 = 1024...`). Over time, as processing power becomes more affordable all you need to do is increment the WF and now an attacker needs 2x the resources. **Your only defense however is increasing the computation time**, if your own server hardware isn't also upgrading, this has a negative impact on you. ### scrypt For the most part, this aimed to address the weakness of bcrypt by increasing the memory requirements. This was notably beneficial as memory is relatively expensive, especially when you're adding it to ASICs or FPGAs and the defender can just increase memory requirements. That can reduce the amount of parallelization, or in some cases be too high requiring attacker hardware to be replaced which can be very costly. Unfortunately the memory isn't as flexible separately. You'll find the computation time is tied to the memory requirement. So the higher the memory required, the longer your CPU spends computing the hash result. You do want the computation time to be slowed down, but not too heavily that it negatively impacts your service. Parameter wise, you're still mostly looking at adjusting one parameter, it's very much like bcrypt. Instead of a work factor, you're increasing the amount of memory required, which will as mentions scales the computation time. There are two other parameters but generally they're not touched, the 2nd is related to memory access as AFAIK has little security value deviating from `8`, while parallelization is often at 1 (some implementations don't support it affecting portability, and I'm not sure if you're getting much of a security improvement vs adjusting memory?). ### argon2 Argon2 later arrived and is one you'll often see mentioned and advised to use today when possible. You can adjust iterations `T` separately to memory `M` requirements. You can also utilize parallelization `P` too if you'd like to use more CPU cores/threads, this brings down compute time but the attacker must also compensate, I believe it's most useful in contexts where you'd not be otherwise utilizing the extra cores, eg personal system rather than an auth server for many users. You can double/halve `T` or `M` parameter values and the time to compute will scale accordingly. Finally you're able to balance/scale the CPU and memory load separately to your preferences. You'll generally want to have a low `T` (minimum `3`, `10` for `argon2i`) reducing load on your CPU, while increasing `M` to a value you're comfortable with (usually based on an appropriate compute time per hash for your server to handle). By raising the memory requirement sufficiently, the cost to the attacker is greatly increased by reducing their ability to parallelize computation (rate / number of password guesses). Unfortunately, despite some benefits over scrypt, performance in different environments seems to vary, this means depending on your server, if you compare the two by the time they both take to compute, argon2 may need to use less memory, I've fond it at parity on my system, but 4x slower on a VPS instance. There's also 3 modes, `argon2i`, `argon2d` and `argon2id`. For the majority of cases, general advice is that you'll always want to prefer `argon2id`.

edge cases to look out for

Mostly bcrypt password input size (not length!)

An attacker may try to stress a server by sending 1GB+ sized passwords, a good way around this is to use SHA-256 or similar to pre-hash the users input from the client-side, which has the additional benefit of your received user passwords always being the same size. ### pbkdf2 You may come across [this interesting tidbit](https://soatok.blog/2020/11/27/the-subtle-hazards-of-real-world-cryptography/), which allows a binary hash to match the same output as original string input that exceeds a certain length. An example of one of the collisions is given [here](https://crypto.stackexchange.com/questions/15218/is-pbkdf2-hmac-sha1-really-broken): - The password: `plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd` - Could has a SHA-1 hash that has bytes which convert to ASCII string fine: `eBkXQTfuBqp'cTcar&g*` - They thus both can be used as password and will be considered the same for HMAC-SHA1 due to the longer length password being hashed to the same bytes as the shorter password ([as expected under this condition for HMAC](https://en.wikipedia.org/wiki/HMAC#Definition)). [That's not actually a problem for passwords](https://crypto.stackexchange.com/questions/85841/pbkdf2-hmac-collisions) to be concerned about. Still worth pointing out and clarifying :) ### bcrypt - There is a 72 byte input limit. Any additional bytes beyond that will be truncated and not contribute further to security. You'll get the same resulting hash. Pre-hashing (eg SHA-256) is a good workaround, and would help avoid an overflow bug: - 72 bytes would cover most user passwords pretty well, however non-ASCII input can use many bytes for unicode sequences that represent a single visual glyph (eg `👩‍👩‍👧‍👦` the family emoji is 25 bytes, composed of 7 unicode codepoints). - There was a bug in the original OpenBSD implementation patched in 2014, and NodeJS in 2020 where length overflowed after 255 characters, potentially making passwords much weaker (257 character password would be 1 character only, very easy to guess). - The output hash is 184-bit. This is plenty of entropy, even if you performed client-side hashing (aka server relief) with a fast SHA-256 hash on the server for storage in the database and verification. 184 bits is too large for anyone to brute-force. Any input exceeding 184 bits of entropy (not equivalent to passwords size in bytes), technically has no increased security. - This isn't a practical concern, I only mention it because of some users claiming SHA-256 pre-hash on their input could reduce the entropy strength of their password that exceeds 256 bits, it's irrelevant because of the final bcrypt hash having less entropy. ### scrypt The algorithm can be modified for TMTO attacks (Time Memory Trade Off). One is with parallelization of multiple cores that can reduce the time down by roughly 25% IIRC. The other allows trading additional CPU time for reducing required memory. This is roughly scaling the CPU time such that the time to compute the hash remains the same but resource requirements has become adjustable. As scrypt was designed to defeat attackers better by forcing cost to attack up with the cost of memory, this may allow an attacker to more affordable attack with hardware with limited memory which would otherwise have been unable to handle the memory requirements. ### argon2 Two of the judges involved in the Password Hashing Competition have tweeted or mentioned elsewhere online that using argon2 for interactive login (where you want a hash computed in less than 1 second) is weaker than bcrypt against an attacker. This information was vaguely supported but many have circulated it. I contacted one of them and managed to get more information: - The time duration isn't relevant it's actually the workload itself to compute the hash, the hardware that performs the computation varies the actual time to compute naturally. - It's GPU specific advice. There was no actual GPU test of the algorithm to validate it, but theory/math of the judges technical knowledge and deriving bcrypt performance from Hashcat benchmarks. FPGAs are more effective for a professional hacker to attack bcrypt, and thus this advice spread around is misleading. - The time duration that was considered safe was based on increasing `M`, where they detailed several configurations (with `T` values of 1-3) normalizing `M` value to match the computation time that a bcrypt work factor completed in. This showed 32-64MiB depending on `T` as the point that GPU performance began to suffer notably against argon2 due to memory bandwidth bottlenecking the number of parallelized instances or memory access, something like that. - As they did not validate the math with a GPU implementation, performance may differ (as in less optimal still even without the bottleneck). Thus despite their reputation, it's difficult to trust the validity of the advice in practice.

what situations they should be applied

They're all better than nothing :)

- pbkdf2 has support in web browsers native crypto web API, thus stands to perform much better for client-side hashing. Password Managers with browser extensions often use this, including in their native apps (1Password and Bitwarden at 100k iterations), albeit they use it for encryption keys rather than web auth. - bcrypt is simple with only a focus on CPU load and low memory requirements. It's still valid and in use today, it's well studied and proven having remained relevant for over 25 years. - scrypt is better than bcrypt for the memory requirement advantage. It's only been released since 2009 IIRC, and seen less adoption due to reasons described earlier. I recall it being originally intended for disk encryption keys, where much larger memory requirements was acceptable on a personal system, and waiting multiple seconds to process for the security benefit was acceptable. The author has advised 100ms for web auth as an acceptable target in the past however. - argon2 is younger still which has had some discouragement to adopt early by more experienced professionals, while many others with less informed backgrounds are familiar with the pitch / benefits to choose it over others and parrot that. So far it seems to have faired well despite the two PHC judges in 2019 discouraging it for web auth over bcrypt which seems invalid. It is becoming more acceptable to adopt, but less straight-forward to configure vs scaling a single parameter. AFAIK, bcrypt is the only one that isn't a KDF (separate use beyond a PHF). PBKDF2 and bcrypt aren't the latest but you'll find leading modern products with plenty of users and financing still favor and trust them. scrypt and notably argon2 can afford more security strength due to memory requirements, but do note for a web auth server, the memory usage required just for auth is going to scale based on concurrent users logging in. - Usually this isn't a major concern, if you're small you probably aren't serving login requests constantly, while larger businesses can afford to scale their resources appropriately. - It may be an attack vector, thus you need to rate limit concurrent requests to avoid your hardware resources being exhausted, while still allowing for legitimate users to login. This isn't limited to scrypt and argon2 (which have more impact on available memory), putting heavy load on CPU will reduce the responsiveness of your server too. ### Alternatives to PHFs For web auth, there are alternatives (some involving no password storage or not requiring the user to use a password at all): - You can offload the hashing work to the client (server relief), but this does have more potential to go wrong elsewhere including reducing security (eg using deterministic salts allows for pre-computation attacks in advance to gaining DB access) or additional work/requests (sending a salt from server, without enabling the attacker to verify a user account exists by timing attack or lack of salt returned). - OAuth/OpenID-Connect, using external auth providers (Facebook, Google, Apple, Twitter, etc social logins) or services (Auth0, Okta, AWS Cognito) - WebAuthn with FIDO2/U2F clients - Secure Remote Password (SRP) - A zero trust protocol that doesn't require the user to send their password to the server to authenticate their identity. 1Password uses this. It's not as common to find support / adoption of, but often advised as a better choice.

Short version

You're probably going to be ok with bcrypt, OWASP still advises it, as does Mozilla.

scrypt and argon2 should be fine, argon2 is a bit more flexible, just keep the memory requirements impact on your own server in mind too.
Usual advice for parameters is to measure on your server and aim for not taking too long that the delay would impact your UX negatively or put your server under too heavy load, while balancing that against the need for raising the security strength. 100-250ms is often targets advised.

tarcieri commented 3 years ago

@polarathene wow, awesome! Thanks for doing that!

Would you mind opening a PR to add that to the toplevel README.md? There's probably some nits/small changes to be made, but those are easier to discuss as line notes in review.

polarathene commented 3 years ago

Seems a bit bulky for the README? Could be a separate document that the README gives a brief summary of and links to for more detail?

I did run each locally with parameters to get a rough comparison of what equivalent runtime for each was, and then did so again on a cheap VPS instance where I found argon2 to become 4x slower than scrypt which it was otherwise on par with on my local system performance. I think that'd be useful to document in a table for others as a reference.

I can put that information up as a PR in the meantime, but won't have much spare time to allocate until I resolve two other pressing tasks (might take a week or two).

tarcieri commented 3 years ago

Sure, take your time

newpavlov commented 3 years ago

Yeah, adding a separate file sounds like a good idea. Eventually we may move it into a "RustCrypto Book", assuming we will create it that is.

tarcieri commented 3 years ago

The OWASP Password Storage Cheat Sheet seems pretty good:

https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html

polarathene commented 3 years ago

Yeah, I linked that earlier. Looks like it received some updates since. Perhaps a contribution for me isn't required? Still rather large backlog elsewhere :sweat_smile:

tarcieri commented 3 years ago

Aah yeah, see it now.

I like what you wrote, but also, I think the OWASP guide is pretty good, and if it's one less thing for us to maintain perhaps we should just link to that.

polarathene commented 3 years ago

Yeah, works for me :)

If there's any need for more info on one of them from future issues, just refer to the comment I made :+1:

RustCrypto / password-hashes