P-H-C / phc-winner-argon2

The password hash Argon2, winner of PHC

A collection of the default parameters used in web frameworks #302

Open alsuren opened 4 years ago

alsuren commented 4 years ago

Describing a method for coming up with parameters is a bit hand-wavy.

I am not a cryptographer, so I would prefer to have a document that says what each major web framework uses as its default values, and what that means for the cost of cracking passwords stored by those frameworks (it might also be interesting to compare against the cracking costs of those frameworks' other password algorithms).

I am going to start collecting a list here (editing the description to avoid spamming people). Feel free to add your observations in the comments, and I will try to incorporate them as they come in.

RFC

The RFC suggests: "Frontend server authentication, that takes 0.5 seconds on a 2 GHz CPU using 2 cores - Argon2id with 4 lanes and 1 GiB of RAM."

There seems to be some confusion around what this means. Does it mean "1GB of RAM dedicated to the task of hashing passwords" (seems excessive) or does it mean "1GB of RAM total, with some large number of concurrent requests"?
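One way to test the second reading is simple arithmetic: memory_cost is consumed per hash in flight, so concurrent logins multiply it. A rough sketch (the login counts are illustrative, not from the RFC):

```python
# Back-of-the-envelope for the "total RAM" reading: memory_cost is consumed
# per hash *in flight*, so concurrent logins multiply it.
GIB = 1024 ** 3
MIB = 1024 ** 2

def peak_hashing_ram(memory_cost_bytes: int, concurrent_logins: int) -> int:
    """Peak RAM consumed by password hashing alone (ignores all other overhead)."""
    return memory_cost_bytes * concurrent_logins

# Taking "1 GiB per hash" literally, even 4 concurrent logins need 4 GiB:
print(peak_hashing_ram(1 * GIB, 4) // GIB)      # 4 (GiB)

# At 100 MiB per hash (Django's default, below), 10 concurrent logins
# still fit inside 1 GiB:
print(peak_hashing_ram(100 * MIB, 10) // MIB)   # 1000 (MiB)
```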

https://tools.ietf.org/html/draft-irtf-cfrg-argon2-11#section-4

Django

time_cost = 2
memory_cost = 102400
parallelism = 8

This is because it is the default in Django's underlying library, argon2-cffi:

2020: https://github.com/django/django/commit/1621f06051c94665f2edf492c10316875896e7eb

2018: https://github.com/hynek/argon2-cffi/pull/41/files#diff-fbf2a144a55a0f8a5cae32173f0b2efbR10-R14 claims to set the parameters to match the RFC, but uses 0.1 GB of RAM (I'm not sure whether it was trying to take the RFC's 1 GB of RAM literally and this is a typo).
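The unit conversion makes the "0.1 GB" point concrete: argon2-cffi (and hence Django) expresses memory_cost in KiB, so the default works out to a tenth of the RFC's 1 GiB figure.

```python
# argon2-cffi / Django express memory_cost in KiB.
memory_cost_kib = 102400

mib = memory_cost_kib / 1024    # 100.0 MiB
gib = mib / 1024                # ~0.098 GiB, i.e. roughly "0.1 GB"

print(mib, round(gib, 3))       # 100.0 0.098
```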

Kratos

Kratos uses 4 GB of RAM by default, which sounds like the RFC's "Backend server authentication, that takes 0.5 seconds on a 2 GHz CPU using 4 cores -- Argon2id with 8 lanes and 4 GiB of RAM" taken literally.

2019: https://github.com/ory/kratos/commit/bf3395ea34ecf85303034f3e941a049c8cbd6229#diff-2449a6ea083767b149400840c05f41bdR100-R108

Go /x/crypto/argon2

The package documentation says: "The draft RFC recommends[2] time=1, and memory=64*1024 is a sensible number. If using that amount of memory (64 MB) is not possible in some contexts then the time parameter can be increased to compensate."

I can't see these numbers in the RFC, but it seems reasonable to assume that people in the Go community will copy these values.

https://pkg.go.dev/golang.org/x/crypto/argon2?tab=doc#IDKey

polarathene commented 3 years ago

These algorithms are designed to slow down computation: under web server auth, a hash must not take too long to compute, but should add enough workload to discourage an attacker. Regardless of the algorithm chosen, you're typically aiming for less than 1 second; a target execution time of around 250 ms is often advised (I think there is/was an IETF draft in 2020 suggesting that). The scrypt author has advised 100 ms before.

These durations are tailored to the server performing them: the same workload may run faster on more capable hardware, and attackers will use ASICs/FPGAs or GPUs to accelerate the computation in hardware or parallelize it. The target time you choose is whatever is most appropriate for your server to handle while providing a good UX, while making the attacker's job somewhat harder.

A current NVIDIA RTX 3080 GPU manages bcrypt at a work factor of 5 with a hash rate of 75k per second. Adjusted for a more typical server work factor of 12 or higher, that drops to about 600 bcrypt hashes per second. GPUs aren't ideal for attacking bcrypt; FPGAs are more affordable for this (so when you see advice that argon2 is unsuitable below 1 second of runtime, keep this in mind).

scrypt defaults to 16 MiB (N=16384, R=8, P=1); its execution time scales with the memory used (typically you adjust only N).
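Those defaults map directly onto Python's stdlib hashlib.scrypt, and the memory footprint follows from 128 * R * N bytes (a sketch; the fixed salt and password here are for illustration only):

```python
import hashlib

# scrypt's memory footprint is roughly 128 * R * N bytes.
n, r, p = 16384, 8, 1
print(128 * r * n)    # 16777216 bytes = 16 MiB

# Derive a 32-byte key at the default parameters.
# NOTE: fixed salt for illustration only -- use os.urandom(16) in practice.
key = hashlib.scrypt(b"correct horse battery staple",
                     salt=b"0123456789abcdef",
                     n=n, r=r, p=p,
                     maxmem=64 * 1024 * 1024,   # headroom above the 16 MiB used
                     dklen=32)
print(len(key))       # 32
```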

argon2 isn't much different, other than that you're generally advised to use argon2id and increase memory M until you reach your target time. If you lack a large enough memory budget for that, you can increase T (time/iterations) instead. If the implementation supports increasing P (parallelism/lanes), you can raise it per core, which reduces the runtime and requires you to compensate with more M or T. Doubling or halving M roughly doubles or halves the runtime, and the same goes for T.
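argon2 isn't in the Python stdlib, but the doubling behavior is easy to observe with stdlib scrypt, which scales the same way with its memory parameter (a rough sketch; absolute times depend entirely on your hardware):

```python
import hashlib
import time

def time_scrypt(n: int) -> float:
    """Wall-clock seconds for one scrypt hash at cost n (R=8, P=1)."""
    start = time.perf_counter()
    hashlib.scrypt(b"password", salt=b"0123456789abcdef",
                   n=n, r=8, p=1, maxmem=128 * 1024 * 1024, dklen=32)
    return time.perf_counter() - start

t_16mib = time_scrypt(2**14)    # 16 MiB of memory
t_32mib = time_scrypt(2**15)    # 32 MiB -- roughly double the work

# Doubling the memory parameter roughly doubles the runtime:
print(t_32mib > t_16mib)        # True
```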


For comparison, here are some runtime figures from my i5-6500 Skylake (3.2GHz 4/4 cores/threads) desktop machine, and a cheap $5 Vultr VPS:

Desktop vs budget VPS times:

- bcrypt (from a work factor of 12)
- scrypt (N as a log2 value, e.g. log2(32768) = 15, i.e. 2^15; memory is 128 * R * N bytes, so 32 MiB = 128 * 8 * 2^15 bytes; 1 MiB is 1024 * 1024 = 1048576 bytes (2^20), so divide the total bytes by that to get the 32 MiB figure)
- argon2 (T=3, P=1 with adjusted M, specified in KiB, e.g. 32 MiB is 32,768 KiB)

You can see that each algorithm scales fairly linearly, but the difference when moving to the VPS varies between them, noticeably so for argon2. This is why it's important to test on your own server and adjust the workload to what it can handle.

On the attack front, scrypt and argon2 are more favorable because of their memory requirements (bcrypt only ever requires 4 KiB). GPUs/FPGAs/ASICs cannot parallelize the computation as well when less memory is available per instance, and provisioning sufficient memory, where possible at all, makes the attack more expensive. On top of that, memory has limits on how many cores can access it at once and on the bandwidth (e.g. GB/sec) it can sustain at any given moment.

With argon2, 32-64 MiB appears to be enough to start bottlenecking a GPU's memory bandwidth, and is not particularly demanding for a server. From the results above it may take 400-900 ms, letting you perform 1-2 logins per core per second. At T=3, M=64 MiB, P=1, an NVIDIA RTX 3080 would roughly manage 1,000 argon2 hashes per second, i.e. 1 ms per hash on that GPU.
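To put 1,000 hashes/second into perspective (the GPU rate is the estimate above; the keyspace sizes below are my own illustrative additions):

```python
# ~1,000 argon2 hashes/second on a single RTX 3080 (estimate from above).
rate = 1_000

# Trying the top 1 million leaked passwords against ONE salted hash:
minutes = 1_000_000 / rate / 60
print(round(minutes, 1))            # 16.7 (minutes)

# Exhausting all 8-character random lowercase passwords (26**8 candidates):
years = 26**8 / rate / 86400 / 365
print(round(years, 1))              # 6.6 (years)
```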

That won't protect someone with a bad password for long, but it is significantly slower than the 7 billion/sec possible with SHA-256. Salts (which these algorithms all generate for you) force the attacker to attack each password separately, so a non-targeted attack (not aimed at a specific user) is not very practical. An attacker can invest in more hardware to scale the attack, but the speed gains get very expensive.


TL;DR

Tune the parameters on your own server: mostly you want a workload that is sufficiently slow for the attacker without stressing your server too much or hurting UX response times (250 ms seems to be a common target execution time).

argon2id with T=3, P=1, M=65536 (64 MiB) is probably a good starting point. If it's too slow, you could reduce M. I've seen T default to 3 because of some attacks (specific to argon2i, I think) that could exploit low T.
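With argon2-cffi (the library Django wraps), those starting values would look like this (a sketch, assuming the package is installed; memory_cost is in KiB and argon2id is the library's default type):

```python
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

# T=3, M=64 MiB (memory_cost is in KiB), P=1.
ph = PasswordHasher(time_cost=3, memory_cost=65536, parallelism=1)

# hash() generates a random salt and returns a self-describing encoded string.
encoded = ph.hash("correct horse battery staple")
print(encoded.startswith("$argon2id$"))   # True

# verify() raises on mismatch rather than returning False.
try:
    ph.verify(encoded, "wrong password")
except VerifyMismatchError:
    print("rejected")                     # rejected
```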

These settings won't compensate for a user having a bad password like 123456; slowing the computation down enough to defend against that would not be worthwhile and would introduce other problems.

SparkDustJoe commented 3 years ago

I would also add a hybrid approach where some of the calculation is done on the client machine and the rest on the server, in a two-step process: mobile applications do enough work to be reasonably secure, and the server does the heavy lifting to be more secure -- allowing throttling on the server without killing the UX on less capable devices. To prevent replays, the server and client can agree on an additional salt per transaction, much like TOTP/HOTP/challenge-response.
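A minimal sketch of that two-step idea using only the Python stdlib (PBKDF2 client-side as a stand-in for a mobile-friendly KDF, scrypt server-side; the per-transaction replay salt is elided):

```python
import hashlib
import hmac
import os

def client_prehash(password: str, user_salt: bytes) -> bytes:
    """Cheaper stretch done on the client (stand-in for a mobile-friendly KDF)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), user_salt, 100_000)

def server_hash(prehash: bytes, server_salt: bytes) -> bytes:
    """Memory-hard heavy lift done on the server."""
    return hashlib.scrypt(prehash, salt=server_salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

# Registration: the server stores (user_salt, server_salt, stored).
user_salt, server_salt = os.urandom(16), os.urandom(16)
stored = server_hash(client_prehash("hunter2", user_salt), server_salt)

# Login: the client sends its prehash; the server recomputes and
# compares in constant time.
candidate = server_hash(client_prehash("hunter2", user_salt), server_salt)
print(hmac.compare_digest(stored, candidate))   # True
```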