Open robotdan opened 5 years ago
Due to salting + configurable slowness, the method of cracking some entries in a reasonably configured database is taking the n most common passwords and attempting to brute forcing each entry. There is a balance to be found between the server cost to validate a hash and relative cost for a given dictionary size n selected by an attacker.
Reactive hashing does two things:
This allows a more targeted attempt at brute forcing a given db entry.
If we assume n is chosen as a function of cost, where cost scales as:
cost ~ n*m/h
where
For a fixed cost we can see n increases with any increase in h, and more importantly we have a better criterion for populating n for each partition in the reactive hashing scheme (which can be probed with chosen plaintext attack -i.e. they sign up multiple accounts with different password lengths to see how they appear in the db).
For example
So lets say we divide the database into two, passwords with length < 16 and those with length >= 16.
Brute forcing as described above doesn't really change for the first partition since the n most common passwords in general will likely all be less than 16 characters. For the second partition, we now actually have a shot at cracking a few entries since we can populate n for this partition only with the most popular passwords of length 16+ (and n for this partition can be extended due to the increased hash-rate from reactive hashing). If reactive hashing allows for more than 2 partitions, one can see how this only increases the expected number of entries cracked for a given fixed cost
One would have to dive into the data dumps of late to quantify the impact (if any) of this feature on security. It's probably marginal, but not something unworthy of consideration.
What's the point of this feature? To reduce cpu overhead for password validation which in turn saves money and perhaps mitigate the potential for it to be used to DDOS your servers.
Does Reactive Hashing effectively facilitate this? The benefits of reactive hashing only come into play for the subset of users that have passwords that meet some criterion that above the minimum requirements. So really, its utility is a question of how many users actually meet said criterion. Users in general simply don't choose good passwords, opting to permute and stuff credentials to meet the minimum requirements.
The utility sought by this feature is only realized by just using the faster hashing scheme and changing the password requirements to match; it requires additional configuration (peppering, aka "in-memory salt modification" in the docs) in order to not introduce new weakness (however slight)
To be frank: this is a terrible idea.
Password quality is not only determined by it's length. You are missing dictonary attacks and attacks wich use context info, such as account name, which also work effectively on longer passwords. For instance '1234567890qwerty' is a quite long but very low quality password (and has high entropy).
This would probably have very minor impact on performance as most users have passwords way smaller than the minimum entropy threshold for not using key stretching. See this analysis from the eHaromney password dump: only 384 of 1.2 million passwords had a length of 15 (or longer)
Even if you are combining known hashes, you are essentially creating your own key derivation function which is never a good idea (if you are not a security/crypto researcher). My argument would be: if it would be a good idea to compromise brute force resistance on longer passwords, I guess there would be a password hash/KDF that would have that feature.
tl;dr Don't roll your own crypto
You make your clients more complex: Old user with long password, but used to have bcrypt? You want to change the password min length for SHA-256 from 16 to e.g. 20, how do you sync all your client code? How would you migrate such rule changes?
You leak information about the password: maybe an attacker will only choose the SHA-256 hashed PW because she has more luck to find that in a pw dictonary?
Further you are mixing CPU and memory demand of password hashing algorithms. CPU time is cheap and usually easily scalable, memory however is not. This is why bcrypt is still relevant (uses mainly CPU and only fixed amount of memory), while e.g. scrypt is not (scales CPU with memory and uses too much RAM in secure settings)
Do you have any hard facts, statistics that proof that e.g. bcrypt is a significant contributer to reduced performance? Depending on your implementation I would guess that login is only done once per session (maximum couple of times per day per user). Are there other means to decrease that impact, e.g. longer sessions? Are there other means to reduce the risk of being DDOSed?
https://blog.benpri.me/blog/2019/01/13/why-you-shouldnt-be-using-bcrypt-and-scrypt/
This is incredibly bad advice (ditching bcrypt/scrypt in favor of minimal SHA iterations). Using passphrases is good but the belief that length is relative to security strength is naive.
"correcthorsebatterystaple" will take about 20 minutes to exhaust 43 bits of entropy (average success rate is half of the keyspace, ie 1 bit less from the 44-bit passphrase) on a single current generation GPU. With a slow hash that can easily become >100 years, and that is less time to compute the hash than it takes for me to request a web page from the server and get a response.
Those are optimal attack times based on NVIDIA 3080 handling 7 billion SHA-256 per second and Kerckhoffs's principle.
If you want to offload server load, have the clients perform the hashing or a portion of it (always hash even if just SHA, the received client hash for validation without making a compromised leak of hashes usable passwords in themselves).
1Password has the benefit of providing users 128-bit keys that are part of their security policy, the app uses this to derive keys for decryption, it is not stored on the server thus any compromise even with simple passwords still have a 128-bit entropy key to crack (which requires more energy than it takes to boil all of the worlds oceans). That does pose it's own problems if the devices are lost as the service cannot recover/fix that, but additional fallbacks may be available (family/team plan) where the admin can assist with recovery.
@polarathene - your math appears to be off. Entropy is a calculation based on length and pool size. In general, passwords are functions of ASCII characters as the pool size and the number of characters as the length. Then you apply a log base-2 in order to calculate entropy.
Here's a good article on password entropy for you to review:
https://generatepasswords.org/how-to-calculate-entropy/
According to the math, if you use a password like correcthorsebatterystaple
which has 25 characters, the corresponding entropy using a pool size of 95 is:
log2(95^25) = 165 bits of entropy
Even using the largest GPUs in existence currently, this will require 2^30 years to brute force to a 50% probability or so (I'm rounding a lot but it doesn't matter given the numbers being so large). Even if computing power went up by orders of magnitude, it would still require millions of years to brute force a password of that length. Therefore, this password is secure using a SHA-512 hash with a single pass.
As an experiment, here is a SHA-512 hash of a password that has somewhere between 1 character and 48 characters. If anyone can reverse it and tell me the original value, I'll make it worth your while. :)
935ff37fb53d9bc30b5238e3523e9a79b68dec6209a3f7aa40f5af1fcbd6051ba319d7026b7a8b62acd2dfb8efc0ad1264209bcaa53e7dd9e9e94a7046b617d9
your math appears to be off. Entropy is a calculation based on length and pool size.
My math is not off. This is the same mistake of the linked blog article I was quoting. That article references an XKCD article which very clearly states that "correcthorsebatterystaple" is 44 bits of entropy, while the author provided math of 117.5 bits (log2(26^25)
).
So how did XKCD come up with 44 bits of entropy? Is the comic author wrong instead? Nope, and I mentioned why with this line:
Those are optimal attack times based on NVIDIA 3080 handling 7 billion SHA-256 per second and Kerckhoffs's principle.
Kerckhoffs's principle assesses entropy based on knowledge of how the password/passphrase was generated. So for "correcthorsebatterystaple" we have 4 words, and if they're randomly sampled from a set of 2048 words as our "alphabet" (as if it were a different language where a word is equivalent to a letter) we would have 44 bits (log2(2048^4)
).
In that sense, a generator like getapassphrase.com can output two passphrases both of 44 bits of entropy like these:
Despite the difference in length and even words in this case, they are both 44 bits of entropy based on the generator rules. If you know the user you're attacking has generated a 44-bit passphrase from this site, you are able to perform an optimal attack at 7 billion guesses per second of SHA-256 and within 20-40 minutes (latter being full exhaustion of the key space), you'll have it cracked.
Therefore, this password is secure using a SHA-512 hash with a single pass.
It is only secure if the attacker couldn't possibly know more about your password and brute forced it in the manner you've calculated your entropy on 95^25
. In general there are far better and more likely successful strategies that are used first to get any low hanging fruit (when a password hash dump has been leaked), but this differs from a targeted attack when the attacker is specifically interested in your account credentials and willing to focus all their efforts and time upon recovering that password.
If you are valuable enough to warrant that sort of attention, and your password is strong enough to withstand their computation power even when they know exactly how you generated your password... well they don't have to be that well funded to take more affordable low-tech approaches... (which while unpleasant, is often one of the only viable options). On the bright side, being immune to the majority of attackers around the world is a win.
Here's a 64-bit entropy passphrase btw, this would take about 4 years with 10 nvidia 3080 GPUs (( (((2^64) / (7e10)) / (60*60*24*365)) / 2 ) = ~ 4.178 years
): "insane deer and comical boar remember rare raven in Jordan". Obviously that would take much longer without prior knowledge of how the password was generated, but the true entropy should be evaluated by if the attacker did know.
Even if computing power went up by orders of magnitude, it would still require millions of years to brute force a password of that length. As an experiment, here is a SHA-512 hash of a password that has somewhere between 1 character and 48 characters. If anyone can reverse it and tell me the original value, I'll make it worth your while. :)
Relevant XKCD ;) (joking obviously)
At best one might try leverage some dictionary attacks, but assuming you've generated a password randomly with equal possibility of each character being from the pool of 95 values, that would be enough to not bother.
( (((2^64) / (2.3e12)) / (60*60*24)) / 2 )
). This entropy can be had for 5 words out of a 7776 sized word list (eg: Diceware).If it's actually generated in a more memorable way such as words that reduces that entropy more, if we know those words are all lowercase or some other consistent pattern, if any punctuation/separator is used (space, hyphen, pascal case, etc), again we narrow that perceived entropy down closer to what Kerckhoffs's principle would evaluate the strength by.
One last mention for any visitors reading this.
A high entropy password as secure as it may seem cannot be re-used across services safely. There are various ways a service can be compromised and some even in 2021 that are far less reputable still store passwords in plaintext, all it takes is one of those being compromised to gain access if it can be re-used on more valuable targets.
Likewise, if you have some clever little pattern for extending the length with a shared prefix/suffix (eg a personal pepper if constant, or salt if it differs slightly), something some users feel more comfortable with when trusting a password manager to store the password without their secret portion, if one or more of those passwords are leaked from services and the plaintext is known, it is very likely that pattern becomes exposed weakening the perceived entropy and enabling an attacker to pull off a targeted attack more successfully. You'd be better off with something like LessPass (not without it's own drawbacks), or trusting the entropy of good generator from a password manager.
Just a reminder that while we appreciate lively discussion, we also want it to be civil. We have a community code of conduct here: https://fusionauth.io/community/forum/topic/1000/code-of-conduct
Please think twice before mentioning violence, even when you are joking (I understand the context of the comment and the XKCD comment, but it may be interpreted differently by different readers).
@polarathene - please edit your comment when you have a second so that is not mentioning any type of violence.
Also, I'll be responding to the last comment at some point. It still appears to have some mixed messages and incorrect math. It will take me a bit to get to though so stay tuned.
but it may be interpreted differently by different readers
Apologies, I thought given the linked content and emoji usage the context of not being serious about violence would be difficult to misinterpret (while that approach is very real, I don't endorse it obviously).
please edit your comment
I hope that's sufficient?
It still appears to have some mixed messages and incorrect math
I'd be glad to clarify any of that for you. I'm always open to being wrong and corrected myself, but I'm fairly confident in this math. If I have slipped up, do let me know! :)
64 bits of entropy:
( (((2^64) / (7e10)) / (60*60*24*365)) * 0.5 ) = ~ 4.178 years
== ( (<entropy> / <hashrate>) / <seconds in a year>) * <halve time for average success rate> ) = ~ 4.178 years
7e10
), we get the total time in seconds to exhaust every permutation in the given keyspace for that entropy range.~75k/sec
on that same RTX 3080 GPU. These days it's fairly standard for the work factor is tuned to 10-12, a hash rate of ~500-2,000/sec
)... augmenting the 4 years attacking SHA-256 hashed passwords to 15-60 million years.
FWIW, SHA-512 on the RTX 3080 is about a third of the SHA-256 hashrate slowing it down to approx 2.3 billion/sec (ie 2.3e12
from my earlier comment is 1000x 2.3e9
aka 2.3 billion), so bump that 4 years estimate to 12 years for average success rate.
I'm not sure where I've gone wrong with the math above, but perhaps the perceived approach of an attack? (eg a naive attack brute-forcing variable string length with an alphabet/dictionary of 95 values to permute through). No doubt that would take longer and be infeasible if the attacker has little information to go on, but that's not a good assumption to make in security AFAIK, especially if you value what those credentials provide access to.
If you were questioning the math from some of my facts, I felt I was already being too verbose in my response. Let me know if you'd like any more info on any of that.
Well, it took me quite a while to finally get back to this, but I think the problem here is a misunderstanding of the core problem statement.
@polarathene is specifically talking about dictionary attacks only with a limited dictionary of 2048. This would reduce the entropy to a degree that makes the problem solvable (in a time-frame that make sense). His math is correct based solely on that premise and a weak hash such as SHA-256.
However, this is not what this discussion is about. Nor have I ever suggested that we limit this discuss to such constraints. In fact, I presented a password hash using 48 characters from a set of 100 options (Ascii), which makes the entropy much larger (336 bits if my math is correct). Just because the XKCD example uses 4 dictionary words does not mean the problem statement is limited as such. Anyone could change just one character in correcthorsebatterystaple
such as corr$cthorsebatterystaple
and the entropy instantly balloons. You can also repeat characters, reverse words, and many other tactics to increase entropy. And you'll never know what the rules are, so as an attacker, how could you possibly figure it out?
If we return to the original problem statement and limit our discussion to the domain presented, then the discussion is less about entropy solely and more about other aspects such as:
These questions remain in the correct domain but analyze the original problem statement to determine if it is viable.
I seriously want to emphasize that as a security professional you really should take Kerckhoff's Principle into consideration when it comes to entropy.
If you ignore Kerckhoff's Principle, then you consider this perfectly secure:
zzzzzzzzzzzzzzzzzzzzzzz🎯zzzzzzzzzzzzzzzzzzzzzzzz
64 bits of entropy is fairly solid:
bcrypt
can be used with low iterations if resource efficiency is a concern (CPU, energy costs, etc), but you'd be better served with memory-hard bottleneck with argon2id
.62^20
) sequence. That is more than enough entropy (needs more energy than is required to boil all water on earth). Just not convenient for manual input, but perfectly fine stored in a password manager.Apologies, I didn't read over the entire discussion and ended up repeating a fair amount that's been said already. I've tried to manage the verbosity of this response to accommodate that.
I think this response bridges any gaps in miscommunication.
You are misunderstanding what I've said, here's a response summary:
log2(100^48) = 318.9
), if each input of that length selected randomly (uniform distribution), you can claim almost 319 bits of entropy. Your justification for it is the attacker has not optimized for bias.A quick glance over the discussion history seems it was about:
Kerckhoff's Principle was mentioned multiple times and ignored.
correcthorsestaplebattery
) with pattern of 4 words (that the XKCD source clearly expresses as 44 bits of entropy), while the blog author thinks the entropy is 117 bit (26^25
) via naive attack parameters, not keyspace (input permutations) defined by how the password was generated. A motivated attacker is wiser.When you dismiss this measure of entropy via the argument "how could they possibly know?", you justify:
z
), they'd be attempting 26^25
or worse right?z
, fill the length and replace a random one with an emoji.It's better to assess entropy by total permutations if you're able to generate a password with sufficient randomness (minimize bias).
I really hope that gets the message across better?
Pragmatic amounts of entropy was shared, citing (at a SHA-256 hashrate to iterate through a given keyspace entropy):
Thanks for the details response, I’m sure folks that read this thread in the future will get insight from it. I also think we are essentially saying the same thing, just in different ways.
I do want to clarify a couple of points though. A lot of your responses state I’m ignoring Kerckhoff. I want to be clear that I am not ignoring that. I fact, I’m assuming everyone understood that the question “How do we determine if a password should use a weaker or stronger algorithm?” must take into account Kerckhoff. And I want to rephrase from saying “weak or stronger” to “faster or slower”. I think those are more appropriate terms.
Second, your math appears to change. You mentioned 2048^4 as one set size (passphrase based on a dictionary) and then you mention 100 characters of varying lengths for other set sizes. Then it seems like you try to reduce both sets in some cases and then increase them in others. Like if you start introducing capital letters, emoji, or random characters into the passphrase, the set size increases. But if you take 48 Ascii characters the set size decreases based on weak combinations. It’s not completely clear what your point is, but I think we can agree that the size of the set and the length are how you compute complexity and also how you brute force attack the hash (up until a collision which is theoretically infeasible for 512 bits).
And finally, the problem statement hasn’t changed, but perhaps your understanding has. No one has ever stated that XKCD was reality or that we would limit passwords in any way. I mean users can type in whatever they want right?
Let me present some code for how this would work in practice and maybe we can move over into the concrete:
String hash;
if (passwordSucks(password)) {
hash = hashWithBcyprt(password, 14); // Load factor 14
} else {
hash = hashWithSHA512(password);
}
This is super simple code and the crux is the method passwordSucks
. The algorithm should ensure that passwords cannot be brute forced in reasonable time. Here’s one implementation:
boolean passwordSucks(String password) {
if (password.length() < 24) {
return true;
}
if (pwnd(password)) {
return true;
}
if (dictionaryAttackable(password)) {
return true;
}
if (repeatsTooMuch(password)) {
return true;
}
// More checks here?
return false;
}
The other methods called here would exclude what would be “known generators” whereby an attacker could generate the entire sets in reasonable time (like the 2048 dictionary used for passphrase or repeat characters).
Okay, now let’s look at the math.
This function has a base set size of 100^24 (1e48). Each additional character adds an additional permutation set that multiplies the base set by 100. A 28 character password has a set size of 100^28 (1e56).
Each if-check after reduces the set size by some amount. If we assume breached sets contain 1e9 entries, we have reduced the total set size to 1e39 for 24 characters and 1e47 for 28 characters.
If we assume the additional checks reduce the set size by some amount, the final set is the attackable surface. For argument sake, let’s assume we design dictionaryAttackable
and repeatsTooMuch
to reduce our set by an additional 1e9. Our final sets would then be 1e30 and 1e38 respectively.
In order to brute force 1e30 combinations using SHA512, it would take 1e20 seconds. That’s 3 trillion years for the base length of 24 characters.
Let’s go nuts and bump the set reduction to a whopping 1e15. That’s a huge number, but let’s go with it. That would reduce our set sizes to 1e24 and 1e32. Using the same computational power, it would take 1e14 seconds to brute for the 24 character example. That’s 3 million years for 24 characters.
Okay, let’s drink a bottle of whiskey, punch through the Balmer Curve, and really get aggressive with our reduction. Let’s take it to 1e20. That’s massive and probably a mess of spaghetti code from our drunken coding session, but for argument sake, let’s roll with it. Once we sober up, we would find our set sizes have been reduced to 1e19 and 1e27. Using the same computational power, it would take 1e9 seconds for 24 characters. That’s STILL 31 years for 24 characters.
Basically, we want passwords generated by complex generators (secure random 24+ bytes works) to be hashed using SHA-512 and everything else to be hashed with BCrypt or Argon. This is the original problem statement.
Now, I’m willing to discuss a way that this implementation can be attacked, but I propose we talk about implementation details in code with hard numbers
Basically, we want passwords generated by complex generators (secure random 24+ bytes works) to be hashed using SHA-512 and everything else to be hashed with BCrypt or Argon. This is the original problem statement.
I think you're making too many compromises for ensuring a reliable baseline of entropy to get away with a single SHA-512 hash. There are better ways to minimize resource usage when that is a priority.
In theory the SHA-512 approach sounds neat, but I just don't see it being pragmatic for real users (EDIT: When it comes to creating their own passwords + heuristics, not externally providing them from generators / managers).
Your proposed approach with heuristics is guesswork on how much lower the entropy actually could be, as opposed to the simpler rules of a generator where you have a far more reliable source of entropy (there's a good reason they're a thing) (EDIT: I seem to have misunderstood, and you're focus is not on heuristics for users creating new passwords via FusionAuth, but detecting eligibility for single SHA-512 hash of externally generated passwords that should have some acceptable entropy baseline_).
100^24
) and narrow it down to 63-bit (1e19
) as your extreme security margin.26^24
), followed by language and pattern biases you'd lose a good amount of that, while only gaining a little back from the larger charset of 100 ASCII since the actual frequency + unbiased usage of that range would be low.You'd still be salting the input into SHA-512 and storing the salt right?
How would the heuristics apply to the following?:
this is a really good password? Nope!
!epoN ?drowssap doog yllaer a si siht
¿Es esta una buena contraseña? Negativo
1e19
/ 63-bit entropy example):
insane deer and comical boar remember rare raven in jordan
(64-bit)😖👻😴 dizzy pig keeps cute paper
(65-bit 128^3 * 2^44
, aka 44-bit + 3 random emoji prefix)Q%un!oQ5aKe
(67.9-bit 72^11
)7844492364386295433
(63-bit 10^19
)If you're unable to ensure that, false-negative is not too bad for decent passwords, but false-positive on the low entropy is risky. BTW, 36 years becomes 131 days with 100x the hashrate.
For an attack with a single RTX 4080, here's the difficulty:
(([entropy] / [hashrate]) / [seconds of compute]) / 2 = [time required]
(SHA-512: 4.46 GH/s vs bcrypt wf=10
: 4 KH/s)Q%un!oQ5aK
(72^10
, 61.7-bit): Approx 13 years (((2^61.7 / 4.46e9) / (60*60*24*365)) / 2
)<insert example of your password>
(100^24
reduced to 1e19
aka 63.1-bit): Approx 36 yearsinsane deer and comical boar remember rare raven in jordan
: Approx 66 yearsQ%un!oQ5aKe
(72^11
, 67.9-bit): Almost 1,000 yearsdetailed snail summons slim lab coat
(48-bit):dizzy pig keeps cute paper
(44-bit):((2^44 / 4e3) / (60*60*24*365)) / 2
)I'm not sure if I can communicate this any better.
Most of my responses are correcting misunderstandings / misinformation from your end when making statements about what I've said. That is then followed by responses where I essentially rephrase my prior responses, presenting math + sources to back that up in different ways.
If you still believe this is worth pursuing as you've detailed, go for it. I've tried my best to express concerns and feel I have nothing more to contribute to the discussion.
I'll try respond briefly to any follow-up questions, but this should be my last detailed response :sweat_smile: (they soak up a fair amount of my time)
Below I have clarified / corrected statements from your last response, while providing additional math / feedback and food for thought.
Entropy based password hashing
Problem
Password hashing at scale is very costly when using Bcrypt, PBKDF2, etc. The reason for these algorithms is to increase the time it takes to hash a password in order to make it infeasible to brute force.
If the end goal is to keep entropy high and ensure brute force attacks are infeasible, there may be a better than to just continue to increase the algorithm complexity to crush CPUs.
Solution
Build an entropy based solution to select an algorithm and load factor to reach a desired amount of entropy to keep the algorithm complexity to a minimum.
For example, a 16-20 character password hashed with SHA-256 or SHA-512 is quite difficult to brute force even with a large bit coin rig.
https://fusionauth.io/blog/2019/02/21/save-a-cpu-ditch-bcrypt-use-sha2-instead
Additional Reading
https://blog.benpri.me/blog/2019/03/02/reactive-hashing/ https://blog.benpri.me/blog/2019/01/13/why-you-shouldnt-be-using-bcrypt-and-scrypt/
How to vote
Please give us a thumbs up or thumbs down as a reaction to help us prioritize this feature. Feel free to comment if you have a particular need or comment on how this feature should work.