Closed: garrettr closed this issue 10 years ago
I should've filed this earlier. Discussed with @diracdeltas last night and we agreed that 12 is very strong. The fastest bcrypt benchmarks we could find are from Jeremi Gosney's HPC cluster, which managed 71,000 cracks/sec at 5 rounds. 12 rounds is exponentially slower/harder to crack - we don't have figures for 12 rounds, but the crack rate would certainly be far lower.
Even if we were using 5 rounds (a common default AFAICT), it would take
>>> (6969**8 / 2.0) / 71000 / 60 / 60 / 24 / 356
1.2738304651161843e+18
years to crack one of these hashes on average.
Our theoretical adversary is very powerful, but even if they had a trillion of these password cracking clusters it would still take over a million years to crack one of these hashes, on average.
>>> (6969**8 / 2.0) / (71000*1000000000000) / 60 / 60 / 24 / 356
1273830.465116184
Given this, plus the tradeoffs that need to be carefully considered (reduced site responsiveness, increased potential for DoS, and reduced developer productivity, in roughly that order), I am pushing this to 0.3.
Just noticed I typoed 356 when I meant 365 (days/year) in the calculations above. Here are the re-done calculations (outcomes did not change significantly):
Years to crack with one cluster on average:
>>> (6969**8 / 2.0) / 71000 / 60 / 60 / 24 / 365.25
1.2415705560064658e+18
Years to crack with a trillion clusters on average:
>>> (6969**8 / 2.0) / (71000*1000000000000) / 60 / 60 / 24 / 365.25
1241570.5560064656
I accounted for leap years this time, for good measure :smile:
Citation: http://passwords12.at.ifi.uio.no/Jeremi_Gosney_Password_Cracking_HPC_Passwords12.pdf. We're pretty sure that bcrypt(05) indicates bcrypt with 5 rounds, though maybe someone more familiar with crypto notation should verify.
The other factor we should account for is Moore's law, since it's unlikely that the state-of-the-art in password cracking now will be the state-of-the-art in 100 years. Using this handy calculator at http://anrieff.net/ucbench/moore.html with 82.7 bit passwords and 71k/s cracking time at t=now gives an estimated cracking time t_crack = 60 years, 10 months (for bcrypt(5), since that's for the 71k/s number).
Let's say that increasing the number of bcrypt rounds by N is approximately equivalent to adding N bits of entropy to our password. For N = 12, t_crack becomes 79 years. That's not a significant enough improvement over 60 years to be worth the cost of bcrypt(17) on our server vs bcrypt(5).
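The calculator's ~60-year estimate can be roughly reproduced with a simple continuous-growth model (a sketch; I'm assuming the calculator uses something like this, with speed doubling every 18 months): total hashes computed after n doubling periods is speed * T * (2^n - 1) / ln(2), where T is the period length, and we solve for when that covers the keyspace.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25

def moores_law_crack_years(bits, speed, doubling_years=1.5):
    # Total hashes computed after n doubling periods of continuously
    # growing speed: speed * T * (2**n - 1) / ln(2), with T the period
    # length in seconds. Solve for the n where that equals the keyspace.
    keyspace = 2.0 ** bits
    period_seconds = doubling_years * SECONDS_PER_YEAR
    periods = math.log2(keyspace * math.log(2) / (speed * period_seconds) + 1)
    return periods * doubling_years

print(round(moores_law_crack_years(82.7, 71_000), 1))  # roughly 60-61 years
```

With the 82.7-bit / 71 KH/s inputs above this lands right around the calculator's "60 years, 10 months".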
It doesn't seem to me like we can do much better than bcrypt(12), so I'm happy with closing this issue if others agree.
Jeremi Gosney points out that we've been misusing "rounds" here. A work factor of N is 2^N rounds, not N rounds. https://twitter.com/unhush/status/407601833331265536
Good clarification on the terminology! py-bcrypt calls them "log rounds". Maybe we should use that instead? I'm not sure it's less confusing...
A good "max brute force" rule of thumb (inspired by Schneier's definition in Practical Cryptography) would be the average lifetime of a source, post-submission. So 70-80 years seems about right to me. It might be wise to make an adjustment similar to Moore's law based on advancing medical technology... although I don't think that correlation is so straightforward :smile:
Yes to the log rounds for bcrypt.
Just a thought: we would need to take into account the remaining life span of a source, and probably that of their children as well, which still works out to about 70-80 years in most cases.
Hi guys,
Yan reached out to me on Twitter to get clarification on my bcrypt numbers, so I thought I'd chime in on the thread.
The "05" you are referring to is not the number of rounds, at least not directly. It is the logarithmic cost setting. So "05" here actually means 2^5 rounds, "12" would be 2^12 rounds, "17" would be 2^17 rounds, so on and so forth. "05" is traditionally used for benchmarking, for historical reasons.
FWIW, you rarely see anyone use a cost setting greater than 8, though occasionally you do see some crazy bastard using 10. But the point of the cost setting is to tune the algorithm to your environment, so if you can support 12, then by all means, do it. But 17 will likely be way too expensive.
There are also a few things you have to understand about the benchmark speed provided in that presentation.
First, this figure is purely a benchmark, and implies a brute force attack of a single bcrypt hash with a cost of 5. No one in their right minds would attempt to brute force bcrypt, regardless of the cost setting, because even at the lowest cost brute force attacks are highly impractical and downright impossible for passwords of sufficient length. You're pretty much limited to dictionary attacks with bcrypt, and maybe rule-based attacks if you are using a small dictionary with a very small handful of rules. And if you have multiple salts, then of course your effective speed will be just a fraction of your total speed, since each plaintext candidate must be re-hashed with each unique salt.
Second, let's not talk about 71 KH/s as if that's fast, because that's downright slow, especially when we're talking about the total speed of a large cluster, and super especially when we're talking about a salted algorithm. So even at a cost setting of 5, bcrypt still provides great protection.
Third, this benchmark was done on a GPU cluster, and bcrypt is not GPU-friendly due to its pseudorandom memory access patterns. If you want real bcrypt benchmarks, it's best to do them on a high-end CPU. Also note that Moore's law hasn't been keeping up with bcrypt very well, largely because of the memory access patterns, so it's hard to predict what bcrypt cracking speeds will be like in the future.
Ok, a couple more notes on this thread...
Please don't use entropy as a way to measure password strength. It is very misleading and will give you a false sense of security. "SecretPassword123456" is 20 characters long and has 89.2 bits of entropy, which on paper would take eleventy-billion millennia to crack, but in reality it would fall to a dictionary attack. Never underestimate the stupidity of a user when asked to choose a password.
Also, I can't read python, but it looks like you might be using bcrypt to derive a passphrase for a PGP key. Is this correct? Maybe a better question is, can you explain how you are using bcrypt in your application? That way I can provide further guidance for how to better use it.
Third, this benchmark was done on a GPU cluster, and bcrypt is not GPU-friendly due to its pseudorandom memory access patterns. If you want real bcrypt benchmarks, it's best to do them on a high-end CPU.
Do you know of any good benchmarks with this sort of setup? The closest thing I could find was the oclHashcat-plus benchmarks, but they don't specify the logarithmic cost setting so it's not useful for comparison.
Please don't use entropy as a way to measure password strength. It is very misleading and will give you a false sense of security.
Good point. We tend to conflate the two because our application doesn't allow users to pick their own passphrases. Instead, the application generates "codenames", which are really Diceware passphrases, using a CSPRNG (/dev/urandom). Our wordlist has 6969 words and we default to an 8-word codename, so we measure the strength of this passphrase in terms of its entropy: log2(6969^8 / 2). This is the first part of the calculations in the (amended) top comment.
This design choice might seem strange. We want to avoid letting users choose their own passwords, because user-chosen passwords are typically low-entropy. An 8-word Diceware passphrase might seem cumbersome, but it is at least potentially memorizable (compared to an equivalent random alphanumeric passphrase), and it is much easier to type and to check for typos.
Example 8-word Diceware passphrase: pierce clam messy sine torah sinew pont nehru
Entropy: log2(6969^8 / 2) = 101.13 bits
Example 17-character random alphanumeric (upper and lowercase) passphrase: A7eIqT8Sp86TrsOUQ
Entropy: log2(62^17 / 2) = 100.22 bits
So, given this setup, I think we can actually use entropy as a measure of "codename"/passphrase strength. If this analysis is flawed, please let us know!
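As an aside, the generation scheme described above is easy to sketch with Python's stdlib (the wordlist below is a placeholder, not our real Diceware list):

```python
import math
import secrets

# Placeholder wordlist; the real app uses a 6969-word Diceware list.
WORDLIST = [f"word{i}" for i in range(6969)]

def generate_codename(num_words=8):
    # secrets draws from the OS CSPRNG (/dev/urandom on Linux).
    return " ".join(secrets.choice(WORDLIST) for _ in range(num_words))

def average_guess_entropy_bits(wordlist_size, num_words):
    # log2 of half the keyspace, i.e. the expected number of guesses.
    return math.log2(wordlist_size ** num_words / 2)

print(generate_codename())
print(round(average_guess_entropy_bits(6969, 8), 2))  # 101.13
```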
Will answer your question about how we are using bcrypt when I have time later. Thanks for offering guidance! Always appreciated.
oclHashcat is GPU, so those benchmarks aren't really what you want as they won't be any different from my benchmarks, which were also done on GPU using oclHashcat. But just for the sake of completeness, the benchmarks on that page were all done with a cost setting of 5. As stated above, we always benchmark bcrypt with a cost setting of 5 for historical reasons.
There is a good collection of bcrypt benchmarks at http://openwall.info/wiki/john/benchmarks, including OpenMP and MPI benchmarks, so that should help give you a good idea of what a large CPU cluster can do. Remember that bcrypt is faster on CPU than GPU. You might also look at what Openwall is currently doing for their GSoC project, implementing bcrypt on FPGA and the Parallella board.
Your use of entropy here is fine then, however in the password cracking world, we prefer to express things in terms of keyspace rather than bits of entropy, so I would just say you have a keyspace of 6969^8. With a keyspace this large though, you really needn't worry about which bcrypt cost you choose, because anything will be just fine. You could hash a password like this using MD5 and you'd still never crack it. Even at 1 TH/s it would still take 176306784473 years to exhaust the keyspace.
Which reminds me, your maths above where you estimate the time it would take to crack a password are incorrect. time = keyspace / speed. So at 71 KH/s you'd have 6969^8 / 71000 = 78361973956459293701251166.499169 seconds divided by 31556926 seconds in a year =~ 2483194147505346170 years.
Oh, never mind, I see what you did there. I read "**" as "*" - I'm not familiar with that notation.
@epixoip, thank you tons for your input on this thread! It's really helping us get through the blocker on this and possibly some other issues.
To briefly answer your question about our use of bcrypt, which I suspect Garrett is about to answer anyway: besides using it to hash source passwords (codenames) before storing on the server, we also use bcrypt to stretch the source password into a passphrase for the source's GPG secret key, currently stored on the same server. Our reasoning was that because GPG's default passphrase stretching parameters (SHA-1 65,536 times) are weaker than bcrypt, it's a faster oracle for a dictionary attack by someone who has both the encrypted GPG private keys and the bcrypt-hashed source passwords.
We use two different salts for the two separate uses of bcrypt and check for salt-collision. These two salts are the same for all users.
Issue #171 has a bit more discussion about this.
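To make the dual-stretching scheme described above concrete, here is a hypothetical sketch. stdlib scrypt stands in for the app's bcrypt calls, and SALT_LOGIN / SALT_GPG are invented stand-ins for the two application-wide salts (the shared-salt design this thread goes on to criticize):

```python
import hashlib

# Invented stand-ins for the two application-wide salts mentioned above.
SALT_LOGIN = b"login-salt-0001"
SALT_GPG = b"gpg-salt-0002"

def derive(codename: str, salt: bytes) -> bytes:
    # maxmem raised so n=2**14, r=8 fits within OpenSSL's memory cap.
    return hashlib.scrypt(codename.encode(), salt=salt,
                          n=2**14, r=8, p=1, maxmem=2**26)

codename = "pierce clam messy sine torah sinew pont nehru"
stored_hash = derive(codename, SALT_LOGIN)         # compared at login
gpg_passphrase = derive(codename, SALT_GPG).hex()  # unlocks the GPG key
assert stored_hash != derive(codename, SALT_GPG)
```

The salt-collision check mentioned above corresponds to the assertion: the two derivations must never coincide, or the stored login hash would double as the key passphrase.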
Iterated SHA1 is not intrinsically weaker than bcrypt. You can tune them to be equally expensive, or tune one to be more expensive than the other. I believe GPG can be configured to use as many as 65,011,712 iterations of SHA2, so it can scale rather well against bcrypt.
The only drawback to iterated SHA vs bcrypt is that SHA algorithms are register-based, and thus easier to accelerate than bcrypt. But with hundreds of thousands of iterations, or even tens of millions, I don't think that's much of a concern.
Keep in mind that the point of key stretching is to derive a high-entropy key from a low-entropy string. But you already have a high-entropy string. The type of password you are generating has an extremely low probability of being cracked (more on the order of impossible) even if hashed as raw MD4. So, you know, law of diminishing returns and such.
If it were me, I would do some performance testing on your server to determine the optimal bcrypt cost for your environment. Then I would set the number of GPG kdf iterations to a value that meets or exceeds the computational time for bcrypt at whichever cost you selected. And then, if you still have concerns about security after that, consider simply adding another word to the generated password rather than further increasing the cost or iteration count. This would be the most sane approach.
But if you are dead-set on doing it the way you're doing it, then I would suggest using scrypt instead of bcrypt. This way you're really making it challenging for a TLA to accelerate it.
One more thing, you mention that all users share the same two salts. This is very concerning, as what you have aren't really salts, they're just shared secrets. So you have actually defeated the purpose of the salt. You absolutely must have a unique salt for each user. Actually in your case, you need two unique salts per user.
One more thing, you mention that all users share the same two salts. This is very concerning, as what you have aren't really salts, they're just shared secrets. So you have actually defeated the purpose of the salt.
This is a part of the original design, and it has always troubled me. I think the issue is confused because py-bcrypt takes the cost factor as an argument to its gensalt function, so the two seem related (when really, gensalt just uses it to build the first part of the "bcrypt string", $2a$cost_factor$salt). Really, the original design doesn't use salts at all - or rather, it defeats their purpose by sharing one salt across all users.
The reason for this is the codenames (Diceware passphrases) are used for both identification and authentication. That is, a source only has to remember the codename to log in again - it is both a username and a passphrase.
We have to use the same salt with this design because otherwise the only way to log a user in is trial hashing. This is slow (especially with these super-slow hash functions), does not scale well with the number of users (on average you have to try n/2 hashes before you can log a user in), and creates a trivial DoS vector.
I believe the aim of this design is to frustrate investigators seeking to identify sources who are able to gain access to the contents of the server at a point in time (e.g. a server seizure) but cannot perform an active attack on the running server. Since we only store the hashes of codenames on disk, there's not much to go on. If we stored a plaintext secret that only the source also knows, like a username, that would give investigators something to search for when investigating potential leakers.
One could argue that a string of 8 random words stands out, and of course if investigators found that written somewhere, they could hash it and compare to the hashes on the server. But it is much easier if there is a specific identifying string.
The problem with not using unique salts is that a single brute-force password search is guaranteed to find all users' codenames. If we used unique salts, then each codename would require its own brute-force search. This means the resistance of any one user's codename to brute force decreases linearly with the number of users of the system. This is not a good situation, but I think the general assumption has been that the number of users of a "leak site" would be relatively small, while the search space of the codenames is still gigantic.
For example, with the "trillion clusters" example above, if the system had 100,000 users, it would only take 12.41 years to crack one of those users' codenames:
>>> (6969**8 / 2.0) / (71000*1000000000000) / 60 / 60 / 24 / 365.25 / 100000
12.415705560064655
This is probably a safe assumption, but I don't like relying on it.
But if you are dead-set on doing it the way you're doing it, then I would suggest using scrypt instead of bcrypt. This way you're really making it challenging for a TLA to accelerate it.
You're dividing by 100,000 in the wrong place. With 100k unique salts, your 71 KH/s total speed becomes an effective speed of 0.71 H/s since each plaintext candidate has to be re-hashed with each unique salt.
So it is actually:
>>> (6969**8 / 2.0) / ((71000 / 100000.0) * 1000000000000) / 60 / 60 / 24 / 365.25
124157055600.64658
And this is the point of the salt.
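For completeness, the t = keyspace / speed formula and the effect of unique salts can be combined in one sketch, reusing the thread's (admittedly arbitrary) trillion-cluster figure:

```python
# Figures from this thread: 71 KH/s per cluster at bcrypt cost 5,
# times an (arbitrary) one trillion clusters.
KEYSPACE = 6969 ** 8
SPEED = 71_000 * 1_000_000_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25

def years_to_crack(keyspace, speed, num_salts=1):
    # With unique salts, each candidate must be hashed once per salt,
    # which divides the attacker's effective speed by the salt count.
    effective_speed = speed / num_salts
    return (keyspace / 2) / effective_speed / SECONDS_PER_YEAR

print(years_to_crack(KEYSPACE, SPEED))           # about 1.24 million years
print(years_to_crack(KEYSPACE, SPEED, 100_000))  # about 1.24e11 years
```

With 100,000 unique salts the average time goes up by exactly the salt count, which is the point being made here.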
scrypt has been sufficiently reviewed. It was also authored by a fellow member of the Password Hashing Competition expert panel. The main argument against scrypt at the moment is that it's difficult to implement and use. However, I think someone wrote a Python module for it, so that should take a lot of the guesswork out of it.
You're dividing by 100,000 in the wrong place
I was trying to compute the expected time of any one user's hash being cracked, given that there are 100,000 users that each have their own unique hash but all share the same salt (the current design). Is the math not right for that?
Oh, I see. Hmm. Then I think the math for that would be:
>>> (6969**8 / 100000) / (71000*1000000000000) / 60 / 60 / 24 / 365.25
24.829568788501028
But I could be wrong. I'm not that great at math.
That's another way to do the same thing - you also need to divide the keyspace by 2, since on average you'll find the passphrase after half the possible choices. If I do that, then I get the same answer :smile:
Hm, no, I don't think so, because dividing by 100000 already gets you the average. I don't think you'd then divide the average by 2 to find the average of the average, would you? Fuck it, maybe you would. I'm terrible at math :smile:
Anyway, I think you are agreeing that you need unique salts per user, which is the important part.
As I see it, without the division your outcome is not the time to crack a hash, but the time to completely exhaust the keyspace (6969^8)? A given hash could fall anywhere between start and finish, so dividing by 2 gives an average, although a very broad average, time to crack.
No, that's not the time to exhaust the keyspace. You'd be off by a factor of 100k for that.
Anyway, it doesn't really matter. This number is based on GPU cluster speeds, multiplied by an arbitrary 'one trillion clusters,' and not really representative of what your actual adversaries will be using. But the point is, you have to have a unique salt per user. If your model doesn't allow it, then I would strongly consider changing the model. Maybe I don't fully understand how your application works, but a salt is fundamental. Using the same salt for every hash, a nation-state adversary like the one you are describing could conceivably pre-compute passphrases. Plus you get none of the benefits of slowing down the cracking of multiple hashes.
While you guys have very good intentions, and I love that you are putting so much thought into this, I think the way you are going about it now is way overkill, while at the same time neglecting some of the basics. My advice would be to slow down, get back to basics, and take a very deliberate approach to solving this problem.
If it were me, this would be my approach:
First, I would figure out how many total and simultaneous users I was expecting and willing to support. Then I would do some benchmarking at different cost settings to determine how bcrypt performs on my actual hardware. I would then use these figures to determine the highest practical bcrypt cost setting I can use.
Next, I would benchmark GPG on my actual hardware to determine which iteration count is equivalent to the bcrypt cost setting I previously selected. I'd probably increase this value by 15% or so, maybe a bit more, just to be on the safe side.
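The benchmark-and-match procedure described in these two steps can be sketched in a few lines. PBKDF2-SHA256 is a stand-in here (GPG's S2K is iterated-and-salted SHA-1/SHA-2, and the real target time would come from the bcrypt benchmark); the helper names and the 15% pad are just illustrations of the approach:

```python
import hashlib
import os
import time

def time_kdf(iterations, password=b"benchmark"):
    # Wall-clock one PBKDF2-SHA256 derivation on this hardware.
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    return time.perf_counter() - start

def iterations_for_target(target_seconds, start_iterations=10_000):
    # Double the iteration count until one derivation takes at least
    # the target time, then pad by ~15% as suggested above.
    iterations = start_iterations
    while time_kdf(iterations) < target_seconds:
        iterations *= 2
    return int(iterations * 1.15)

print(iterations_for_target(0.05))
```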
Once I had the appropriate cost and iteration count values, I'd implement them in the app, ensuring each user has a unique salt. I'd only need one salt, because I'm only salting the bcrypt hash, and not the GPG KDF. And I'd have some way of mapping users to bcrypt hashes, even if it's just a generic random number as a uid or something. If I really couldn't use a random salt, then I'd have to carefully weigh the risks associated with that.
To get a somewhat accurate estimate of how long it would take to crack the bcrypt hash, instead of just arbitrarily saying "last year's GPU cluster speed times a trillion clusters," I'd look at the bcrypt numbers on the Openwall benchmark page linked above to get an idea of how bcrypt performs on a high-end CPU. Then I'd use current supercomputer core counts to get an idea of just how large a large CPU cluster is, and use these figures to estimate the theoretical bcrypt performance of a cluster that size. Finally, I'd multiply this value by 200 to get a rough estimate of what performance might be like in 50 years (keeping in mind that Moore's Law is projected to begin tapering off this year).
For example, the latest highest-end Intel CPU (4770K) can pull 1600 H/s per core at a cost setting of 5, and the Tianhe-2 supercomputer has 3,120,000 CPU cores. 1600 H/s * 3120000 cores = 4.99 GH/s, times 200 for Moore's Law = 998 GH/s, or about 1 TH/s in 50 years. Divide this number in half for each point of cost above 5: if you use a cost of 12, that's 1 TH/s divided by 2^7 = 7.8 GH/s in 50 years. Approximately. Now you can plug this value into the standard t = k/s formula to get a more realistic and informed idea of what a large supercomputer will be able to do with bcrypt in 50 years. So (6969^8 / 2) / 7800000000 / 31556926 = 11301716953389.7 years. Approximately.
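Plugging the figures from this comment into a few lines makes the arithmetic easy to re-check (the constants are the ones quoted above):

```python
SECONDS_PER_YEAR = 31_556_926

# Constants quoted in the comment above.
per_core_speed_cost5 = 1600     # H/s per 4770K core at bcrypt cost 5
cores = 3_120_000               # Tianhe-2 CPU cores
moore_factor = 200              # rough growth multiplier for ~50 years
cost = 12

cluster_speed_cost5 = per_core_speed_cost5 * cores * moore_factor
# Each +1 in bcrypt cost doubles the work, halving throughput.
speed_at_cost = cluster_speed_cost5 / 2 ** (cost - 5)

keyspace = 6969 ** 8
years = (keyspace / 2) / speed_at_cost / SECONDS_PER_YEAR
print(f"{years:.1f}")  # roughly 1.13e13 years
```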
If you determine that this is unacceptable, you know you can't increase the iteration count because you're already using the highest viable cost setting. So the solution is to add another word to your generated passphrase to increase the keyspace.
This is just my advice. It's your project, and of course you are free to do whatever you feel is best. But I think at least there's enough information in this thread now for you guys to make a well-informed decision on how you wish to proceed.
Best of luck to you!
Thanks for that @epixoip, that's certainly helped give some solid direction. Great stuff.
@epixoip Fantastic advice! We will take this under consideration immediately. I really appreciate you taking the time to contribute a helpful framework for evaluating a difficult question (defending against brute force attacks by LEAs with unknown capabilities). Thanks again! :+1:
Glad I could be of service. Please do not hesitate to contact me directly if you have any further questions or concerns.
I just read through this thread. @garrettr @diracdeltas: please check me to make sure I understand.
As I see it the main threat we are worried about is seizure of SecureDrop computers. In particular, suppose the SD servers and SVS are seized, and the attacker realizes that the activity associated with some bcrypt hash '$2a$12$MGlYAEY6QXL3s26aSJA60OLQHKgp1092m2ptRac9u3Y0nTq0LcRky' is somehow noteworthy. As I understand it, even if the attacker could associate some code name with this hash, that still wouldn't provide any identifying information about the source, since it was a randomly generated code name. Instead, the fear is that the attacker also breaks into a suspected source's home and obtains candidate code names, e.g. looks through the trash and finds a piece of paper with 8 random words scribbled on it. Then, the threat is that the attacker will be able to link some candidate code name to this hash. For example an MD5 hash would trivially have this property.
Given this threat, I think the main defense is a pepper: a global salt that exists in memory but that hopefully isn't recoverable by the attacker, which we have. Per-user salts add protection against rainbow tables, but since the code words are randomly selected from a large keyspace, this shouldn't really matter. (Caveat: I haven't looked at all the arithmetic you have been doing in the thread, but conceptually we should be able to get the code names right.) Still, for good measure, we could add a trivial built-in per-user salt in addition to the pepper:
>>> bcrypt.hashpw("password" + pepper, bcrypt.gensalt())
'$2a$12$LEgdwWjpwsG12tNVAgZlbe/4dPJAvyY2FZo3xPKBW9AL7Iou8iJiC'
This wouldn't require a design change and would still add per-user salts. This might be good in case for the future we allow shorter user-generated code names. But the main protection I think is the pepper, which should ideally only exist in memory and be backed up on a separate non-seizable machine.
On second thought, after looking at the code, I don't think using a per-user salt makes sense. We are using the deterministic bcrypt hash as a unique identifier. This isn't a password, but an obfuscated username. If we use a per-user salt, we need some other sense of a username. But there is no advantage to having a password in this context in addition to the username.
I could see an argument for a redesign that involves letting users sign in with (insecure) usernames/passwords that they create, for familiarity with normal web services. In this case, we would want to use bcrypt the "right" way. But I think that is a separate conversation, and AFAIK such a redesign is not being actively considered.
I think I agree with @dtauerbach. The problem with our design right now is that the pepper isn't secret at all, since we're passing it in as the salt parameter in shash = bcrypt.hashpw(codename, salt). shash starts with the salt parameter (which is also why it's possible to generate per-user salts in our current design without implementing usernames, as Dan has suggested).
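To make concrete why the pepper leaks: bcrypt's modular-crypt output is $2a$\<cost\>$\<22-char salt\>\<31-char digest\>, all stored in the clear, so whatever is passed as the "salt" parameter is published verbatim in every hash. A quick sketch using the example hash from this thread:

```python
# bcrypt output format: $2a$<cost>$<22-char salt><31-char digest>.
# Anything passed as the "salt" (e.g. our pepper) is stored in the clear.
shash = "$2a$12$LEgdwWjpwsG12tNVAgZlbe/4dPJAvyY2FZo3xPKBW9AL7Iou8iJiC"
_, scheme, cost, rest = shash.split("$")
salt_b64, digest_b64 = rest[:22], rest[22:]
print(scheme, cost, salt_b64)  # 2a 12 LEgdwWjpwsG12tNVAgZlbe
```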
@garrettr @diracdeltas and I discussed this a bit. We agreed that storing the config.py file only in memory to defend against seizure provides similar protection to encrypting the file system. In both cases, there needs to be some sort of secret that people could be compelled to turn over to restore the relevant information, and in both cases the secret could be easily destroyed.
@dolanjs is there full disk encryption of the hard drives of the servers?
If so, then this seems to mitigate the need to store this information only in memory. We should advise people to immediately destroy any private key material (including secure, unmemorized passwords) used for disk encryption if a seizure were to occur, before any legal order to preserve such material can be communicated.
We agreed that storing the config.py file only in memory to defend against seizure provides similar protection to encrypting the file system.
Let me know if I have this correct? Assuming that someone in-house can get to the power switch in time to turn the server off, and keep it off long enough to prevent an attacker from capturing a running server with config.py living only in memory... and to prevent the potential of a cold boot memory attack.
If those checks are put in place, then storing config.py in memory would work fine as an alternative to encrypting the file system, except in my experience, getting to the power switch in time is a lot easier said than done...
We should advise people to immediately destroy any private key material (including secure, unmemorized passwords) used for disk encryption if a seizure were to occur, before any legal order to preserve such material can be communicated.
@dtauerbach We likely will not be able to recommend this. Due to decisions by some district courts in recent years, the DOJ could argue that this would be obstruction of justice even without a legal order to preserve such material. So if that is factoring in to how you guys are making a decision here, we should adjust.
used for disk encryption if a seizure were to occur
Instructing people how to get around a state-level adversary's seizure is probably unlawful in any country. Considering that almost all large adversaries are completely prepared for any such onsite mitigation by their victims (their first preference is to capture the server before it can be shut down), advising in this manner would not, at any rate, be helpful in the event of a visit from the grab teams.
If the adversary is unable to get what they want from the RAM of a running server, or from a cold boot attack, then they will come at the holder of the keys. Better to have them stored in an encrypted container and leave it up to the journalist to hold her ground than to make an attempt at destroying keys. Believe me, it's a lot easier to say than do.
The question then is which password/key storage app holds up best against state-level adversaries; that is the likely starting place for recommendations to protect keys and passwords as a general rule, rather than "in the case of...".
@dtauerbach We did not enable full disk encryption for the Monitor/App servers. The problem was that we configured unattended-upgrades to auto-install security updates and reboot when required. If full disk encryption is enabled, this turns into a manual process.
Should we close this? pepper has been implemented and we no longer use bcrypt.
@diracdeltas Yeah, I'm closing this because we implemented the per-org peppers and are now using scrypt instead of bcrypt (so the title and top comment are no longer relevant). However, this comment thread has a lot of helpful and interesting advice that we should use when considering the future re-architecture of SecureDrop.
We currently use the default number of rounds (edit: cost factor, see below comments) for bcrypt to hash the codenames - 12. We could easily make brute force attacks more difficult by increasing this value. Given our threat model, 12 might not be enough.
Thanks to John Adams for reporting this issue.