defuse / crackstation

Source code for my crackstation.net website.
https://crackstation.net/
GNU Affero General Public License v3.0

Hashing Security - Timing attack section clarification #13

Open polarathene opened 3 years ago

polarathene commented 3 years ago

On the hashing security page, there is this description of an attack (section: "Why does the hashing code on this page compare the hashes in 'length-constant' time?").

> First, the attacker finds 256 strings whose hashes begin with every possible byte. The string that takes the longest will be the one whose hash's first byte matches the real hash's first byte.

OK, so the attacker hashes inputs until they have 256 strings whose hashes each have a unique value at the target byte position, iterating through the hash one byte at a time. For example, with the third byte as the target, the attacker needs 256 strings whose hashes all begin with the two bytes already confirmed against the server's computed hash, with each string producing a unique third byte.
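For illustration, here is a minimal Python sketch (not from the article) of why a naive byte-by-byte comparison leaks the timing signal this attack exploits, next to the length-constant alternative; the function names are made up, but `hmac.compare_digest` is the real stdlib tool for this:

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Naive comparison: exits at the first mismatching byte, so a guess
    whose first byte matches takes measurably longer than one that doesn't."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:  # early exit leaks the position of the mismatch
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Length-constant comparison: runtime does not depend on where
    (or whether) the inputs differ."""
    return hmac.compare_digest(a, b)
```

The attack described above measures the response-time difference caused by that early exit, one byte position at a time.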

This assumes the system applies a 1 second rate limit only as a precaution but will happily allow unlimited failures. In a little over 2 hours you'd have the full SHA-256 hash, except that the time needed to pre-compute those 256 strings per target byte keeps rising to the point where it is no longer practical.
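The "little over 2 hours" figure checks out as a back-of-envelope calculation, assuming a 32-byte SHA-256 hash, 256 candidate strings per byte position, and one request per second due to the rate limit:

```python
# On-line query cost only; ignores the (growing) pre-computation cost
# of finding suitable input strings for each byte position.
HASH_BYTES = 32            # SHA-256 digest length
CANDIDATES_PER_BYTE = 256  # one string per possible byte value
SECONDS_PER_ATTEMPT = 1    # assumed rate limit

total_seconds = HASH_BYTES * CANDIDATES_PER_BYTE * SECONDS_PER_ATTEMPT
print(total_seconds)         # -> 8192 seconds
print(total_seconds / 3600)  # -> ~2.28 hours
```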

> Once the attacker knows enough of the hash, he can use his own hardware to crack it, without being rate limited by the system.

How exactly does that work if you don't have the full hash? How would you know what a successful match is, or perform verification? You most certainly aren't going to have success against 1 second rate limiting unless the user has a very common password that you can match against your known sequence of bytes.

> It might seem like it would be impossible to run a timing attack over a network. However, it has been done, and has been shown to be practical.

I actually read this crackstation page and the linked SSL timing attack PDF about a month ago. I don't particularly want to go over it again, but IIRC that attack, like many others, was demonstrated under very specific conditions/assumptions that didn't make it all that practical in the real world.

In particular, the PDF notes that the network was between two machines on their campus network, which is quite different from attacking a server far away from you, with multiple other factors contributing to network latency jitter. They also perform the attack against a very dated version of OpenSSL, 0.9.7 (Dec 2002). It's unclear when the paper was published, but it seems to be around 2003?


I'm not sure how feasible this attack is meant to be. Originally I was going to submit a correction to the section, but I have since realized I had a misunderstanding when I first read it.

This issue is just asking for clarification on how the final part of that attack would be carried out successfully with only a partial hash and a rate-limited API endpoint to test against.

defuse commented 3 years ago

Yeah, you're right: by the time you've extracted the entire hash that way, you'd have to have found a preimage for the actual hash. It's still worth plugging the information leak because, for example, if you had a wordlist, leaking the first 32 bytes of the hash (which should be feasible) would let you filter the list down to a small number of candidates. Then you could try those candidates on the live system. In other words, a small amount of leakage would let you perform a dictionary attack against the on-line service much more quickly.
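The filtering idea can be sketched roughly like this; the wordlist, salt, and 4-byte leaked prefix here are illustrative assumptions (the thread later settles on 32 *bits*, i.e. 4 bytes), not details from the article:

```python
import hashlib

def filter_by_prefix(wordlist, salt: bytes, leaked_prefix: bytes):
    """Keep only words whose salted SHA-256 starts with the leaked bytes."""
    return [
        w for w in wordlist
        if hashlib.sha256(salt + w.encode()).digest().startswith(leaked_prefix)
    ]

words = ["password", "hunter2", "letmein", "correcthorse"]
salt = b"example-salt"
target = hashlib.sha256(salt + b"hunter2").digest()

# Leaking the first 4 bytes (32 bits) eliminates unrelated words with
# probability 1 - 2^-32 each; only real candidates survive.
survivors = filter_by_prefix(words, salt, target[:4])
print(survivors)  # -> ['hunter2']
```

The surviving candidates are then cheap enough to try against the rate-limited live system.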

The article should be updated to fix the mistake and explain this reason instead.

Thank you!

polarathene commented 3 years ago

> It's still worth plugging the information leak, because for example if you had a wordlist, leaking the first 32 bytes of the hash (which should be feasible) would let you filter down the list to a small number of candidates.

32 bytes is a full SHA-256 hash, right? bcrypt's is less, at 24 bytes. Wouldn't you usually have the full hash by this point?

Overly verbose response

> - *He sends each string to the on-line system, recording the amount of time it takes the system to respond.*
> - *The string that takes the longest will be the one whose hash's first byte matches the real hash's first byte.*
> - *The attacker now knows the first byte, and can continue the attack in a similar manner on the second byte, then the third, and so on.*

Assuming that each login attempt has a reliable delta between the successful and unsuccessful byte matches (_which is probably quite small; the server, especially if reasonably active and doing more than just logins, in addition to the network involved, is going to add quite a bit of jitter. You'd realistically need many more samples than 1 per byte permutation_), this still seems impractical?

I'm aware that the salt, hash type, etc are known, as the article states earlier to the quote, but the only way I see this having any likelihood of working is if you have chosen a wordlist that contains the password of the user being attacked... which would defeat the need for this attack though.

I say this because the inputs themselves for this attack are only relevant insofar as they generate hash outputs with the byte sequences you're interested in, so their content can be random and is unimportant to you. The problem is getting those hash byte sequences. You're either looking these up from a wordlist you compute in advance (storage requirements), or doing the computation on the fly as you find a match for each byte (time will bottleneck you). Initially it's a non-issue: find 256 strings for the 1st byte; when you know the 2nd byte, generate more strings until you have 256 permutations on the 2nd byte with the same 1st byte... it reads small, but as the byte sequence constraint gets more strict, you take much longer to produce each new input to test.

32 bytes, the full SHA-256 hash, is such a large number of permutations (`2^256`, or `256^32` if you prefer) that it's near the estimated number of atoms in the observable universe (approx `~10^80`, where `2^256` is `~10^77`, that's 77 zeroes!). The way you describe the attack, it's easy to think about just the subset: at 31 bytes, all you need is 256 more and you're good, right? But you can't use prior inputs to speed that up; you've got to keep generating/iterating inputs that match the initial 31 bytes, and then one of your 256 permutation bytes for the 32nd... that's massive! :exploding_head:

Even 128 bits of entropy (16 bytes of your SHA-256 hash, all permutations) requires enough energy for this sort of attack that it'd be sufficient to boil all the oceans on Earth. 128 bits (`2^128`) is approx 3.4e38 (`3.4 * 10^38`), still a pretty big number :grimacing:

---

One of us is misunderstanding something here; leaking 32 bytes of a hash this way does not seem feasible in the slightest, due to the nature of hashing. Can you help point out where I've misunderstood?

> would let you filter down the list to a small number of candidates. Then you could try those candidates on the live system.

This seems to imply you're computing the candidates with the proper byte sequence permutations up front, which would take a while if you really meant 32 bytes (perhaps you meant the first 4?), but without knowing the sequence upfront, the storage and time requirements to produce such a list are considerably higher.
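The magnitudes quoted above can be sanity-checked in a couple of lines of Python:

```python
import math

# 2^256 (full SHA-256 output space) has ~77 decimal digits,
# close to the ~10^80 estimated atoms in the observable universe.
print(math.log10(2**256))  # -> ~77.06

# 2^128 is ~3.4e38.
print(math.log10(2**128))  # -> ~38.53
print(2**128)              # -> 340282366920938463463374607431768211456
```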

TL;DR

I can understand how it helps with optimizing an online attack against the 1 second rate limit window:

At the same time, it doesn't make a practical difference if the entropy is high, especially with a slow hash:

defuse commented 3 years ago

My bad, I meant to say the first 32 bits! You're correct that if the password entropy is high, this attack won't let you extract the hash or the password.

polarathene commented 3 years ago

Ah ok that is more feasible to compute. I think I've understood it? (detailed below)

Verbose response

If you're able to compute 1 million hashes each second, and they're unique values for our `2^32` range (_which will get more difficult over time due to non-unique byte sequences, effectively collisions, slowing this rate_), we'd manage to compute this lookup in a little over an hour: `(2^32 / 1e6) / (60*60) = ~1.19`. I don't know how to calculate the time offset from the overhead of filtering out collisions, but it's presumably optimistic to average the rate out at 1,000 unique hashes/sec, which shifts it to about 50 days. 24 bits is a little over 4 hours at this reduced rate, and 16 bits a mere minute; that's something you could benchmark against for a better metric, I suppose?

---

Regardless, the time requirement is still going to be problematic? At best we're optimizing by filtering candidates to try against the live validation rate limit. For reasons covered in my prior message, the partial hash knowledge isn't providing much of a practical benefit, unless the target has a password that fits within a subset of candidates backed by statistics (eg password leaks/patterns) that you can look up with your partial hash as a key? At least that's the only way I see you using the truncated first 32 bits: as a lookup into an existing set (_good for low hanging fruit_), or for filtering generated hashes so the rate limit isn't wasted (_but the chance of success is infeasible_).

---

My math might be wrong below, not sure.

At the 1 second rate limit, roughly 16 bits of entropy (~65k attempts) are covered in a single day, or 25 bits (~33 million) in a year of attempts. If the attacker can leverage [Kerckhoffs's principle](https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle) against the target, then we know how soon we can get success, especially if we know the subset of entropy that has been filtered by the partial hash from the timing attack?

[Average entropy of user passwords is a little over 40 bits](https://en.wikipedia.org/wiki/Password_strength#Human-generated_passwords). For a set of values where 1 of them is the target password unhashed, if we assume all values map to a uniform distribution of their hashed equivalents, knowing the first 32 bits reduces the set down to <0.4% (`(1 / (2^40 / 2^32)) * 100 = ~0.39`), or 256 values to check. You still need to compute the 40-bit wordlist of hashes in advance for lookup though; at 1 million per sec, that'll take almost 13 days (`(2^40 / 1e6) / (60*60*24) = ~12.7`). Then about 17 minutes to derive the 4 byte sequence for the partial hash at the 1 second rate limit (`(256*4) / 60 = ~17`), and then <5 minutes if we only had 255 values left to test (`255 / 60 = ~4.3`).

In contrast, the first 4 bytes of the 32 byte SHA-256 hash would be <3.7e-66% (`(1 / (2^256 / 2^32)) * 100 = ~3.7e-66`), or about 1 in 2.7e67 (`2^224`). Not going to make a meaningful difference without a wordlist of much lower entropy.

For context, an **nvidia RTX 3080 can do approx 1,000 bcrypt hashes/sec** at work factor 11 (_10-12 should be reasonable to expect on servers today_), which would take that above 13 day attack time (where the attacker has plenty of advantages due to Kerckhoffs's principle) to almost **35 years** (`(2^40 / 1e3) / (60*60*24*365) = ~34.9`).
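The arithmetic above is straightforward to reproduce; the hash rates (1 million/sec for a fast hash, ~1,000/sec for bcrypt on a GPU) are the assumptions carried over from the discussion:

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = SECONDS_PER_HOUR * 24
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365

# Precomputing all 2^32 prefixes at 1 million hashes/sec:
print((2**32 / 1e6) / SECONDS_PER_HOUR)   # -> ~1.19 hours

# A 40-bit wordlist at 1 million hashes/sec:
print((2**40 / 1e6) / SECONDS_PER_DAY)    # -> ~12.7 days

# The same wordlist with bcrypt at ~1,000 hashes/sec:
print((2**40 / 1e3) / SECONDS_PER_YEAR)   # -> ~34.9 years
```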

TL;DR

Ok, so this timing attack serves the purpose of matching the initial N byte sequence of the target hash to:

In either case, the attacker avoids wasted effort/time thanks to the partial hash. Feasibility still seems to depend heavily on Kerckhoffs's principle, attacking a subset of the keyspace where practical.

Some notes: