defuse / crackstation

Source code for my crackstation.net website.
https://crackstation.net/
GNU Affero General Public License v3.0

Hashing Security - Timing attack section clarification #13

Open polarathene opened 3 years ago

polarathene commented 3 years ago

On the hashing security page, there is this description of an attack (section: "Why does the hashing code on this page compare the hashes in 'length-constant' time?").

> First, the attacker finds 256 strings whose hashes begin with every possible byte. The string that takes the longest will be the one whose hash's first byte matches the real hash's first byte.

OK, so the attacker hashes inputs until they have 256 strings whose hashes each have a unique value at the target byte position, iterating through the hash one byte at a time. For example, with the third byte as the target, the attacker needs 256 strings whose hashes all begin with the two bytes already confirmed against the server's computed hash, with each string producing a unique third byte.
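For illustration, here is a minimal Python sketch (not from the article) of why a naive byte-by-byte comparison leaks the timing signal this attack exploits, next to the length-constant alternative; the function names are made up, but `hmac.compare_digest` is the real stdlib tool for this:

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Naive comparison: exits at the first mismatching byte, so a guess
    whose first byte matches takes measurably longer than one that doesn't."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:  # early exit leaks the position of the mismatch
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Length-constant comparison: runtime does not depend on where
    (or whether) the inputs differ."""
    return hmac.compare_digest(a, b)
```

The attack described above measures the response-time difference caused by that early exit, one byte position at a time.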

This assumes the system applies a 1 second rate limit only as a precaution but will happily allow unlimited failures. In a little over 2 hours you'd have the full SHA-256 hash, except that the time needed to pre-compute those 256 strings per target byte keeps rising to the point where it is no longer practical.
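The "little over 2 hours" figure checks out as a back-of-envelope calculation, assuming a 32-byte SHA-256 hash, 256 candidate strings per byte position, and one request per second due to the rate limit:

```python
# On-line query cost only; ignores the (growing) pre-computation cost
# of finding suitable input strings for each byte position.
HASH_BYTES = 32            # SHA-256 digest length
CANDIDATES_PER_BYTE = 256  # one string per possible byte value
SECONDS_PER_ATTEMPT = 1    # assumed rate limit

total_seconds = HASH_BYTES * CANDIDATES_PER_BYTE * SECONDS_PER_ATTEMPT
print(total_seconds)         # -> 8192 seconds
print(total_seconds / 3600)  # -> ~2.28 hours
```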

> Once the attacker knows enough of the hash, he can use his own hardware to crack it, without being rate limited by the system.

How exactly does that work if you don't have the full hash? How would you know what a successful match is, or perform verification? You most certainly aren't going to have success against 1 second rate limiting unless the user has a very common password that you can match against your known sequence of bytes.

> It might seem like it would be impossible to run a timing attack over a network. However, it has been done, and has been shown to be practical.

I actually read this crackstation page and the linked SSL timing attack PDF about a month ago. I don't particularly want to go over it again, but IIRC that attack, like many others, was demonstrated under very specific conditions/assumptions that didn't make it all that practical in the real world.

In particular, the PDF notes that the network was between two machines on their campus network, which is quite different from attacking a server far away from you, with multiple other factors contributing to network latency jitter. They also perform the attack against a very dated version of OpenSSL, 0.9.7 (Dec 2002). It's unclear when the paper was published, but it seems to be around 2003?


I'm not sure how feasible this attack is meant to be. Originally I was going to submit a correction to the section, but I have since realized I had a misunderstanding when I first read it.

This issue is just asking for clarification on how the final part of that attack would be carried out successfully with only a partial hash and a rate-limited API endpoint to test against.

defuse commented 3 years ago

Yeah, you're right: by the time you've extracted the entire hash that way, you'd have to have found a preimage for the actual hash. It's still worth plugging the information leak because, for example, if you had a wordlist, leaking the first 32 bytes of the hash (which should be feasible) would let you filter the list down to a small number of candidates. Then you could try those candidates on the live system. In other words, a small amount of leakage would let you perform a dictionary attack against the on-line service much more quickly.
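The filtering idea can be sketched roughly like this; the wordlist, salt, and 4-byte leaked prefix here are illustrative assumptions (the thread later settles on 32 *bits*, i.e. 4 bytes), not details from the article:

```python
import hashlib

def filter_by_prefix(wordlist, salt: bytes, leaked_prefix: bytes):
    """Keep only words whose salted SHA-256 starts with the leaked bytes."""
    return [
        w for w in wordlist
        if hashlib.sha256(salt + w.encode()).digest().startswith(leaked_prefix)
    ]

words = ["password", "hunter2", "letmein", "correcthorse"]
salt = b"example-salt"
target = hashlib.sha256(salt + b"hunter2").digest()

# Leaking the first 4 bytes (32 bits) eliminates unrelated words with
# probability 1 - 2^-32 each; only real candidates survive.
survivors = filter_by_prefix(words, salt, target[:4])
print(survivors)  # -> ['hunter2']
```

The surviving candidates are then cheap enough to try against the rate-limited live system.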

The article should be updated to fix the mistake and explain this reason instead.

Thank you!

polarathene commented 3 years ago

> It's still worth plugging the information leak, because for example if you had a wordlist, leaking the first 32 bytes of the hash (which should be feasible) would let you filter down the list to a small number of candidates.

32 bytes is a full SHA-256 hash, right? bcrypt's is less, at 24 bytes. Wouldn't you usually have the full hash by this point?

Overly verbose response

> - *He sends each string to the on-line system, recording the amount of time it takes the system to respond.*
> - *The string that takes the longest will be the one whose hash's first byte matches the real hash's first byte.*
> - *The attacker now knows the first byte, and can continue the attack in a similar manner on the second byte, then the third, and so on.*

Assuming that each login attempt has a reliable delta between the successful and unsuccessful byte matches (_which is probably quite small; the server, especially if reasonably active and doing more than just logins, in addition to the network involved, is going to add quite a bit of jitter. You'd realistically need many more samples than 1 per byte permutation_), this still seems impractical?

I'm aware that the salt, hash type, etc are known, as the article states earlier to the quote, but the only way I see this having any likelihood of working is if you have chosen a wordlist that contains the password of the user being attacked... which would defeat the need for this attack though.

I say this because the inputs themselves for this attack are only relevant insofar as they generate hash outputs with the byte sequences you're interested in, so their content can be random and is unimportant to you. The problem is getting those hash byte sequences. You're either looking these up from a wordlist you compute in advance (storage requirements), or doing the computation on the fly as you find a match for each byte (time will bottleneck you). Initially it's a non-issue: find 256 strings for the 1st byte; when you know the 2nd byte, generate more strings until you have 256 permutations on the 2nd byte with the same 1st byte... it reads small, but as the byte sequence constraint gets more strict, you take much longer to produce each new input to test.

32 bytes, the full SHA-256 hash, is such a large number of permutations (`2^256`, or `256^32` if you prefer) that it's near the estimated number of atoms in the observable universe (approx `~10^80`, where `2^256` is `~10^77`, that's 77 zeroes!). The way you describe the attack, it's easy to think about just the subset: at 31 bytes, all you need is 256 more and you're good, right? But you can't use prior inputs to speed that up; you've got to keep generating/iterating inputs that match the initial 31 bytes, and then one of your 256 permutation bytes for the 32nd... that's massive! :exploding_head:

Even 128 bits of entropy (16 bytes of your SHA-256 hash, all permutations) requires enough energy for this sort of attack that it'd be sufficient to boil all the oceans on Earth. 128 bits (`2^128`) is approx 3.4e38 (`3.4 * 10^38`), still a pretty big number :grimacing:

---

One of us is misunderstanding something here; leaking 32 bytes of a hash this way does not seem feasible in the slightest, due to the nature of hashing. Can you help point out where I've misunderstood?

> would let you filter down the list to a small number of candidates. Then you could try those candidates on the live system.

This seems to imply you're computing the candidates with the proper byte sequence permutations up front, which would take a while if you really meant 32 bytes (perhaps you meant the first 4?), but without knowing the sequence upfront, the storage and time requirements to produce such a list are considerably higher.
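The magnitudes quoted above can be sanity-checked in a couple of lines of Python:

```python
import math

# 2^256 (full SHA-256 output space) has ~77 decimal digits,
# close to the ~10^80 estimated atoms in the observable universe.
print(math.log10(2**256))  # -> ~77.06

# 2^128 is ~3.4e38.
print(math.log10(2**128))  # -> ~38.53
print(2**128)              # -> 340282366920938463463374607431768211456
```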

TL;DR

I can understand how it helps with optimizing an online attack against the 1 second rate limit window:

At the same time, it doesn't make a practical difference if the entropy is high, especially with a slow hash:

defuse commented 3 years ago

My bad, I meant to say the first 32 bits! You're correct that if the password entropy is high, this attack won't let you extract the hash or the password.

polarathene commented 3 years ago

Ah ok that is more feasible to compute. I think I've understood it? (detailed below)

Verbose response

If you're able to compute 1 million hashes each second, and they're unique values for our `2^32` range (_which will get more difficult over time due to non-unique byte sequences, effectively collisions, slowing this rate_), we'd manage to compute this lookup in a little over an hour: `(2^32 / 1e6) / (60*60) = ~1.19`. I don't know how to calculate the time offset from the overhead of filtering out collisions, but it's presumably optimistic to average the rate out at 1,000 unique hashes/sec, which shifts it to about 50 days. 24 bits is a little over 4 hours at this reduced rate, and 16 bits a mere minute; that's something you could benchmark against for a better metric, I suppose?

---

Regardless, the time requirement is still going to be problematic? At best we're optimizing by filtering candidates to try against the live validation rate limit. For reasons covered in my prior message, the partial hash knowledge isn't providing much of a practical benefit, unless the target has a password that fits within a subset of candidates backed by statistics (eg password leaks/patterns) that you can look up with your partial hash as a key? At least that's the only way I see you using the truncated first 32 bits: as a lookup into an existing set (_good for low hanging fruit_), or for filtering generated hashes so the rate limit isn't wasted (_but the chance of success is infeasible_).

---

My math might be wrong below, not sure.

At the 1 second rate limit, roughly 16 bits of entropy (~65k attempts) are covered in a single day, or 25 bits (~33 million) in a year of attempts. If the attacker can leverage [Kerckhoffs's principle](https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle) against the target, then we know how soon we can get success, especially if we know the subset of entropy that has been filtered by the partial hash from the timing attack?

[Average entropy of user passwords is a little over 40 bits](https://en.wikipedia.org/wiki/Password_strength#Human-generated_passwords). For a set of values where 1 of them is the target password unhashed, if we assume all values map to a uniform distribution of their hashed equivalents, knowing the first 32 bits reduces the set down to <0.4% (`(1 / (2^40 / 2^32)) * 100 = ~0.39`), or 256 values to check. You still need to compute the 40-bit wordlist of hashes in advance for lookup though; at 1 million per sec, that'll take almost 13 days (`(2^40 / 1e6) / (60*60*24) = ~12.7`). Then about 17 minutes to derive the 4 byte sequence for the partial hash at the 1 second rate limit (`(256*4) / 60 = ~17`), and then <5 minutes if we only had 255 values left to test (`255 / 60 = ~4.3`).

In contrast, the first 4 bytes of the 32 byte SHA-256 hash would be <3.7e-66% (`(1 / (2^256 / 2^32)) * 100 = ~3.7e-66`), or about 1 in 2.7e67 (`2^224`). Not going to make a meaningful difference without a wordlist of much lower entropy.

For context, an **nvidia RTX 3080 can do approx 1,000 bcrypt hashes/sec** at work factor 11 (_10-12 should be reasonable to expect on servers today_), which would take that above 13 day attack time (where the attacker has plenty of advantages due to Kerckhoffs's principle) to almost **35 years** (`(2^40 / 1e3) / (60*60*24*365) = ~34.9`).
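The arithmetic above is straightforward to reproduce; the hash rates (1 million/sec for a fast hash, ~1,000/sec for bcrypt on a GPU) are the assumptions carried over from the discussion:

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = SECONDS_PER_HOUR * 24
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365

# Precomputing all 2^32 prefixes at 1 million hashes/sec:
print((2**32 / 1e6) / SECONDS_PER_HOUR)   # -> ~1.19 hours

# A 40-bit wordlist at 1 million hashes/sec:
print((2**40 / 1e6) / SECONDS_PER_DAY)    # -> ~12.7 days

# The same wordlist with bcrypt at ~1,000 hashes/sec:
print((2**40 / 1e3) / SECONDS_PER_YEAR)   # -> ~34.9 years
```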

TL;DR

Ok, so this timing attack serves the purpose of matching the initial N byte sequence of the target hash to:

In either case, the attacker avoids wasted effort/time thanks to the partial hash. Feasibility still seems to depend heavily on Kerckhoffs's principle, attacking a subset of the keyspace where practical.

Some notes: