salts for client should be generated deterministically server-side (instead of email+password)

defuse commented 6 years ago

A helpful person emailed me this:

I would like to add a comment with regards to client-side salting. It might be a bit misleading and sometimes completely wrong to use the username/email + domain as part of the salt.
Reasoning:
a) requesting the salt from server is debatable. A good implementation will always send a salt even if it does not exist in the database using a deterministic salt derivation function (similarly to key derivation functions).
b) both username and domain info might be very small values adding very low entropy to the salt part. Eg imagine your username is bob and your website is by.com
Then, your salted password becomes bobby.comPassWord, which unfortunately might be a word in the common rainbow tables. In my opinion, if you ever use that, it should always be Hash(Hash(salt) + password)
c) Some websites allow changing your username and/or email address. Then, on each change an extra protocol is required to update the password.

It's a slight benefit and it does add risk that the server-side code is buggy and generating bad salts.

MatthewStephens1 commented 5 years ago

Yeah someone has got in somehow and I can get all the right info but no way to use it????

polarathene commented 3 years ago

This is a nice example of a weakness with that simple client-side salt style that is often mentioned, thanks :+1:

I'd say that the length still makes it unlikely to be pre-computed, especially when for most domains the username + domain name isn't likely to form a potential dictionary word (which we're also arguing here has a .com treated as a word, and with PassWord using pattern variations with the casing there).

Bob should probably not be surprised if his password did get compromised, but such an attack is more likely to be carried out with the hashed DB password already available to the attacker. The salt being predictable is a given and an attack against Bob would quickly reveal his password when iterating through common poor passwords which is low effort.

What additionally protects bob here is a slow hash being in the mix. The same supposed rainbow table with this result needs to match the slow hash algo chosen with the same parameters. You can add in a pepper that's randomly generated once for your service and accessed via ENV var afaik, mixing this in server-side to get the DB hash lookup makes the issue moot? (unless the attacker has access to that as well, but for whatever reason does not have enough access to the server to perform an active attack by modifying JS payload to just send passwords or whatever they want to accomplish)

If it's still worrying you, you could do as suggested and hash the salt. Only useful when it's not a common pattern, an attacker making these tables could just as easily generate username + domain + password as components. In that same sense, you can add jibberish into the salt, doesn't matter if public and static, it'd just be padding the length against the attack you're concerned about.

~~Rainbow tables don't do well against storage requirements~~ (EDIT: Rainbow tables can be more storage efficient, I mixed them up with plain lookup tables, storage still gets fairly demanding for the most part though), 4 billion hashes (2^32) is at least 137GB? 2^55 * 32B= >1 EB 1 ExaByte, 1 million TeraBytes, that's 8,388,608 times (2^23) more hashes. EFF has a short dictionary list of 1296 words (6^4, roll 4 six sided dice to get the index of a word in the list), 5 of those words is log2((6^4)^5) = ~51.7 (51.7 bits / 2^51.7), 117 PB (117,000 TeraBytes). Their larger wordlist is 7,776 in size and can achieve the same size in just 4 words log2((6^5)^4)= ~51.7. These are without transforming each word into different variations (changing case, substituting letters with numbers, or delimiters between words like spaces, dots or dashes if any).

Just let that sink in, a ~~rainbow table~~ lookup table needs to prioritize password combinations that are going to be statistically more likely to be successful matches, otherwise as the length of the input pattern/rules increases, the size of the table gets very expensive and less likely to be commonly available (probably behind a pay wall or private community server/cluster). Someone might specifically generate a table for such a pattern though if you had 1 million bobby.com like "words" combined with 1 million common passwords, you'd get log2((2^20) * (2^20)) = 40 which is 2^40 * 32B = ~35TB. If you were able to process all those passwords through a slow hash at a rate of 1 million / sec (expensive), it'd take: ((2^40) / 1000000) / (60*60*24) = ~12.7 roughly 13 days (bcrypt work factor of 12 on an nvidia 3080 GPU only manages around 600 hashes per sec IIRC which'd make that about 58 years instead).

domain could be sourced from a list of Top X domains or popular words (all the same in a dictionary / pattern source for generating a rainbow table) coupled with a 2nd "word", the TLD (.com, .net, .org) or just .com to reduce the overhead. Lower effort is to just generate with common usernames that can be filtered in advance to pair well with a certain length of words as the domains (could even automate that somewhat for worthwhile domains), but effectively username + domain == word/name + .com shouldn't be that difficult of a rule to consider by whomever is creating such a rainbow table.

defuse / crackstation

salts for client should be generated deterministically server-side (instead of email+password) #5