Feature: Pwnd Passwords Support

yamikuronue commented 6 years ago

Integrate with Pwned Passwords.

Bluenaxela commented 6 years ago

If you mean using the API, I'd note that it isn't purely positive for security; it's a tradeoff. On one hand, it prevents common or publicly leaked passwords from being used. On the other hand, it exposes 20 bits (about 3-4 characters) worth of password entropy to a third party. If one assumes the third party is compromised or untrusted, a password that is not found in the database could become about 1048576 (2^20) times faster to brute-force, provided the account it belongs to is also known or easily guessable from account-creation timing. The tradeoff may well be worthwhile, but 20 bits is a lot of lost entropy in the worst case of a breach at the third party. Personally, I think the tradeoff is most likely a win for most users. Still, disclosing 20 bits of password entropy to a third party is a big enough deal that I think it may be ethically questionable to do it without notifying users. Conveying such a notice accurately, in a way that fairly represents the tradeoff, is not exactly user friendly, though a tiny "Passwords checked with Pwned Passwords" note with a link to the site for more information may be good enough.
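For reference, the range API works by k-anonymity: the client hashes the password with SHA-1, sends only the first five hex characters, and compares the returned suffixes locally. A minimal sketch, assuming the public `https://api.pwnedpasswords.com/range/` endpoint and its `SUFFIX:COUNT` response lines; the helper names are hypothetical and the network call is not exercised here:

```python
import hashlib
import urllib.request
from typing import Tuple

# Assumed endpoint of the Pwned Passwords range API.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> Tuple[str, str]:
    """Uppercase hex SHA-1 split into (5-char prefix sent, 35-char suffix kept local)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(body: str, suffix: str) -> int:
    """Parse 'SUFFIX:COUNT' lines from a range response; 0 means not found."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count.strip())
    return 0


def pwned_count(password: str) -> int:
    """Query the range API; only the 5-char prefix (20 bits) leaves this machine."""
    prefix, suffix = split_hash(password)
    with urllib.request.urlopen(RANGE_URL + prefix, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return count_in_range_response(body, suffix)
```

The only thing disclosed is `prefix`: 5 hex characters × 4 bits = 20 bits, one of 16**5 = 2**20 = 1048576 possible buckets.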

The other way to use it is to store a copy of the database locally. In that case there is (almost) only upside for security and no downside, except that the database is huge and takes up quite a chunk of disk space wherever it's installed.
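A local copy sends nothing to a third party. Because the file is large, one option is to binary-search it on disk rather than load it into memory. A sketch, assuming a text file of `SHA1HASH:COUNT` lines sorted by uppercase hash (the layout of the ordered-by-hash download) with no blank lines; `breach_count` and `build_sample` are hypothetical names:

```python
import hashlib
import io
from typing import BinaryIO, Dict


def sha1_hex_upper(password: str) -> bytes:
    """Uppercase hex SHA-1 as bytes, matching the file's hash column."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper().encode("ascii")


def _line_after(f: BinaryIO, pos: int) -> bytes:
    """First full line starting at pos (if pos == 0) or strictly after pos."""
    f.seek(pos)
    if pos > 0:
        f.readline()  # discard the (possibly partial) line we landed in
    return f.readline()


def breach_count(f: BinaryIO, password: str) -> int:
    """Binary-search sorted 'HASH:COUNT' lines by byte offset; 0 if absent."""
    target = sha1_hex_upper(password)
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    # Find the smallest offset whose following line has hash >= target (or is EOF).
    while lo < hi:
        mid = (lo + hi) // 2
        line = _line_after(f, mid)
        if not line or line.split(b":", 1)[0] >= target:
            hi = mid
        else:
            lo = mid + 1
    line = _line_after(f, lo)
    if line and line.split(b":", 1)[0] == target:
        return int(line.split(b":", 1)[1].strip())
    return 0


def build_sample(counts: Dict[str, int]) -> io.BytesIO:
    """Tiny in-memory stand-in for the real file, with made-up counts."""
    lines = sorted(sha1_hex_upper(p) + b":" + str(c).encode() + b"\r\n"
                   for p, c in counts.items())
    return io.BytesIO(b"".join(lines))
```

Each lookup costs about log2(file size) seeks, so it stays cheap even though the file itself is big; the disk-space cost remains.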

AccaliaDeElementia commented 6 years ago

@Bluenaxela

Do the first five characters of the SHA1 hash really leak that much information?

https://haveibeenpwned.com/API/v2#SearchingPwnedPasswordsByRange

As I understand it, it should be theoretically "impossible" to get from the first five characters of the SHA1 hash back to the password that was hashed, and if a third party knew those first five characters, they would at best be able to eliminate 600ish of the brazillion and seven different passwords that would hash to that prefix.

Or am I misunderstanding where the entropy leak comes from?

Bluenaxela commented 6 years ago

@AccaliaDeElementia

5 characters of the (hex encoded) SHA1 hash is by definition 20 bits of entropy and is what I'm referring to.

Getting the password back directly from the first five characters of the SHA1 hash is not feasible, no, but your estimate of how it impacts brute-force guessing is incorrect. It doesn't eliminate "600ish of the brazillion and seven". It eliminates literally ~99.9999% of possible password guesses; put another way, it eliminates in the ballpark of 999999 out of every 1000000 passwords. A brute-forcing system could check the SHA1 of candidate guesses offline in no time flat, and then use only the comparatively very, very few remaining guesses for actual login attempts against the site.

If a password was weak enough that Eve or Mallory could guess it in an average of ten million attempted logins, well now they could guess it in an average of just 10 logins if they had those five characters from the hash.
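To make the filtering argument concrete: with a leaked 5-character prefix, an attacker hashes candidate guesses offline and keeps only the matches; only those need to be tried against the live login. A sketch with a hypothetical password and a synthetic guess list:

```python
import hashlib
from typing import Iterable, List


def sha1_prefix(guess: str, length: int = 5) -> str:
    """First `length` uppercase hex chars of SHA-1 (5 chars = 20 bits)."""
    return hashlib.sha1(guess.encode("utf-8")).hexdigest().upper()[:length]


def filter_by_prefix(guesses: Iterable[str], leaked_prefix: str) -> List[str]:
    """Offline step: keep only guesses whose SHA-1 starts with the leaked prefix.

    An unrelated guess survives with probability about 1 / 16**len(prefix),
    i.e. roughly 1 in 1,048,576 for a 5-character prefix.
    """
    n = len(leaked_prefix)
    return [g for g in guesses if sha1_prefix(g, n) == leaked_prefix]


if __name__ == "__main__":
    secret = "summer2019!"            # hypothetical weak password
    leaked = sha1_prefix(secret)      # what a compromised range endpoint would see
    # Hypothetical guess list: 200,000 synthetic candidates plus the real password.
    candidates = [f"guess{i}" for i in range(200_000)] + [secret]
    survivors = filter_by_prefix(candidates, leaked)
    print(f"{len(candidates)} candidates -> {len(survivors)} left to try online")
```

The same factor gives the arithmetic above: 10,000,000 / 1,048,576 ≈ 9.5, i.e. roughly ten online attempts.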

Basically, a potential factor of one million reduction in brute forcing time is nothing to sneeze at, and it's particularly a danger in the case of users with weak passwords that barely squeak by filters that try to avoid letting a user set weak passwords.

A truly good password should have enough margin in its entropy that it would likely remain decently strong in the face of such leaked information, but one can't always assume users will have such truly good passwords, in spite of one's best efforts to nudge them in the right direction. Human creativity, the way most normal non-technical end users come up with passwords, tends to strongly optimize for the easiest-to-remember thing that passes whatever filters prevent them from setting a password, and the easiest-to-remember things also have the least entropy margin keeping them safe. There's often simply not much entropy to spare before things start getting dicey.

AccaliaDeElementia commented 6 years ago

hmm....

I can see where you are coming from, but I feel the 20 bits of leaked entropy are worth it to reject passwords that are already breached. It's not a decision to be made lightly, but if properly implemented, it should be difficult to associate those leaked 20 bits (should they become leaked, which I think is unlikely) with an account in a way that could be used to facilitate an attack.

How about this for a passphrase policy?

No passphrase shall, at the time it is set, appear as breached according to the HaveIBeenPwned API (no really, really terrible passwords) (this will be indicated in the UI)
No passphrase shall be shorter than 8 characters (NIST requirement)
Passphrase lengths of up to 256 characters are allowed (4x the NIST recommendation)
Passphrases shorter than 16 characters will be allowed, but will trigger a pre-submit informational message suggesting that a longer passphrase such as "correct horse battery staple" would be more secure without being harder to remember
The passphrase will never be referred to as a "password", to avoid suggesting that it should be a single word (or, indeed, short)
When authenticating, three failed login attempts in a row from the same IP address, or more than 15 failed login attempts in 5 minutes, will result in a 10-minute lockout. Each further authentication attempt during the lockout (made without checking credentials) resets it, until an admin manually resets it or the full lockout period expires with no authentication attempts. (This should stop a single IP from brute-forcing and severely hamper a botnet, hopefully without impacting users much at all.) (Right?)
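The length and breach rules in this proposal could be checked in one place when a passphrase is set. A sketch with hypothetical names (`check_passphrase`, `PolicyResult`); the breach check is passed in as a callable so it could wrap either the range API or a local database, and length is counted in Python characters (code points), which is an assumption:

```python
from dataclasses import dataclass, field
from typing import Callable, List

MIN_LENGTH = 8          # NIST minimum cited in the proposal
MAX_LENGTH = 256        # 4x the NIST "at least 64 characters" recommendation
ADVISORY_LENGTH = 16    # below this: allowed, but show an informational hint


@dataclass
class PolicyResult:
    accepted: bool
    errors: List[str] = field(default_factory=list)
    advice: List[str] = field(default_factory=list)


def check_passphrase(secret: str, is_breached: Callable[[str], bool]) -> PolicyResult:
    """Apply the hard rules, then the breach check, then the soft advice."""
    result = PolicyResult(accepted=True)
    if len(secret) < MIN_LENGTH:
        result.errors.append(f"Must be at least {MIN_LENGTH} characters.")
    if len(secret) > MAX_LENGTH:
        result.errors.append(f"Must be at most {MAX_LENGTH} characters.")
    # Only consult the breach source once the length rules pass.
    if not result.errors and is_breached(secret):
        result.errors.append("This passphrase appears in a known breach; please choose another.")
    if not result.errors and len(secret) < ADVISORY_LENGTH:
        result.advice.append("A longer passphrase would be more secure.")
    result.accepted = not result.errors
    return result
```

Running the breach check only after the length checks pass avoids disclosing a hash prefix for a passphrase that would be rejected anyway.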

I mean, I'm pulling that out of my tailhole, so it'll probably need tweaking. Please let me know your feelings on it. I'm NOT someone with a security background, so this feedback is valuable.

Bluenaxela commented 6 years ago

@AccaliaDeElementia I agree that in the balance, it is likely worth it, but I do think the (admittedly small) possibility of a kinda significant downside (which could have impact outside the context of the site, due to bad habits of re-use) makes it ethically important to indicate it in the UI.

It may also be a good idea to give an option to a server admin to use a local copy of the pwned passwords database.

Regarding other aspects of your suggested policy:

The xkcd "correct horse battery staple" approach is flawed advice for coming up with passwords. Using a few words would be fine if the words came from true randomness with no human intervention, but the advice is usually interpreted as people picking words on a whim. Humans are not good sources of randomness, and people rarely appreciate just how non-random their "random" picks are. Statistically speaking, the words humans pick are heavily biased, and words picked in sequence also have strong statistical interdependence. Even a human choosing to reroll a properly random phrase generator can introduce more bias than is ideal.
As for implications, I personally dislike the term "passphrase" just as much as "password", because the former implies that using a phrase is desirable and tends to make people think of human-chosen phrases, which are not so great (see above). Besides, "password" is the more widely understood term.
It may be worth considering incorporating zxcvbn. If you want to experiment with it, check out its demo. It's not perfect, because it does not account for statistical interdependence between human-chosen words (as dramatic examples, it hugely overestimates the strength of "aliceinwonderland", "greetingsandsalutations" and "baconlettucetomato", and it overestimates in much less obvious cases too), but it is the most comprehensive, well-considered password strength estimator I've seen. Its biggest weakness could perhaps be mitigated with a caution such as: "If you must select words (note: random selection is vastly preferable), avoid titles, collections of related words, idioms, sayings, memes, intentionally grammatical phrases, or any phrase you imagine anyone has ever said or written before. The password strength estimator may overestimate the strength of these. Also avoid incorporating information of personal significance or other structure, as that can also make a password weaker than its estimated strength."
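The point about true randomness can be made concrete: words drawn uniformly and independently by a CSPRNG have a computable entropy, which human-picked words do not. A sketch using Python's `secrets` module with hypothetical function names; the wordlist is supplied by the caller (for scale, the EFF large Diceware list has 7776 words, about 12.9 bits per word):

```python
import math
import secrets
from typing import List


def random_passphrase(wordlist: List[str], n_words: int = 6, sep: str = " ") -> str:
    """Pick each word with a CSPRNG; no human choice (or rerolling) involved."""
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))


def passphrase_entropy_bits(distinct_words: int, n_words: int) -> float:
    """Entropy of n independent uniform picks from a list of distinct words."""
    return n_words * math.log2(distinct_words)
```

With a 7776-word list, `passphrase_entropy_bits(7776, 6)` is about 77.5 bits. The figure only holds if the output is used as generated, without a human rerolling until something "feels right".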

Bluenaxela commented 6 years ago

@AccaliaDeElementia Oh, and also, regarding the lockout for incorrect attempts: I think it's generally a good idea, but it may be worth reducing the risk of it becoming a denial of service against a user. One option is a staged soft lockout. Consider the following concept:

After two failed attempts within a one-hour period, regardless of IP, passing an "I am not a robot" check is required for all further attempts. Naturally, further attempts won't count towards any threshold unless that check is passed.
After a per-IP or per-session-cookie threshold of ten attempts is hit, disallow further attempts from both that IP and that session-cookie holder for 10 minutes.
After a global threshold of fifteen attempts is hit, disallow further attempts for 30 minutes from any IP except one the user recently logged in from successfully.
After a global threshold of thirty attempts is hit, disallow further attempts from any IP for 30 minutes.

The above rules may not be perfect and may need further refinement, but they are deliberately designed with progressively more aggressive measures, to stop an attacker before things have to get so aggressive that the legitimate user gets locked out or is otherwise inconvenienced.

Stage 1 blocks bots that can't bypass the captcha, stage 2 discourages (lazy) malicious humans, stage 3 steps things up with an automatic IP whitelist that tries to avoid locking out the real user, and stage 4 is the last resort.
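The staged soft lockout could be tracked per account roughly as follows. This is a single-process, in-memory sketch with hypothetical names; it assumes all counts use a rolling one-hour window, treats "recently" as within seven days, and leaves captcha verification, persistence, and the clock (`now`, e.g. `time.time()`) to the caller:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

WINDOW = 60 * 60                   # assumption: rolling one-hour window for all counts
CAPTCHA_AFTER = 2                  # stage 1
SOURCE_LIMIT = 10                  # stage 2: per IP or per session cookie
SOURCE_BLOCK = 10 * 60
SOFT_GLOBAL_LIMIT = 15             # stage 3: only recently successful IPs may try
HARD_GLOBAL_LIMIT = 30             # stage 4: nobody may try
GLOBAL_BLOCK = 30 * 60
RECENT_SUCCESS = 7 * 24 * 60 * 60  # assumption: "recently" means within 7 days


@dataclass
class AccountGuard:
    """Hypothetical per-account lockout state for the staged scheme."""
    failures: Deque[Tuple[float, str, str]] = field(default_factory=deque)  # (time, ip, session)
    source_blocked_until: Dict[Tuple[str, str], float] = field(default_factory=dict)
    soft_blocked_until: float = 0.0
    hard_blocked_until: float = 0.0
    good_ips: Dict[str, float] = field(default_factory=dict)  # ip -> last successful login

    def _prune(self, now: float) -> None:
        # Forget failures older than the window.
        while self.failures and now - self.failures[0][0] > WINDOW:
            self.failures.popleft()

    def check(self, ip: str, session: str, now: float) -> str:
        """Before verifying credentials: 'blocked', 'captcha', or 'allow'."""
        self._prune(now)
        if now < self.hard_blocked_until:
            return "blocked"  # stage 4
        trusted = now - self.good_ips.get(ip, float("-inf")) < RECENT_SUCCESS
        if now < self.soft_blocked_until and not trusted:
            return "blocked"  # stage 3
        if (self.source_blocked_until.get(("ip", ip), 0.0) > now
                or self.source_blocked_until.get(("session", session), 0.0) > now):
            return "blocked"  # stage 2
        return "captcha" if len(self.failures) >= CAPTCHA_AFTER else "allow"  # stage 1

    def record_failure(self, ip: str, session: str, now: float) -> None:
        """Call only for attempts that passed check() and any required captcha."""
        self._prune(now)
        self.failures.append((now, ip, session))
        by_ip = sum(1 for _, i, _ in self.failures if i == ip)
        by_session = sum(1 for _, _, s in self.failures if s == session)
        if by_ip >= SOURCE_LIMIT or by_session >= SOURCE_LIMIT:
            self.source_blocked_until[("ip", ip)] = now + SOURCE_BLOCK
            self.source_blocked_until[("session", session)] = now + SOURCE_BLOCK
        if len(self.failures) >= HARD_GLOBAL_LIMIT:
            self.hard_blocked_until = now + GLOBAL_BLOCK
        elif len(self.failures) >= SOFT_GLOBAL_LIMIT:
            self.soft_blocked_until = now + GLOBAL_BLOCK

    def record_success(self, ip: str, now: float) -> None:
        self.good_ips[ip] = now
```

Checking the stages in reverse order (4, 3, 2, then 1) means the most aggressive active block always wins, while the captcha stage applies only when nothing stronger is in force.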

SockDrawer / SockRPG

Feature: Pwnd Passwords Support #103