dropbox / zxcvbn

Low-Budget Password Strength Estimation
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/wheeler
MIT License
15.04k stars 935 forks source link

spaced passwords add too much entropy. #21

Open gabeio opened 11 years ago

gabeio commented 11 years ago

a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 entropy:417.915 crack_time(seconds):3.190943210320069e+121 crack_time(readable):centuries score:4

abcdefghijklmnopqrstuvwxyz0123456789 entropy:14.118 crack_time(seconds):0.889 crack_time(readable):instant score:0

a b c 1 2 3 entropy:59.299 crack_time(seconds):35452087835576.43 crack_time(readable):centuries score:4

entropy:3.807 crack_time(seconds):0.001 crack_time(readable):instant score:0

just adding a space between every letter does not make it all that much more secure...

brianhempel commented 10 years ago

An easy solution to this problem is to check both the original password and a de-interleaved password.

For "1a2b3c4d5e", check the entropy of both "1a2b3c4d5e" and "12345abcde" and return the lower result.

pyramids commented 10 years ago

To even see what you mean by "an easy solution," you'd have to specify how your suggested de-interleaving generalizes to arbitrary input. Only having the single extra rule to consider odd character positions first and even later does not work well since modifications of the pattern are likely at least as common as that alternating pattern itself. For example, rather than "a b c d e f [etc.]", "a b cd e f [etc.] " should be treated similarly.

To address the spacing issue, one might consider moving any character that occurs more often than a certain threshold (4 times?) to the end, but that is clearly not what you intended,given your later de-interleaving example.

brianhempel commented 10 years ago

Does it generalize well? Perhaps not, but it is certainly better than the current behavior.

A generalized solution would do some fancy partitioning while walking the password from left to right:

Remaining password: "a b cd e f"
Bucket 1 (current):
Bucket 2:
Switching String:

First character always goes in Bucket 1.

Remaining password: " b cd e f"
Bucket 1 (current): "a"
Bucket 2:
Switching String:

Compare entropy cost of switching buckets vs. putting " " in the current bucket and make a decision.

Remaining password: "b cd e f"
Bucket 1: "a"
Bucket 2 (current): " "
Switching String: 1

Compare entropy cost of switching buckets vs. putting "b" in the current bucket.

Remaining password: " cd e f"
Bucket 1 (current): "ab"
Bucket 2: " "
Switching String: 11

Compare entropy cost of switching buckets vs. putting " " in the current bucket.

Remaining password: "cd e f"
Bucket 1: "ab"
Bucket 2 (current): "  "
Switching String: 111

Compare entropy cost of switching buckets vs. putting "c" in the current bucket.

Remaining password: "d e f"
Bucket 1 (current): "abc"
Bucket 2: "  "
Switching String: 1111

Compare entropy cost of switching buckets vs. putting "d" in the current bucket.

Remaining password: " e f"
Bucket 1 (current): "abcd"
Bucket 2: "  "
Switching String: 11110

Compare entropy cost of switching buckets vs. putting " " in the current bucket.

Remaining password: "e f"
Bucket 1: "abcd"
Bucket 2 (current): "   "
Switching String: 111101

Compare entropy cost of switching buckets vs. putting "e" in the current bucket.

Remaining password: " f"
Bucket 1 (current): "abcde"
Bucket 2: "   "
Switching String: 1111011

Compare entropy cost of switching buckets vs. putting " " in the current bucket.

Remaining password: "f"
Bucket 1: "abcde"
Bucket 2 (current): "    "
Switching String: 11110111

Compare entropy cost of switching buckets vs. putting "f" in the current bucket.

Remaining password: ""
Bucket 1 (current): "abcdef"
Bucket 2: "    "
Switching String: 111101111

The final password entropy is the entropy of "abcdef" in bucket 1, plus the entropy of " " in bucket 2, plus the entropy of the switching string "111101111".

To actually accurately predict which bucket is going to produce the lowest entropy, you probably have to do an exhaustive look-ahead search, kind of like chess computers thinking a few moves ahead. Happily, the number of possibilities to check is only 2^(number of moves ahead), so you could easily exhaustively search all the possible partitions for the next 6-8 characters and find the best one.

pyramids commented 10 years ago

Sounds like a definite no-go to me, as the >100x overhead of always doing a 7-move lookahead would be bringing the performance close to unbearable even on modern desktop browsers (which aren't the only devices worth supporting).

MrSchism commented 10 years ago

Just thinking out-loud, but...

Perhaps, as an alternative to the proposed solution to the initial issue (ignoring, for now, the problems with interleaved strings) would be to check the submitted password without spaces, then count the number of spaces and apply a value to spaces.

gabeio commented 10 years ago

I agree that's kinda what I was thinking a year ago when I put this as an issue

MrSchism commented 10 years ago

I think we have to look at common delimiters (" ", "_", and "-") and interleaved characters as separate issues (despite spaces being characters) due to human nature and the consistency of spaces.

That would also mean checking for underscores and dashes. It's doable and covers the majority of string delimiters; we're not as likely to see instances like "a1b1c1d1e" as we are to see "a-b-c-d-e".

gabeio commented 10 years ago

agreed definitely remove {"~",",","."," ","-","_"} repeated more than 1 time

Terebi42 commented 9 years ago

correct horse battery staple has entropy of 62.86 correcthorsebatterystaple has 45.

Since diceware and makemeapassword both make spaces the default now, and this is likely the most common "secure" pattern, the wordwordwordword guess should assume spaces and not grant them any entropy.

hickford commented 8 years ago
> zxcvbn('passwordqwerty').guesses_log10
4.176091259055681
> zxcvbn('password qwerty').guesses_log10
8.000715995361274
lowe commented 8 years ago

yup, this is on the list of things to fix. if you have suggestions, let me know.

one idea is to run zxcvbn several times according to the following substitutions, taking the minimum, and add an extra bit or two for the choice of delimiter:

original,
original, s/ //g
original, s/-//g
original, s/_//g
original, s/,//g
original, s/~//g

this grows the runtime 6x in the worst case, and isn't a general solution. but it has the advantage of simplicity, and zxcvbn is currently very fast. it also wouldn't be 6x in the typical case -- it could be coded to only check a substitution when the delimiter char appears in the string.

another idea would be to use a regex to try to detect the delimiter automatically, instead of having a hardcoded list. something along the lines of: http://stackoverflow.com/questions/16333681/regex-repeated-words-on-the-same-line

open to more ideas.