Open gabeio opened 11 years ago
An easy solution to this problem is to check both the original password and a de-interleaved password.
For "1a2b3c4d5e", check the entropy of both "1a2b3c4d5e" and "12345abcde" and return the lower result.
To even see what you mean by "an easy solution," you'd have to specify how your suggested de-interleaving generalizes to arbitrary input. Only having the single extra rule to consider odd character positions first and even later does not work well since modifications of the pattern are likely at least as common as that alternating pattern itself. For example, rather than "a b c d e f [etc.]", "a b cd e f [etc.] " should be treated similarly.
To address the spacing issue, one might consider moving any character that occurs more often than a certain threshold (4 times?) to the end, but that is clearly not what you intended,given your later de-interleaving example.
Does it generalize well? Perhaps not, but it is certainly better than the current behavior.
A generalized solution would do some fancy partitioning while walking the password from left to right:
Remaining password: "a b cd e f"
Bucket 1 (current):
Bucket 2:
Switching String:
First character always goes in Bucket 1.
Remaining password: " b cd e f"
Bucket 1 (current): "a"
Bucket 2:
Switching String:
Compare entropy cost of switching buckets vs. putting " " in the current bucket and make a decision.
Remaining password: "b cd e f"
Bucket 1: "a"
Bucket 2 (current): " "
Switching String: 1
Compare entropy cost of switching buckets vs. putting "b" in the current bucket.
Remaining password: " cd e f"
Bucket 1 (current): "ab"
Bucket 2: " "
Switching String: 11
Compare entropy cost of switching buckets vs. putting " " in the current bucket.
Remaining password: "cd e f"
Bucket 1: "ab"
Bucket 2 (current): " "
Switching String: 111
Compare entropy cost of switching buckets vs. putting "c" in the current bucket.
Remaining password: "d e f"
Bucket 1 (current): "abc"
Bucket 2: " "
Switching String: 1111
Compare entropy cost of switching buckets vs. putting "d" in the current bucket.
Remaining password: " e f"
Bucket 1 (current): "abcd"
Bucket 2: " "
Switching String: 11110
Compare entropy cost of switching buckets vs. putting " " in the current bucket.
Remaining password: "e f"
Bucket 1: "abcd"
Bucket 2 (current): " "
Switching String: 111101
Compare entropy cost of switching buckets vs. putting "e" in the current bucket.
Remaining password: " f"
Bucket 1 (current): "abcde"
Bucket 2: " "
Switching String: 1111011
Compare entropy cost of switching buckets vs. putting " " in the current bucket.
Remaining password: "f"
Bucket 1: "abcde"
Bucket 2 (current): " "
Switching String: 11110111
Compare entropy cost of switching buckets vs. putting "f" in the current bucket.
Remaining password: ""
Bucket 1 (current): "abcdef"
Bucket 2: " "
Switching String: 111101111
The final password entropy is the entropy of "abcdef" in bucket 1, plus the entropy of " " in bucket 2, plus the entropy of the switching string "111101111".
To actually accurately predict which bucket is going to produce the lowest entropy, you probably have to do an exhaustive look-ahead search, kind of like chess computers thinking a few moves ahead. Happily, the number of possibilities to check is only 2^(number of moves ahead), so you could easily exhaustively search all the possible partitions for the next 6-8 characters and find the best one.
Sounds like a definite no-go to me, as the >100x overhead of always doing a 7-move lookahead would be bringing the performance close to unbearable even on modern desktop browsers (which aren't the only devices worth supporting).
Just thinking out-loud, but...
Perhaps, as an alternative to the proposed solution to the initial issue (ignoring, for now, the problems with interleaved strings) would be to check the submitted password without spaces, then count the number of spaces and apply a value to spaces.
I agree that's kinda what I was thinking a year ago when I put this as an issue
I think we have to look at common delimiters (" ", "_", and "-") and interleaved characters as separate issues (despite spaces being characters) due to human nature and the consistency of spaces.
That would also mean checking for underscores and dashes. It's doable and covers the majority of string delimiters; we're not as likely to see instances like "a1b1c1d1e" as we are to see "a-b-c-d-e".
agreed definitely remove {"~",",","."," ","-","_"} repeated more than 1 time
correct horse battery staple has entropy of 62.86 correcthorsebatterystaple has 45.
Since diceware and makemeapassword both make spaces the default now, and this is likely the most common "secure" pattern, the wordwordwordword guess should assume spaces and not grant them any entropy.
> zxcvbn('passwordqwerty').guesses_log10
4.176091259055681
> zxcvbn('password qwerty').guesses_log10
8.000715995361274
yup, this is on the list of things to fix. if you have suggestions, let me know.
one idea is to run zxcvbn several times according to the following substitutions, taking the minimum, and add an extra bit or two for the choice of delimiter:
original,
original, s/ //g
original, s/-//g
original, s/_//g
original, s/,//g
original, s/~//g
this grows the runtime 6x in the worst case, and isn't a general solution. but it has the advantage of simplicity, and zxcvbn is currently very fast. it also wouldn't be 6x in the typical case -- it could be coded to only check a substitution when the delimiter char appears in the string.
another idea would be to use a regex to try to detect the delimiter automatically, instead of having a hardcoded list. something along the lines of: http://stackoverflow.com/questions/16333681/regex-repeated-words-on-the-same-line
open to more ideas.
a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 entropy:417.915 crack_time(seconds):3.190943210320069e+121 crack_time(readable):centuries score:4
abcdefghijklmnopqrstuvwxyz0123456789 entropy:14.118 crack_time(seconds):0.889 crack_time(readable):instant score:0
a b c 1 2 3 entropy:59.299 crack_time(seconds):35452087835576.43 crack_time(readable):centuries score:4
entropy:3.807 crack_time(seconds):0.001 crack_time(readable):instant score:0
just adding a space between every letter does not make it all that much more secure...