Why do certain characters add more complexity than others?

CaelanStewart commented 8 years ago

Hi,

I posted an issue on the ProcessWire CMS Issues repo, thinking that it was their password strength checker at fault. But it turns out they use this library to measure password complexity.

I reference this issue.

So, why should a ^ add less complexity than an asterisk (*)?

It doesn't seem to make sense. Surely in a randomly generated string with uniform distribution, it shouldn't matter what character it is, just that you are using a complex range of characters.

My first guess would be that the popularity of the characters has also been considered.

danpalmer commented 8 years ago

Thanks for your question, I hope I can explain why this is happening...

Complexify is based around the assumptions that:

brute-force breaking of all passwords is not practical, and
attackers therefore try to minimise the amount of work they do by minimising the character sets they include.

When performing dictionary attacks, attackers typically decide on their character sets based on certain logical groupings, e.g. you might include lowercase letters, or numbers. Complexify bases its calculation of the complexity of a password on the sum of the sizes of each distinct character set that features in the password.
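To make the "sum of sizes" idea concrete, here is a minimal sketch. The `CHARSETS` table and the `charsetSizeSum` name are assumptions for illustration only: the ranges are simply the contiguous ASCII groups, not necessarily the table Complexify ships with, so the exact numbers may differ from what the library reports.

```javascript
// Hypothetical character sets as contiguous ASCII ranges [first, last].
// Illustrative only: not copied from Complexify's source.
const CHARSETS = [
  [0x20, 0x20], // space
  [0x30, 0x39], // digits
  [0x41, 0x5a], // uppercase letters
  [0x61, 0x7a], // lowercase letters
  [0x21, 0x2f], // punctuation ! through /
  [0x3a, 0x40], // punctuation : through @
  [0x5b, 0x60], // punctuation [ through `
  [0x7b, 0x7e], // punctuation { through ~
];

// Add up the size of every set that has at least one character in the password.
function charsetSizeSum(password) {
  const codes = [...password].map((ch) => ch.codePointAt(0));
  let total = 0;
  for (const [first, last] of CHARSETS) {
    if (codes.some((code) => code >= first && code <= last)) {
      total += last - first + 1;
    }
  }
  return total;
}

console.log(charsetSizeSum("abc"));  // 26 (lowercase only)
console.log(charsetSizeSum("abc^")); // 32 (26 + the 6-character set containing ^)
console.log(charsetSizeSum("abc*")); // 41 (26 + the 15-character set containing *)
```

With a table like this, two passwords that differ only in which punctuation character they use can end up with different totals, simply because the sets those characters belong to have different sizes.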

I believe the issue you've described occurs because * and ^ appear in different logical groupings of characters, and those sets have different sizes, so they contribute differently to the total character set needed to attack a password. This is expected behaviour.

Why are the two characters in different character sets though? That's a good question. It's a long time since I wrote this so I don't know for sure, but it looks like the grouping logic was fairly simple, so it required contiguous groups of characters, and the two you gave are separated on the ASCII table (the capital letters, among others, sit between them). In hindsight this is probably unnecessary, and it would make a little more sense if those were treated as the same character set. However, it would also give less granularity on the complexity, so I'm in two minds about it really.
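For reference, a quick way to see the separation on the ASCII table. This uses only standard string methods and makes no assumptions about Complexify itself:

```javascript
const star = "*".charCodeAt(0);  // 42 (0x2A)
const caret = "^".charCodeAt(0); // 94 (0x5E)

// Collect every character that sits strictly between * and ^.
const between = [];
for (let code = star + 1; code < caret; code++) {
  between.push(String.fromCharCode(code));
}

console.log(between.length);   // 51
console.log(between.join("")); // +,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]
```

The digits and the capital letters both fall in that gap, so any grouping built from contiguous ranges cannot put the two characters in one set without also swallowing those.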

If you'd like to submit a PR that fixes this, please feel free. However, I don't think this is obviously a bug, and since I don't use Complexify anywhere myself at the moment, I'm not going to prioritise fixing it for now.

CaelanStewart commented 8 years ago

@danpalmer, I thought as much. It makes sense when brute forcing to only test the most common classes of characters used in passwords.

And I understand completely, it would be nice if we all had as much time as we wanted to work on projects! It's not the end of the world.

matjazpotocnik commented 8 years ago

Hi, Dan!

Thank you for the explanation. Some punctuation characters are in one set (grouping), some in another, and the sets differ in size, so ^ gets 6 "points", while * gets 10 "points" and a blank gets just 1 "point". I know how brute force is done and you are right, they define groups like lowercase, uppercase, numbers, punctuation, extended chars etc. (at least that's how those programmes worked some 10 years ago when I used them). I never saw different punctuation characters treated differently; they are all in one group. Would you consider making some subtle changes to the source so that at least all punctuation would get a more or less equal "score"? I don't know how to make a PR, but the changes are rather small. Thanks.
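A rough sketch of the kind of change I mean, where a group may consist of several ranges so that all ASCII punctuation counts as one set. The `GROUPS` table and the `groupedSizeSum` name are made up for illustration; this is not Complexify's actual code, and the real change would have to fit its existing structure.

```javascript
// Hypothetical grouping: each logical group is a list of [first, last] ranges.
// All four ASCII punctuation ranges are merged into a single group.
const GROUPS = {
  space: [[0x20, 0x20]],
  digits: [[0x30, 0x39]],
  uppercase: [[0x41, 0x5a]],
  lowercase: [[0x61, 0x7a]],
  punctuation: [[0x21, 0x2f], [0x3a, 0x40], [0x5b, 0x60], [0x7b, 0x7e]],
};

// Total number of characters covered by a group's ranges.
function groupSize(ranges) {
  return ranges.reduce((n, [first, last]) => n + (last - first + 1), 0);
}

// True if the code point falls inside any of the group's ranges.
function inGroup(code, ranges) {
  return ranges.some(([first, last]) => code >= first && code <= last);
}

// Sum the sizes of every group that appears at least once in the password.
function groupedSizeSum(password) {
  const codes = [...password].map((ch) => ch.codePointAt(0));
  let total = 0;
  for (const ranges of Object.values(GROUPS)) {
    if (codes.some((code) => inGroup(code, ranges))) {
      total += groupSize(ranges);
    }
  }
  return total;
}

console.log(groupedSizeSum("abc^")); // 58 (26 lowercase + 32 punctuation)
console.log(groupedSizeSum("abc*")); // 58
```

With this grouping ^ and * contribute the same amount, at the cost of the finer granularity Dan mentioned.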

danpalmer / jquery.complexify.js

Why do certain characters add more complexity than others? #40