cmdwtf / KeePassDiceware

A KeePass 2.0 plugin that provides 'Diceware' style passphrases.
https://cmd.wtf/projects#diceware
GNU Affero General Public License v3.0

Visibility into dictionary size and entropy. #27

Open garretwilson opened 1 year ago

garretwilson commented 1 year ago

From the screen shots it looks like the plugin allows various dictionaries to be used. Does the plugin provide any visibility into the size of the dictionary being used, so that the user might have an idea of the entropy (a simple calculation based upon the size of the dictionary and the number of words used) of the resulting passphrase?

I can't find the word "dictionary" or "entropy" in the readme. Maybe the plugin provides this information somewhere else?

Otherwise, without having a lot of experience with the various dictionaries, a user might not know offhand how the generated passphrase compares to a completely random password of a smaller length.

nitz commented 1 year ago

Hey, apologies for the delay in getting to this.

There's no way to peek at the dictionaries through the plugin. I'm going to leave this ticket open as a reminder that it would be a good feature, along with a words-per-dictionary count, if I ever get around to adding custom dictionary support, which I hope to at some point.

As for the actual number of options per dictionary and the related entropy: I honestly have always just relied on KeePass' entropy calculation (e.g., the "quality" field in entries), but perhaps there's a way to at the very least expose that through the plugin too.

I think the trouble with actual entropy values vs. generated passphrase starts to get muddy quick, since the plugin is capable of more than just selecting random words from the list. While it's perfectly capable of generating "correcthorsebatterystaple", it also introduces several options for different mutators so that the base generation like that would end up coming out something like "correct horse battery staple", or "C0rr3c7-H0r5e-B4tt3ry-S74pl3", or even "CoRrECT%h0r53%battery%sTAPlex^¾", all of which would have vastly differing entropy values while being based on the same 4 dictionary words. (For instance, KeePass' quality reports entropy of 81 bits, 99 bits, 129 bits, and 149 bits respectively for those examples.)

Sorry, went off on a bit of a tangent there. At the very least, if you'd like to look at the dictionaries used at this very moment, you can see them in the repo, in the Resources directory: https://github.com/cmdwtf/KeePassDiceware/tree/main/Resources

garretwilson commented 1 year ago

I think the trouble with actual entropy values vs. generated passphrase starts to get muddy quick …

To some extent, yes …

… I honestly have always just relied on KeePass' entropy calculation …

… but the KeePass entropy calculation in this case produces a value that has no resemblance to the real entropy. It's not just a little off—it's completely wrong.

Let's say you have a dictionary with three words, "foo", "bar", and "baz". You generate a password "foobar". KeePass would think that the unit of variation (I made up that term; I'm not a mathematician) is each letter, for six positions, so it would assume 26^6, or 308915776 combinations, or log2(308915776) = 28 bits of entropy. (Actually KeePass says 12 bits, perhaps accounting for the fact that "o" is doubled. Typing "fozbar" shows 29 bits of entropy, closer to what is expected.)

In reality, we're not using one of 26 letters for each position, but rather one of three words for each of two positions, so the number of combinations is 3^2 = 9 combinations, or log2(9) = 3 bits of entropy. Not a good password if the attacker knows that the dictionary being used is "foo", "bar", and "baz".
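The arithmetic in the toy example above can be checked with a short Python sketch. The helper name `uniform_bits` is made up for illustration, and it assumes every position is picked independently and uniformly, which is the same simplification the comment makes.

```python
import math

def uniform_bits(choices_per_position: int, positions: int) -> float:
    """Entropy in bits of `positions` independent, uniform picks
    from `choices_per_position` options (hypothetical helper)."""
    return positions * math.log2(choices_per_position)

# Per-letter view of "foobar": 26 lowercase letters, 6 positions.
letter_bits = uniform_bits(26, 6)

# Per-word view of "foobar": 3 dictionary words, 2 positions.
word_bits = uniform_bits(3, 2)

print(f"per-letter estimate: {letter_bits:.1f} bits")
print(f"per-word estimate:   {word_bits:.1f} bits")
```

The gap between the two numbers is the point: the same string looks far stronger when the estimator does not know it was built from a three-word list.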

So while I realize that "correct horse battery staple" probably has more entropy than "correcthorsebatterystaple", and "C0rr3c7-H0r5e-B4tt3ry-S74pl3" probably has much more, and while I acknowledge we'd have to think in intricate and subtle mathematical terms to figure out how the variations are adding entropy, the point here is that users just need to get at least a general idea of the baseline entropy based upon the dictionary size.

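Thus for "C0rr3c7-H0r5e-B4tt3ry-S74pl3", the first pass of this feature could simply calculate the baseline from the dictionary alone: with an example dictionary size of 5,000 words and 4 words chosen, that is 5000^4 combinations, or log2(5000^4) = 49 bits of entropy.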

Thus if the plugin simply said "this password has >=49 bits of entropy" (at least 49 bits of entropy, ignoring the extra variations that were added), that would be a huge improvement, because what KeePass shows is based upon a completely different understanding of the password.
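A minimal sketch of such an "at least N bits" label, using the 5,000-word example dictionary size mentioned in this thread. The function names `baseline_entropy_bits` and `baseline_label` are hypothetical and are not part of the plugin; Python is used here only for illustration. Rounding down keeps the figure a true lower bound.

```python
import math

def baseline_entropy_bits(dictionary_size: int, word_count: int) -> int:
    """Whole-bit lower bound for `word_count` words drawn uniformly and
    independently from a list of `dictionary_size` words.
    Mutators (case, leetspeak, separators) are deliberately ignored."""
    if dictionary_size < 1 or word_count < 0:
        raise ValueError("dictionary_size must be >= 1 and word_count >= 0")
    return math.floor(word_count * math.log2(dictionary_size))

def baseline_label(dictionary_size: int, word_count: int) -> str:
    """Text a UI could show next to the generated passphrase (assumed wording)."""
    bits = baseline_entropy_bits(dictionary_size, word_count)
    return f"at least {bits} bits of entropy"

# Example figures from the discussion: 4 words, 5,000-word dictionary.
print(baseline_label(5000, 4))
```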

Finally I'll note that you'll need to take the minimum of the KeePass entropy (or calculate it yourself) and the dictionary-based entropy value, to take into consideration that an attacker could use a brute force attack based upon the individual letters or on the dictionary.
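The "take the minimum" step could look roughly like the sketch below. It assumes the character-based figure is supplied from elsewhere (for example, the number KeePass displays); `effective_entropy_bits` is a hypothetical name, not an existing KeePass or plugin function.

```python
import math

def effective_entropy_bits(char_estimate_bits: float,
                           dictionary_size: int,
                           word_count: int) -> float:
    """Conservative estimate: an attacker can pick whichever search is
    cheaper, per-character or per-word, so report the smaller figure."""
    word_bits = word_count * math.log2(dictionary_size)
    return min(char_estimate_bits, word_bits)

# KeePass' reported 81 bits vs. a 5,000-word dictionary with 4 words:
# the dictionary-based figure is the binding one.
print(round(effective_entropy_bits(81, 5000, 4), 1))

# If the character-based estimate is lower, it wins instead.
print(effective_entropy_bits(10, 5000, 4))
```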

I'm not a cryptographer nor a mathematician, so please feel free to point out any errors.

ThisMakesSenseToMe commented 1 year ago

I don’t agree with Garret. He’s assuming the attacker knows what dictionary was used, which is probably almost never the case, and salt (numbers) at random places makes it so that the words cannot be found in ‘the’ dictionary (which is different from just replacing some letters with numbers). Plus a lot of people will use multiple dictionaries, maybe in multiple languages. In reality an attacker will probably have to use brute force. So calculating the entropy based on that assumption seems reasonable. Of course, it would be even better if a user could add his own dictionary.

nitz commented 1 year ago

I'm not a cryptographer nor a mathematician, so please feel free to point out any errors.

I know this feeling! I'm just a fan that has the t-shirts 😂

Thus if the plugin simply said "this password has >=49 bits of entropy"

Actually, that may be a good way to display something like that: a baseline ("pre-enhancement") level of entropy. I quite like the "at least x bits", plus it keeps the logic for calculating it relatively simple.

… but the KeePass entropy calculation in this case produces a value that has no resemblance to the real entropy. It's not just a little off—it's completely wrong.

I've never actually looked at the source of KeePass' entropy calculator, but it does seem to have knowledge of at least some dictionary (English) words, as I regularly see cases where appending a new character to a phrase that takes a non-dictionary word and makes it a dictionary word actually decreases the entropy despite the longer length. For example:

deepe, 5 ch., claims 16 bits.

But deeper, 6 ch., claims only 12 bits.

While deepee, also 6 ch., increases to 17 bits from deepe.

Now I'm completely curious as to what their calculation is. I wonder if it's something like zxcvbn, which exhibits the same behavior. (Though zxcvbn doesn't seem to return entropy estimates as much as it does just score the password and provide that "guesses log10" value.)

nitz commented 1 year ago

Of course, it would be even better if a user could add his own dictionary.

That's the long and short of it, isn't it? 😅

ThisMakesSenseToMe commented 1 year ago

Maybe you can also look at StrongBox’s implementation?

garretwilson commented 1 year ago

He’s assuming the attacker knows what dictionary was used, which is probably almost never the case …

So does it take a large stretch of the imagination to think that an attacker would start with all the most common dictionaries, such as those this plugin uses?

In reality an attacker will probably have to use brute force.

This whole discussion is based upon the assumption that the attacker is using brute force. Only the most unsophisticated attacker would simply use the ASCII letters. I would imagine that an attacker with any sense at all would start millions of parallel brute force attacks, some based upon the ASCII letters, and others based upon the most common dictionaries.

garretwilson commented 1 year ago

Now I'm completely curious as to what [the KeePass] calculation is.

I'm curious too, but friends, let's not get sidetracked too much in this ticket away from the original simple request. The idea is that a user would simply like to have a general idea of how much variation the dictionary-based password has, so that they can decide whether to simply use a shorter random string instead. Some general "at least this much" number would be very useful. Currently the only way to find this out is for the user to go to GitHub or somewhere, find the dictionary files, pull out a calculator, etc.
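For reference, the manual "find the dictionary file, pull out a calculator" procedure amounts to something like the following sketch. It assumes a word list with one word per line, which may not match the actual format of the files in the repository, and both function names are hypothetical. The default alphabet of 94 symbols is the set of printable ASCII characters excluding the space.

```python
import math

def count_words(word_list_text: str) -> int:
    """Count distinct non-empty lines (assumes one word per line)."""
    return len({line.strip() for line in word_list_text.splitlines() if line.strip()})

def equivalent_random_length(dictionary_size: int, word_count: int,
                             alphabet_size: int = 94) -> int:
    """Shortest uniformly random string over `alphabet_size` symbols whose
    entropy is at least that of the baseline passphrase."""
    passphrase_bits = word_count * math.log2(dictionary_size)
    return math.ceil(passphrase_bits / math.log2(alphabet_size))

# Tiny inline word list standing in for a real dictionary file.
sample = "foo\nbar\n\nbaz\nfoo\n"
print(count_words(sample))

# 4 words from a 5,000-word list vs. random printable-ASCII characters.
print(equivalent_random_length(5000, 4))
```

With those example figures, a four-word passphrase is matched by a random string of about eight printable characters, which is the kind of comparison the request is asking the plugin to make visible.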

garretwilson commented 1 year ago

I know this feeling! I'm just a fan that has the t-shirts 😂

haha I need more T-shirts. Tell me where to get them. (Now I'm getting off the subject. 😅 )

ThisMakesSenseToMe commented 1 year ago

Why would the attacker assume that the diceware plug-in is used at all? And even KeePass?

garretwilson commented 1 year ago

[KeePass] does seem to have knowledge of at least some dictionary (English) words, as I regularly see cases where appending a new character to a phrase that takes a non-dictionary word and makes it a dictionary word actually decreases the entropy despite the longer length

That's interesting. Still KeePass says "correcthorsebatterystaple" has 81 bits of entropy, while above I illustrated that an example dictionary size of 5,000 words would produce only 49 bits of entropy. My whole point here is that the user might like to know a general "at least" entropy calculation based upon the known dictionary size (using the min() function with the character-based calculation).

garretwilson commented 1 year ago

Why would the attacker assume that the diceware plug-in is used at all? And even KeePass?

The attacker doesn't "assume" anything. The attacker tries things. (That's the whole point of a brute force attack.) The attacker doesn't try a single thing. The attacker tries many things. The attacker likely starts trying the most common things, and the diceware dictionaries and dictionaries used by KeePass plugins are some of the most obvious common things to start with.

@ThisMakesSenseToMe , if you were a nefarious consultant, and an attacker were paying you to advise him/her on which dictionaries to use in a brute force attack, and the attacker asked you for a list of 10 of the most common dictionaries to start with … hopefully you get the idea.

ThisMakesSenseToMe commented 1 year ago

I honestly don’t agree. Why couldn’t it have been generated using LastPass, Bitwarden, 1Password, etc.? When an attacker doesn’t have specific knowledge, it is very hard to crack such a long password.