atoponce / webpassgen

Simple web-based password generator
https://atoponce.github.io/webpassgen/
GNU Affero General Public License v3.0

feature: element count #15

Closed: roycewilliams closed this issue 1 year ago

roycewilliams commented 2 years ago

Along with the bits of entropy and the character count, it might be informative to include a compact expression of how many "elements" there were in the source list (e.g., "7777 elements").

This could help shape intuitions for the layperson.

(There may be a better term than "element". I led with that because some of the lists are words, some are pseudowords, some are characters, etc.)

atoponce commented 2 years ago

This is something I toyed with early on but abandoned, as I just couldn't get it working the way I wanted. Maybe I can give it another go. But instead of putting the password statistics in the generation box itself, which already feels busy to me, I'd put them in an overlay that opens by clicking/tapping a link and is dismissed by clicking/tapping again.

These are just mock-ups made in GIMP. No code to support it has been written yet.

[Mock-up images: alternate-propsed, alternate-propsed-overlay]

atoponce commented 1 year ago

I've implemented this for the "Alternate", "Cryptocurrency", "Diceware", "EFF", and "Random" generators. However, implementing it for the "Pseudowords" generator is escaping me.

In the case of passphrases, it's easy enough to count the number of unique words (although Diceware NLP passphrases are composed of adjective-noun pairs, meh). In the case of random passwords, it's easy enough to count the number of unique characters (or emoji).
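For word lists and character sets, the count described above can be sketched as follows. This is a minimal illustration, assuming each element is drawn uniformly and independently; `elementStats` is a hypothetical helper, not webpassgen's code. Strings are split by code point so single-code-point emoji count once each; emoji built from several code points (ZWJ sequences, skin-tone modifiers) would need grapheme segmentation instead.

```javascript
// Hypothetical helper: element count and entropy for a uniformly sampled
// source list. Not taken from webpassgen.
function elementStats(source, picks) {
  // Split strings by code point (not UTF-16 unit) so an emoji such as
  // U+1F3B2 counts as one element; arrays (word lists) are used as-is.
  const items = typeof source === "string" ? Array.from(source) : source
  // Duplicates add no entropy, so only unique elements are counted.
  const elementCount = new Set(items).size
  // Each uniform pick from N unique elements contributes log2(N) bits.
  const entropy = picks * Math.log2(elementCount)
  return { elementCount, entropy }
}

// 8 picks from a 16-symbol hex set: elementCount 16, entropy 32
console.log(elementStats("0123456789abcdef", 8))
```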

However, in the case of pseudowords, it's super tricky. Do I count the number of possible individual pseudoword blocks? For example, with Apple, it's built with the following requirements:

So for Apple, valid pseudoword structures could look like:

Apple is the extreme case in complexity. Here are the issues with the other pseudoword templates:

So yeah, I'm completely stuck on how to portray this in the overlay. Open to ideas.

roycewilliams commented 1 year ago

How is entropy being calculated for these trickier cases? Naively, if entropy can be expressed, then element count must necessarily be expressible.

atoponce commented 1 year ago

Lots of wacky math. Ignoring a lot of the fluff, here's the math for the Apple pseudoword generator:

function generateApple() {
  /*
    See https://web.archive.org/web/20210430183515/https://twitter.com/AaronToponce/status/1131406726069084160 for full analysis.

    For n ≥ 1 blocks, the total entropy in bits is:
      log2(
        (6n - 1)      //  One lowercase alphabetic character is randomly capitalized
        * 19^(4n - 1) //  The total possible combinations of consonants
        * 6^(2n)      //  The total possible combinations of vowels
        * 10 * 2n     //  One of the 2n 'edge' characters is a random digit
      )

    E.g.:
      DVccvc:                      log2( 5 * 19^3  * 6^2 * 10 * 2) ~=  24.558 bits
      cvCcvD-cvccvc:               log2(11 * 19^7  * 6^4 * 10 * 4) ~=  48.857 bits
      cvcCvc-Dvccvc-cvccvc:        log2(17 * 19^11 * 6^6 * 10 * 6) ~=  72.231 bits
      cvccVc-cvccvD-cvccvc-cvccvc: log2(23 * 19^15 * 6^8 * 10 * 8) ~=  95.244 bits
      et cetera, et cetera, et cetera.
  */
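  // Entropy of n blocks, rounded down to whole bits
  const apple = function (n) {
    return Math.floor(Math.log2((6 * n - 1) * 19 ** (4 * n - 1) * 6 ** (2 * n) * 20 * n))
  }

  const entropy = getEntropy() // target entropy in bits (helper not shown)
  let n = 1

  // Add blocks until the pseudoword meets the target entropy
  while (apple(n) < entropy) {
    n++
  }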

  // more code...
}

So 1 block has ~24.558 bits of entropy, but 2 blocks have ~48.857 bits, a difference of 24.299 bits, not 24.558. The difference between 2 blocks and 3 blocks is 23.374 bits, and between 3 blocks and 4 blocks is 23.013 bits, etc. So with Apple, the difference isn't constant as the number of blocks grows, because the (6n - 1) and 2n factors grow linearly rather than multiplying by a fixed amount per block. So I can't say the set size is 2^24.558 per block, as that's not correct.
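The non-constant step can be checked numerically. The sketch below assumes the formula from the `generateApple()` comment and sums logarithms instead of multiplying the raw factors, which avoids huge intermediate numbers; `appleBits` is a hypothetical name. Going from n to n + 1 blocks multiplies the count by 19^4 * 6^2 * (6n + 5)/(6n - 1) * (n + 1)/n, so each step is a fixed log2(19^4 * 6^2) ≈ 22.162 bits plus a correction that shrinks toward zero as n grows.

```javascript
// Hypothetical helper: total entropy of an n-block Apple pseudoword,
// using the same factors as the generateApple() comment, in log space.
function appleBits(n) {
  return (
    Math.log2(6 * n - 1) +        // which letter is capitalized
    (4 * n - 1) * Math.log2(19) + // consonant choices
    2 * n * Math.log2(6) +        // vowel choices
    Math.log2(10 * 2 * n)         // digit value times its edge position
  )
}

// Block-independent part of each step: log2(19^4 * 6^2)
const perBlockLimit = Math.log2(19 ** 4 * 6 ** 2)

for (let n = 1; n <= 4; n++) {
  const total = appleBits(n)
  const step = n > 1 ? total - appleBits(n - 1) : total
  console.log(`n=${n} total=${total.toFixed(3)} step=${step.toFixed(3)}`)
}
console.log(`limiting step=${perBlockLimit.toFixed(3)}`)
```

This is why a single "2^x per block" figure doesn't exist for Apple: only the limiting step is constant, and the first few blocks carry extra bits from the capitalization and digit-position factors.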

The other generators have similar-but-different nuances with their pseudoword blocks. Some, however, like Daefen, Proquints, and Urbit, do have a constant factor per pseudoword. But even then, how do I say that? "2^16 syllables"? "65,536 syllables"? What about Munemo? It doesn't have a syllable structure and is instead just an encoded random number. So if you generate an 80-bit pseudoword password, do I say "2^80 numbers"? Something else?
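For the constant-factor case, a Proquint block shows where a fixed per-pseudoword count comes from. This sketch assumes the generator follows the published Proquint scheme (16 consonants at 4 bits each and 4 vowels at 2 bits each, in consonant-vowel-consonant-vowel-consonant order); `proquint16` and `bitsPerBlock` are hypothetical names, not webpassgen's code.

```javascript
// Proquint alphabet: 16 consonants (4 bits) and 4 vowels (2 bits).
const CONSONANTS = "bdfghjklmnprstvz"
const VOWELS = "aiou"

// Hypothetical helper: encode one 16-bit value as a five-letter block,
// most significant bits first: 4 + 2 + 4 + 2 + 4 = 16 bits.
function proquint16(word) {
  if (!Number.isInteger(word) || word < 0 || word > 0xffff) {
    throw new RangeError("expected a 16-bit unsigned integer")
  }
  return (
    CONSONANTS[(word >> 12) & 0xf] +
    VOWELS[(word >> 10) & 0x3] +
    CONSONANTS[(word >> 6) & 0xf] +
    VOWELS[(word >> 4) & 0x3] +
    CONSONANTS[word & 0xf]
  )
}

// Every block carries the same number of bits, so the per-pseudoword
// set size is a constant 2^16 = 65,536.
const bitsPerBlock =
  3 * Math.log2(CONSONANTS.length) + 2 * Math.log2(VOWELS.length)

console.log(proquint16(0x7f00) + "-" + proquint16(0x0001))
```

For example, the 32-bit value 0x7f000001 (127.0.0.1) encodes as `lusab-babad`. Because the mapping is one-to-one, "65,536 per block" is an exact element count here, unlike Apple.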

Maybe instead of communicating a "set size", I communicate something entirely different, although I don't know what.

roycewilliams commented 1 year ago

Ah, interesting! OK, for now, I propose just not populating it at all in the cases where it's not clear how to do so. Release early and often! :D

atoponce commented 1 year ago

Yup. Thought about that also. I can put most of the structure in place, just in case I figure something out, but not actually display anything in the overlay. At least that's not hurting anything, and if nothing comes to fruition, I can always just remove it.
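The placeholder approach could look like the hypothetical sketch below: the overlay always shows entropy and character count, and adds an element-count line only when a generator supplies one. `formatStats`, `groupDigits`, and the field names are assumptions for illustration. `groupDigits` accepts a Number or a BigInt, so a set size such as 2^80 prints exactly as 1,208,925,819,614,629,174,706,176 rather than as a rounded floating-point value.

```javascript
// Hypothetical helper: insert thousands separators into a non-negative
// integer (Number or BigInt) without relying on locale data.
function groupDigits(value) {
  return value.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",")
}

// Hypothetical overlay formatter. elementCount is optional: generators
// without a meaningful count (e.g. some pseudowords) simply omit it.
function formatStats({ entropy, charCount, elementCount }) {
  const lines = [
    `Entropy: ${entropy.toFixed(3)} bits`,
    `Characters: ${charCount}`,
  ]
  if (elementCount !== undefined) {
    lines.push(`Elements: ${groupDigits(elementCount)}`)
  }
  return lines.join("\n")
}

// A generator with a known list size, and one without
console.log(formatStats({ entropy: 64.624, charCount: 40, elementCount: 7776 }))
console.log(formatStats({ entropy: 72.231, charCount: 20 }))
```

Keeping the field optional means the overlay code doesn't change if a pseudoword count is figured out later; the generator just starts passing it.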

But I still have plenty of bugs to work out before this is ready anyway, so I can keep thinking on it, and maybe something will come to mind.

atoponce commented 1 year ago

@roycewilliams Please review this and let me know your thoughts.