SpiderOak / Encryptr

Encryptr is a zero-knowledge cloud-based password manager / e-wallet powered by Crypton
GNU General Public License v3.0

Encryptr.prototype.randomString Produces Biased Output #171

Closed: paragonie-scott closed this issue 8 years ago

paragonie-scott commented 9 years ago

https://github.com/devgeeks/Encryptr/blob/64223f0cb4adba80c8d89f23762bba0c8dc7fa3c/src/app.js#L285-L298
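
For readers without the repository open, the pattern at issue looks roughly like this (a paraphrase for illustration, not the verbatim Encryptr code): a 32-bit value from window.crypto.getRandomValues is reduced modulo the charset length, which over-represents some characters whenever the charset length does not divide 2 ** 32 evenly.

// A paraphrased sketch of the linked app.js lines, for illustration only.
function randomStringSketch(length) {
    var charset = "!@#$%^*()_+{}:?|,[];./~ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                + "abcdefghijklmnopqrstuvwxyz0123456789";
    var values = new Uint32Array(length);
    window.crypto.getRandomValues(values);
    var result = "";
    for (var i = 0; i < length; i++) {
        // The biased step: 2 ** 32 possible values cannot divide evenly
        // into charset.length buckets, so some characters are selected
        // slightly more often than others.
        result += charset.charAt(values[i] % charset.length);
    }
    return result;
}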

See also:

devgeeks commented 9 years ago

Thanks for this. I saw the mention of this and cringed since it is a known issue (#49).

Also, thanks for the code example. Password generation as a whole is going to get better soon (configurability, etc.).

alanfairless commented 9 years ago

FWIW, although there is a theoretical bias, I don't think it is a practical one. The relative numbers (the size of the character set, 85, and the number of possible 32-bit values) are so far apart that the bias is insignificant.

To explore this, here's a test that counts character frequency using the password generation code over millions of samples. When I run it, the counts for all the characters are in a similar range with each other: they're maybe one ten-thousandth apart, and the rankings aren't consistent from one run to the next.

So my feeling is that the current code is safe.

var length = 10000;
var repetitions = 10000;
var charset = "!@#$%^*()_+{}:?|,[];./~ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "abcdefghijklmnopqrstuvwxyz0123456789";
var counts = new Uint32Array(charset.length);  // typed arrays are zero-initialized
var i, j, thisValue;

// do this many times, so we don't have to ask for more entropy than is
// available in a single getRandomValues() call
for (i = 0; i < repetitions; i++) {
    var values = new Uint32Array(length);
    window.crypto.getRandomValues(values);

    // increment the count for each selected character
    for (j = 0; j < length; j++) {
        thisValue = values[j] % charset.length;
        counts[thisValue]++;
    }
}

// display the counts for each character
for (i = 0; i < counts.length; i++) {
    console.log(charset[i], counts[i]);
}
console.log("Done");

Sample output:

! 1177289
@ 1175653
# 1177383
$ 1176767
% 1176294
^ 1177160
* 1176570
( 1177774
) 1174687
_ 1177255
+ 1176476
{ 1179243
} 1176130
: 1177168
? 1176049
| 1175684
, 1177278
[ 1173362
] 1175830
; 1177388
. 1175636
/ 1178087
~ 1177098
A 1176593
B 1175778
C 1176555
D 1175302
E 1177993
F 1175165
G 1175161
H 1177913
I 1176522
J 1177251
K 1177142
L 1177596
M 1174555
N 1176993
O 1175472
P 1176747
Q 1177500
R 1177641
S 1175733
T 1177037
U 1176560
V 1176273
W 1177755
X 1176649
Y 1177299
Z 1175586
a 1177313
b 1176758
c 1176912
d 1175774
e 1176409
f 1174768
g 1174083
h 1176258
i 1175812
j 1175584
k 1177244
l 1174783
m 1178112
n 1174355
o 1176912
p 1179353
q 1175976
r 1177422
s 1177318
t 1176736
u 1177215
v 1176096
w 1175166
x 1177643
y 1175695
z 1178770
0 1175137
1 1177416
2 1176001
3 1177741
4 1174018
5 1176729
6 1175210
7 1175622
8 1175680
9 1174947
Done
paragonie-scott commented 9 years ago

| So my feeling is that the current code is safe.

To clarify: do you intend this statement as an argument against making this unbiased?

fx5 commented 9 years ago

| To explore this, here's a test that counts character frequency with the password generation code over millions of samples.

The bias is indeed too small to be visible with that number of samples. The chance of any given character coming out as "!" is 1.1764706112%, while for every other character it is 1.1764705880%.
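
Those figures can be checked directly; a quick sketch, assuming the 85-character set from the test above:

// All intermediate integers here are below 2 ** 53, so plain
// double-precision arithmetic is exact.
var M = Math.pow(2, 32);    // possible Uint32 values: 4294967296
var k = 85;                 // charset length
var q = Math.floor(M / k);  // 50529027 raw values map to each character...
var r = M % k;              // ...with r = 1 value left over, landing on "!"
console.log(r);                                // 1
console.log((100 * (q + 1) / M).toFixed(10));  // 1.1764706112 ("!")
console.log((100 * q / M).toFixed(10));        // 1.1764705880 (all others)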

You generated 10,000 passwords with 10,000 characters each. The chance that at least one of those characters differs from the output of an unbiased function (see below) is only 1 - (1 - 1 / 2 ** 32) ** (10000 * 10000) ≈ 2.3%. So even across the 100 million characters you generated, the result is most probably exactly the same as the result of an unbiased function.
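
The 2.3% figure comes out the same way: the biased and unbiased generators disagree only when the raw 32-bit value lands on the single leftover value, so the disagreement probability per character is 1 / 2 ** 32.

// Probability that at least one of the 10000 * 10000 characters differs
// from the output of an unbiased generator.
var p = 1 - Math.pow(1 - 1 / Math.pow(2, 32), 10000 * 10000);
console.log(p);  // ~0.023, i.e. about 2.3%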

To remove the bias, it should be enough to generate a new random array whenever one of the generated values[i] is 0.
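
A minimal sketch of that fix (the function name is ours, not the repo's): since 2 ** 32 % 85 == 1, discarding the single raw value 0 leaves 2 ** 32 - 1 values that divide evenly into the 85 characters, so retrying whenever a 0 appears removes the bias.

// Regenerate the whole array until it contains no 0, as suggested above.
// The retry probability is about length / 2 ** 32, i.e. negligible.
function unbiasedRandomString(length) {
    var charset = "!@#$%^*()_+{}:?|,[];./~ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                + "abcdefghijklmnopqrstuvwxyz0123456789";
    var values = new Uint32Array(length);
    var ok = false;
    while (!ok) {
        window.crypto.getRandomValues(values);
        ok = true;
        for (var i = 0; i < length; i++) {
            if (values[i] === 0) {
                ok = false;
                break;
            }
        }
    }
    var result = "";
    for (var j = 0; j < length; j++) {
        result += charset.charAt(values[j] % charset.length);
    }
    return result;
}

For a charset whose length n does not satisfy 2 ** 32 % n == 1, the general form of the same idea is to reject any raw value greater than or equal to 2 ** 32 - (2 ** 32 % n).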