asmcrypto / asmcrypto.js

JavaScript Cryptographic Library with performance in mind.
MIT License

PRNG issues #27

Closed vibornoff closed 10 years ago

vibornoff commented 10 years ago

From discussion with @infinity0:

Random_getValues looks a bit weird. We want to populate the buffer, but you end up creating a bytes = new Uint8Array() and populating that instead? Also, why does it use both _global_crypto.getRandomValues and the ISAAC PRNG? Could you document this in some greater detail?

The main ISAAC web page (1) says it does not have a seeding routine. Could you tell us how your seeding function works?

vibornoff commented 10 years ago

Random_getValues takes an ArrayBuffer or a TypedArray (with its own underlying buffer) as its argument. A new Uint8Array is constructed as a view of that buffer, so populating the view also populates the argument passed in.
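
For example (a minimal illustration of the view/buffer relationship, not asmcrypto code):

var buf = new ArrayBuffer(4);        // the caller's buffer
var view = new Uint8Array(buf);      // a view over the same memory
view[0] = 0xff;                      // populating the view ...
console.log(new Uint8Array(buf)[0]); // 255 — ... populates the caller's buffer too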

ISAAC is used to "improve" random values (especially in case of crypto.getRandomValues absence) and adds seeding capability.

My ISAAC implementation is mainly derived from https://github.com/rubycon/isaac.js. Its seeding routine applies a simple mixing function to the seed to fill the internal state; take a look at https://github.com/vibornoff/asmcrypto.js/blob/master/src/random/isaac.js#L82-L145

vibornoff commented 10 years ago

NB I'll rewrite https://github.com/vibornoff/asmcrypto.js/blob/master/src/random/isaac.js#L82-L84 in a way that doesn't lose seed entropy.

infinity0 commented 10 years ago

Thanks for the explanation of the buffer/view, I'll look up the relevant docs.

I see what you're doing with mixing in the ISAAC output now - typically you can xor two independent streams S and T, and if one of them is securely-random then the result of the xor is also securely random. Though, "improve" is not really the right word here.

However, if neither is securely-random, then the result is not securely-random. For Random_getValues, this happens if crypto.getRandomValues is unavailable and ISAAC's seed is predictable (e.g. 0). In such a case, you should hard-fail (throw an exception) instead of continuing to generate predictable numbers. (Tell the user to upgrade their browser, or tell the developer to collect entropy from them, or something.)
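
Roughly, in sketch form (hypothetical names, not the actual asmcrypto code):

function get_random_bytes(n, enforce_secure) {
    var out = new Uint8Array(n);
    var have_system = typeof crypto !== 'undefined' && crypto.getRandomValues;
    if (have_system) {
        crypto.getRandomValues(out);
    } else if (enforce_secure && !isaac_strongly_seeded()) { // hypothetical check
        throw new Error("no secure entropy source available");
    }
    for (var i = 0; i < n; i++)
        out[i] ^= isaac_byte(); // hypothetical ISAAC output; xor keeps the result
    return out;                 // secure if either stream was securely-random
}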

Also, I would not even bother with _global_math_random(); it adds no unpredictability, and might even help the code bypass sanity checks like != 0.

I can see that you followed the rubycon seed() function, and judging from the comments, it "tries to do" some mixing. However, do you know what the abstract algorithm is - does it have a name? How can we be sure it is not reversible by an attacker? One alternative I can think of, that is theoretically solid, is to sha256 the human-seed input (maybe compress it first as a basic sanity check), then use the bytes from this as the internal state seed.

vibornoff commented 10 years ago

Every PRNG produces predictable values when you know its internal state.

Under my assumptions Math.random() is not insecure in the sense that one could reconstruct its internal state from its output values (reverse its output, in other words). At the same time Math.random()'s output isn't deterministic, as it's internally seeded with something containing real entropy (i.e. a CPU cycle counter and/or the current microtime). The quality of that seed is a separate issue and should be investigated closely.

Math.random() output is XOR-ed with ISAAC output to make the resulting stream irreversible even when ISAAC is poorly seeded. I don't like the idea of hard-failing in that case since crypto.getRandomValues itself never hard-fails. In my opinion it's better to gracefully degrade and (I agree) to warn developers about seed quality.

@rubycon's seed(…) is a slightly modified randinit(…) from rand.c. The mixing function is also taken from there.

vibornoff commented 10 years ago

Took a look at the V8 Math.random() internals. It made me cry tears of blood.

So you're right, it's pointless to XOR with Math.random() output. But I'll still leave it for seeding ISAAC as a fallback method (to provide graceful degradation) and warn in the console about entropy quality.

bored-engineer commented 10 years ago

Related info: http://stackoverflow.com/questions/9550796/why-is-google-chromes-math-random-number-generator-not-that-random

infinity0 commented 10 years ago

You are right that every PRNG is predictable given the internal state. However, cryptographically-secure PRNGs are unpredictable given only the output. It's not clear that this condition is met by the seeding.

It appears your seed function is a tweak of Bob's original randinit that mixes in user-supplied entropy s into r naively using xor. This is fine if r is already strongly-random, but if it starts off uninitialised, it's not clear what we're achieving. If your s is biased towards certain values, an attacker gains an advantage in guessing these values, and matching it against the output he sees. It's far from clear that the rest of randinit was designed to counteract this; it works directly on r so it looks like Bob assumes this is already uniformly-distributed, which would be false for uninitialised r and biased s. (I will email him and ask in more detail.)

A safe measure would be to use sha256(compress(s)) rather than s directly (and check that compress(s) is >256 bits). You could do this inside Random_getValues, if you don't want to make isaac.js more complex. Then, you should also add a disclaimer to the documentation of seed(), to tell people that no processing of the seed is done. (If Crypto.getRandomValues is available, you can actually seed ISAAC using this, combined with the user-collected entropy. But arguably this is pointless since we'll just use the former directly.)
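
Something like this (a sketch; assumes asmCrypto's own SHA256 interface, with the compress step reduced to a plain length check):

function preprocess_seed(s) { // s: Uint8Array of human-collected entropy
    if (s.length < 32)
        throw new Error("seed too short: want at least 256 bits of raw material");
    return asmCrypto.SHA256.bytes(s); // 32 well-mixed bytes for the internal state
}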

Also your implementation doesn't quite match randinit() - Bob uses addition like f+=r[i+5] whereas you use bitwise-or like f = (f + r[i|5])|0 - could you explain the difference?

Could you add a test for the ISAAC generator, to make sure it matches the original tests?

If your application does not require cryptographically strong random numbers, I agree with the "degrade gracefully" option. But if it does require strong security, then "degrade gracefully" is absolutely not the right approach here - security is part of your contract with your client, and returning non-secure random numbers is as wrong as returning 3 to answer 1+1. Perhaps you could add a "enforce_secure" flag to Random_getValues?

vibornoff commented 10 years ago

The seeding routine is as follows:

You suggest replacing the mixing function with sha256. It won't have any effect except an increased amount of computation to seed the generator. An attacker (as before) will be able to guess s and match the output against the values he has. The only way to counter such guessing is to provide enough real entropy within s (no matter whether it's biased or not).

vibornoff commented 10 years ago

The JavaScript statements f+=r[i+5] and f = (f + r[i+5|0])|0 are absolutely equivalent, except that the latter is stocked with type hints to help the JIT optimize the code.

Also i|k and i+k|0 are equal when i itself is a multiple of 2^n and k < 2^n. It doesn't add much speedup but is a bit easier on the eyes. If you consider it weird or potentially erroneous I'll change it back to the i+k|0 form.
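
For example, with n = 3 (so i is a multiple of 8 and k = 5 < 8):

var i = 16;               // multiple of 2^3
console.log(i | 5);       // 21
console.log((i + 5) | 0); // 21 — identical, since k never carries into i's bits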

vibornoff commented 10 years ago

No problem, I'll add the original tests into the test suite. Though I manually tried some seeds and got the same results.

Update: I won't add these values into the test suite since it's an open challenge:

Challenge, started 1996/march/27: The files

standard.h
rand.h
rand.c
randtest.c

randtest.c gives the first 2560 results of ISAAC initialized with an unknown seed,
along with the program that initialized ISAAC and produced the results. What was the seed?
Send mail to bob_jenkins@burtleburtle.net. (Converting bytes to ub4's is
endian-dependent, so the seed will only be English on something like an 80x86 machine.)

infinity0 commented 10 years ago

(Thanks for the explanation about the i|5 syntax, sorry I was not thinking in too much detail.)

Do we know that mix(m) does that, though? That is why I suggested sha256, because this has had much more scrutiny. (I am not enough of a cryptographer to analyse mix(m) myself.) On Bob's web page, he admits he has less confidence in randinit(). We can keep it like this for now, but I will ask him for more details via email.

I am not sure about your new compression function. It looks like you are now xoring 4 elements of s into one element of r; this will not detect, say, a biased stream where all the high bits of each byte are unset. Actually, if mix(m) works as you say, the compression is unnecessary - I only suggest it as a safety/sanity check that you have "enough entropy" in the seed, not as a pre-processing step. If the seed is predictable by an attacker it won't help, but it can stop the user from doing something insecure like seeding with 0 or a heavily biased stream of bytes.

infinity0 commented 10 years ago

Sorry, I have my non-JS hat on; compression of the kind I'm thinking of is not trivial in JS. Perhaps you could do an entropy test instead, which would be simpler:

function entropy_estimate(byte_array) {
    // count occurrences of each nibble (4-bit half-byte) value
    var counts = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
    for (var i = 0; i < byte_array.length; i++) {
        var lo = byte_array[i] & 0xf;
        var hi = (byte_array[i] >> 4) & 0xf;
        counts[lo]++;
        counts[hi]++;
    }
    // Shannon entropy of the nibble distribution (max 4 bits per nibble),
    // scaled back to the length of the input
    return counts.map(function(x) {
        return x ? -(x/byte_array.length/2 * Math.log(x/byte_array.length/2) / Math.log(2)) : 0;
    }).reduce(function(a, b) {
        return a + b;
    }, 0) / 4 * byte_array.length;
}

Keep a running counter of the entropy added so far; whenever you call seed(s), you increase this counter by entropy_estimate(s). If the user tries to call Random.getValues(enforce_secure=true) and ISAAC has not been seeded with enough entropy yet, hard-fail.

edit: corrected the code; first version was calculating relative entropy

edit: I can do this myself and submit a pull request, if you agree with this idea but prefer to focus on something else.
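
A minimal sketch of that counter logic (hypothetical names, not the actual asmcrypto API):

var _seed_entropy = 0; // running estimate of entropy collected so far

function seed(s) {
    _seed_entropy += entropy_estimate(s);
    // ... mix s into the ISAAC state here ...
}

function getValues(buffer, enforce_secure) {
    if (enforce_secure && _seed_entropy < 256)
        throw new Error("PRNG not seeded with enough entropy");
    // ... fill buffer from the combined streams ...
}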

vibornoff commented 10 years ago

Sorry for the late response, I was AFK for 3 days.

Your idea of entropy estimation is great and clear to me. However, I consider it out of scope for the seeding routine. It's another Big Thing.

You've put an idea into my head. Now I'm changing the seeding routine in a way that seed entropy isn't lost (on average) even when the seed values are heavily biased. The only thing left to check is the seed length.

infinity0 commented 10 years ago

Checking for insecure conditions is a requirement for sensible crypto libraries. If you don't want to put it in the ISAAC seed function, you should at least put it in Random_getValues. Likewise with the enforce_secure flag.

As I said, your processing of the seed in the past two commits doesn't make much sense, given your original assertion that mix() is supposed to be a random function. It just makes things harder to review since it is different from Bob's original code. Do you have any justification for why you made these changes?

vibornoff commented 10 years ago

Yep, I'll change getValues in the way you suggest. Also I'm thinking of turning "enforce-secure" on by default and providing an option to turn it off.

Of course these changes aren't random. The idea is to consume as much entropy from the seed as possible (no matter whether the seed is biased or not). The seeding routine isn't yet complete (there is a bug processing a zero seed) though I'm sure the idea is clear to you.

Given true random values A and B, both, say, 8 bits long: combining them into a larger value gives up to 16 bits of entropy in theory. But not every way of combining gives the best result. Say, just adding them gives at most log2(2^9 - 1) bits of entropy; XOR-ing them — log2(2^8) bits. That doesn't matter when only 8 bits of entropy are required: we can take them from A, or from B, or add the values and drop the carry-out bit (not exactly), or simply XOR the values.

Now suppose both A and B are biased, having their 4 most-significant bits set to zero. Thus only 4 bits of entropy are available from each value. Adding them, we get at most log2(2^5 - 1) bits of entropy. The best way is to combine A and B into one 8-bit value: entropy isn't lost in this case. In general we can't make any assumptions about whether and how A and B are biased. Our purpose is to consume all the entropy from A and B.
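
A quick toy simulation of this biased case (illustrative only, not asmcrypto code):

function entropy_bits(counts, total) { // Shannon entropy of an outcome histogram
    var h = 0;
    for (var k in counts) {
        var p = counts[k] / total;
        h -= p * Math.log(p) / Math.LN2;
    }
    return h;
}

var sum = {}, xor = {}, cat = {};
for (var a = 0; a < 16; a++) {      // A: 4 high bits zero → 4 bits of entropy
    for (var b = 0; b < 16; b++) {  // B: likewise
        sum[a + b]        = (sum[a + b]        || 0) + 1;
        xor[a ^ b]        = (xor[a ^ b]        || 0) + 1;
        cat[(a << 4) | b] = (cat[(a << 4) | b] || 0) + 1;
    }
}
console.log(entropy_bits(sum, 256)); // ~4.72 — addition loses entropy
console.log(entropy_bits(xor, 256)); // 4 — xor of the overlapping bits caps at 4
console.log(entropy_bits(cat, 256)); // 8 — combining into one 8-bit value keeps it all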

Now suppose mix(m) is a random function with 8-bit input and 8-bit output. Then mix(A) gives an 8-bit value (containing almost 4 bits of entropy) where each of those 8 bits depends on all 4 random bits of A. Now it's safe to XOR: mix(A) ^ B gives an 8-bit value containing almost 8 bits of entropy.

Why "at almost"? Because mix(m) is not a permutaion function, it loses some of entropy due to collisions (and we can estimate its order of magnitude: 1-exp(-k*(k-1)/(2*N), google for more details).

Update: My past 2 commits make sense only for salts exceeding the ISAAC internal state size. The seeding routine now splits the salt into 1024-octet chunks (zero-padding on demand) and performs mix(m ^ s[j]) → m for each chunk, ending up with m containing no more than 8192 bits of entropy.
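
In outline (a sketch of the chunked absorption, with mix standing in for ISAAC's real mixing pass over the 1024-octet state):

function absorb_seed(state, seed, mix) { // state: Uint8Array(1024)
    var CHUNK = 1024;
    for (var off = 0; off < seed.length; off += CHUNK) {
        for (var i = 0; i < CHUNK; i++) {
            var b = off + i < seed.length ? seed[off + i] : 0; // zero-pad on demand
            state[i] ^= b;                                     // m ^ s[j]
        }
        mix(state);                                            // mix(m ^ s[j]) → m
    }
    return state;
}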

vibornoff commented 10 years ago

Also I can dive into mix(m) and show that every bit of its output depends on every bit of its input. It's not trivial, but it can be done with Zhegalkin polynomials.

infinity0 commented 10 years ago

Thanks for the explanation - your reasoning regarding the chaining strategy (to combine mix(s[0:4]) with s[4:8] etc etc) seems sound, and I understand the intention of the code better now.

However, your implementation seems non-obvious to me. You are "combining" successive substrings of the seed by doing (e.g.) m[i|0] ^= a in the first "mixing" step. Why is this? I think the more obvious approach is to set r[] = m[] ^ n, during the "process seed chunk" step? (You also appear to be missing the crucial if ( (i & 3) !== 3 ) continue; line.)

Unfortunately, I don't have enough mathematical cryptanalytic background to properly evaluate the security of either method - I'm just curious, since the one I mentioned seems more directly consistent with the reasoning you gave above. It's also closer to Bob's original code, in the sense that you are gluing randinit() operations together, but in your current implementation you are adding xor operations into the middle of randinit(). I'll email Bob for more advice; I'm sure he'll have some opinions on this.

As a side note, I think it may be clearer to un-inline mix(). We will only call seed() a few times, so performance isn't such a big deal.

vibornoff commented 10 years ago

Using crypto.getRandomValues directly is a bad idea.

All the juggling with ISAAC is needed not only to provide good random values when crypto is unavailable. It's needed to armor proprietary getRandomValues implementations (in IE and Safari). I'm sure you've heard something about NSA-inspired backdoors in various PRNGs.

infinity0 commented 10 years ago

Hm, I'll think this through a bit more. I am conflicted - it feels ugly to implement your own crypto when the system gives you an API for it. If we trust the implementation, then by all means we should use it, given that seeding ISAAC is quite fiddly. I also think it's safer if we don't rely solely on ISAAC.

BTW I was still XORing the output of crypto.getRandomValues with the output of ISAAC, as you did before. I realise my documentation doesn't reflect this; I will fix it.

Also, I avoided seeding ISAAC using c.getRV (which you were doing) - if you don't trust it, then you shouldn't use it to seed any mitigation measures. (Though, perhaps seeding and xoring with the stream would help, since then the attacker has less chance to observe a pure output for deducing the internal state.)

Feel free to cherry-pick my other commits:

I tweaked the seed() function so that it uses the original randinit() as a primitive. So now we push each 1024-byte block through mix() etc and prng() to give a new r. This is similar in spirit to what you had before, but uses the original primitives directly instead of changing them.

I removed the reset code because I don't think it should ever be needed in a real user application, and it might allow harmful effects. If you need to reset the state during tests, you could refactor isaac.js to look something like this:

var ISAAC = ( function () {
  var Isaac = function () {
    /* everything else */
    return {
      spawn: Isaac,
      // other methods
    };
  };
  return Isaac();
})();

then in tests, you can call asmCrypto.ISAAC.spawn() to create a new PRNG with uninitialised seed.

vibornoff commented 10 years ago

Consider Dual_EC_DRBG. Suppose we have a crypto.getRV based on it. If it's properly seeded and contains enough entropy, it will produce good output in terms of probability theory (a good getRV implementation). Otherwise I think we would already have heard about its weakness from various security bulletins.

Now suppose someone has a backdoor for the PRNG itself and is able to reveal its internal state from its output. Though it's not a CSPRNG under such circumstances, it's still a good source of entropy and we can seed another (known-to-be) CSPRNG with its output. The only way to counteract breaking this scheme is to prevent raw getRV output leakage. And I'm not sure about Math.random implementations: if any of them shares its state with getRV, it has to be armored too.

vibornoff commented 10 years ago

I'm thinking about entropy estimation. It works well when the values supplied as a seed are independent from each other. Though that's not always true, and your method tends to overestimate the entropy, especially when you get it from mouse moves or keystrokes. I suppose additional investigation is needed to justify the correctness of such an estimation method.

vibornoff commented 10 years ago

Also I think we should unconditionally seed with the available method (getRV or weak_seed), treating supplied values as an additional seed rather than seeding solely with them. That should prevent weak seeding with heavily dependent values or even seeding with a constant string (though the better protection for those who do that is a lobotomy, I think).

infinity0 commented 10 years ago

"The only way to counteract breaking this scheme is to prevent raw getRV output leakage." - yes, this is what I worry about - if we don't trust the system RNG, we can't guarantee that its output is not leaked elsewhere, which would let the attacker derive the internal state and the seed to ISAAC. So we need to ensure good entropy for ISAAC, no matter what system RNG is in place. I'll update my logic to do this.

I agree that we should treat all supplied values as additional seeds - that is why I removed the reset() stuff. Do you think that is adequate? edit: note that I updated seed() so the input is mixed in with the previous rng output, so calling it multiple times should preserve entropy from previous inputs.

vibornoff commented 10 years ago

I propose to relax «guarantee that its output is not leaked elsewhere» down to «guarantee that its output is not leaked from things under our control».

Imagine the system RNG state being derived by a malicious site opened in one of the tabs alongside the legit site. In theory such a derivation may help with breaking the TLS connection and performing a MiTM attack. From that point it doesn't matter whether the legit site uses asmcrypto or not — all the security is undermined. Though it's too fantastic to be practical ☺

Edit: A more realistic way of exploiting such a potential RNG weakness is to extract random values from e.g. symmetric block cipher padding, derive the RNG state and try to generate some keying material.

Edit: At least FF and Chrome allow replacing their Math.random and crypto.getRandomValues with your own implementations.

Edit: IE11 allows it too.

infinity0 commented 10 years ago

Well, if we have good entropy for seed(), we don't need worry about the system RNG being leaked. So I still treat entropy as a requirement; and I treat seeding ISAAC using getRV as part of the "weak seeding" measures - we don't know if it really makes ISAAC strong. (The attacker doesn't need to do a MITM, he can just make the user visit his website where he calls getRV several times. Then he can deduce the state and generate a bunch of candidate values for the seed.)

I'm not confident enough to completely replace the system RNG with this ISAAC implementation, so I suggest keeping the xor. I don't see a harm in this, and it could save us if we ever find a bug here but the system RNG was still good.

I hope this newer commit makes the logic clearer. In the future we could even set _assume_strong_system_rng conditionally based on the browser and OS; atm it is just false.

vibornoff commented 10 years ago

«The attacker doesn't need to do a MITM» — yep, he doesn't, but as «he can deduce the state and generate a bunch of candidate values for the seed», he can also generate a bunch of candidate values for TLS keys. Though it was mentioned only as an example attack vector (and an impractical one, as it requires the attack to be performed at nearly the same time as the user generating his keys).

A more realistic threat, I think, is recovering random material from crypto paddings, as it may be performed at virtually any moment and is very useful since keys are usually generated at nearly the same time as the padding.

Merged your work recently.

I reverted the entropy estimation code for now. Take a look at the gist I've created — it's an estimation of the entropy of 1001 captured mouse move events. Relative move {x,y}-shifts are translated to a polar coordinate system, thus providing 2 independent entropy sources: relative shift distance and relative direction angle.

As you can see, straightforward entropy estimation gives ~12 bits of entropy per move event, whereas 1st-order conditional entropy estimation gives only ~5 bits per event. And that's what I was talking about — correct entropy estimation is hard. It requires building a model of the entropy source and then testing the supplied values against it.

infinity0 commented 10 years ago

Thanks for the merge! Yes, you're right that entropy_estimate potentially over-estimates. I can see now that using diffs gives a more accurate estimation than calculating it from absolute values. (From the point of view of the seed function, though, it should not matter, since it mixes in the same entropy either way - so perhaps we can add the diff-logic into the entropy estimation, but I think it's unnecessary to do this in seed(). Either way is fine, though.)

Could you explain the difference between H0 and H1 in your code? I'm unfamiliar with perl and the syntax $L{$L} looks weird to me. Here is my attempt at doing something similar in javascript:

// function _bytes_entropy_estimate(byte_array) — from my commit
var x = [/* your data */];
var y = [/* your data */];

function lengths(x, y) { return x.map(function(v, i, a) { return Math.sqrt(v*v + y[i]*y[i]); }); }
function angles(x, y) { return x.map(function(v, i, a) { return Math.atan2(v, y[i]); }); }
function diff(v, i, a) { return i ? v-a[i-1]: v; }

var dx = x.map(diff);
var dy = y.map(diff);

var dlen = lengths(dx, dy);
var dphi = angles(dx, dy);

console.log(_bytes_entropy_estimate(x)); // 879.146782488592
console.log(_bytes_entropy_estimate(y)); // 790.5460384523352
console.log(_bytes_entropy_estimate(dx)); // 753.9574330588342
console.log(_bytes_entropy_estimate(dy)); // 640.8890090086104
console.log(_bytes_entropy_estimate(dlen)); // 696.1878599375523
console.log(_bytes_entropy_estimate(dphi)); // 494.9260320208844

vibornoff commented 10 years ago

You're right, there is no point in diff-logic for the seeding routine. It's only for correct estimation of mouse move event entropy.

Regarding my code, I calculated diffs of lengths (not lengths of diffs) since X and Y are already diffs (offsets from the previous mouse position, see MouseEvent.movementX). As X and Y are heavily correlated, they can't be considered independent.

Calculating

L[i] = SQRT( X[i]^2 + Y[i]^2 );
PHI[i] = ATAN2( Y[i], X[i] );

we get uncorrelated L and PHI (strictly speaking they're not, but close enough) containing the same entropy as X and Y. But we need to go deeper ☺ The randomness is contained in dL[i], the move acceleration, and dPHI[i], the move direction change. Nevertheless the values still depend heavily on each other: a large dL[i] is more likely to be followed by a large dL[i+1], and vice versa.
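
In JS terms, that ordering looks something like this (a sketch; assuming x and y hold the MouseEvent.movementX/movementY samples):

var L   = x.map(function (v, i) { return Math.sqrt(v*v + y[i]*y[i]); });
var PHI = x.map(function (v, i) { return Math.atan2(y[i], v); });
function diff(v, i, a) { return i ? v - a[i-1] : v; }
var dL   = L.map(diff);   // move acceleration
var dPHI = PHI.map(diff); // move direction change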

My code calculates H0, the relative entropy of an individual mouse move event assuming the diffs are independent, and H1, the relative conditional entropy assuming each element depends on no more than the previous element.

infinity0 commented 10 years ago

I don't think we should name it _random_guaranteed_entropy. The entropy estimate function was always only meant to provide an upper bound - if you don't hit it, you know something is wrong, but if you do hit it, it's still no guarantee. As it is, currently you're just counting the number of bits (_random_guaranteed_entropy += 8 * blen;) which is an even worse over-estimate than my previous code. (I guess you plan to change it?)

edit: I also think we shouldn't increase the entropy count in Random_weak_seed. The 50 bits is likely an overestimate; performance.now() is only contracted to be accurate to milliseconds not microseconds: [1] [2]

vibornoff commented 10 years ago

Agreed, I'll rename it to _random_estimated_entropy.

Regarding the estimation method, I think bit length isn't much worse for typical use cases than the method you proposed. The main difference is in handling intentionally weak seeds (e.g. your method prevents seeding with a zero-vector).

As I mentioned earlier, correct estimation requires building a model of every entropy source we consume randomness from. The best solution, I think, would be an entropy collector which collects events rather than bits and has a model for each type of event. It's then up to the user to subscribe to the ready-state change and proceed to generating their stuff.
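
Such a collector might look like this (hypothetical API; the ~5 bits/move figure is from the estimate above, the keystroke figure is a placeholder):

var collector = {
    _entropy: 0,
    _listeners: [],
    _models: {
        mousemove: function (e) { return 5; }, // ~5 bits/event (1st-order conditional estimate)
        keydown:   function (e) { return 2; }  // placeholder model
    },
    feed: function (type, event) {
        // ... mix the event data into the PRNG seed here ...
        this._entropy += this._models[type](event);
        if (this._entropy >= 256)
            this._listeners.forEach(function (cb) { cb(); });
    },
    onReady: function (cb) { this._listeners.push(cb); }
};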

infinity0 commented 10 years ago

I've thought a bit more about the case of the weak system RNG. To properly protect against it, we need user entropy for ISAAC, but this incurs a cost for every reload of the page. In practice, people might be tempted to set allow_weak, but that's not what that flag is for. Ultimately, this is an issue that is not ours to solve - it should be solved in the layers below us. So I think perhaps we should relax things a bit:

The end result would be:

As opposed to the current situation, where everyone is forced to collect entropy, unless they set allow_weak.

Does this make sense? (edit: btw, sorry for changing my mind about this; I can make this change. It has taken me a while to clearly sort through the various threat models, sure defences vs possible countermeasures.)

infinity0 commented 10 years ago

For user entropy, of course you can create more accurate models, but they are more costly to implement and review. The entropy estimate function I mentioned is a few lines and detects a lot of the more obvious bad cases, so why not add it? It's not even performance intensive.

Judging by the numbers from the experiments, a stream of data collected from the user contains about half the estimated amount of entropy. So if you don't want to add the estimation function, you should at least halve your estimates, to 4*len. And the weak seed entropy is probably more like 25 bits, just from running some samples through my estimator.

(The use of PBKDF does increase the time it takes to brute-force, so it might effectively add several more "bits" of protection, but I don't know even the rough numbers behind this. We must be clear to distinguish between "things we think might help" vs "things we know will achieve security" that we can actually advise users to rely on in the API documentation.)

infinity0 commented 10 years ago

I added a few commits, not sure if you will like them... I decreased the entropy estimates, and added some tweaks. The last commit is the relaxation of the system RNG assumption, but I understand if you would prefer not to have it. I am just worried that the requirement of "collecting user entropy" is quite heavy, and might encourage people to set allow_weak when inappropriate.

(edit: please ignore 82b9dfd linked above; ec03b93 and its ancestors are the intended ones)

vibornoff commented 10 years ago

Last commits look solid. Sure they'll be merged.

«weak seed entropy is probably more like 25 bits» — how did you collect the values to be estimated? Was it something like for ( i = 0; i < 1000; i++ ) collect_weak_values();? If so, you're on the wrong track. Though I'll take a closer look.

infinity0 commented 10 years ago

Ah, I was reading my units wrong. I consistently got about 20 bytes not bits.

function arbitrary_bytes () {
    var buffer = new Float64Array(3);
    buffer[0] = Date.now();
    buffer[1] = Math.random();
    buffer[2] = window.performance.now();
    var bytes = new Uint8Array(buffer.buffer);
    var a = [];
    for (var i=0; i<bytes.length; i++) { a.push(bytes[i]); }
    return a;
}
_bytes_entropy_estimate(arbitrary_bytes())
21.45409100759263
_bytes_entropy_estimate(arbitrary_bytes())
21.469973715389354
_bytes_entropy_estimate(arbitrary_bytes())
20.264231337692227

I suppose this is an over-estimate. I just think, if the application already has set up infrastructure to collect entropy from the user, we might as well let them go all the way, rather than trying to estimate weak_seed entropy to "save them effort".

(The PBKDF2 and crypto.getRandomValues will likely make things harder for an attacker, but we don't have a good model to estimate this so I think it's better to avoid advertising it in documentation.)

vibornoff commented 10 years ago

I'm thinking of phasing out allow_weak since I can't imagine any practical use-case when it's required.

In the test suite (take a look at test/bignum.js) I have to seed the PRNG with a constant to avoid sudden test failures on platforms not providing crypto.getRandomValues. I don't like the idea of passing allow_weak instead, since I'd have to put it everywhere and someday I'll forget to turn it off.

Maybe turn allow_weak into a static flag and proceed with asmCrypto.random.allowWeak = true. What do you think?

infinity0 commented 10 years ago

How about setting it to false by default, then true in your tests?

If an application wants to support older browsers, we really should recommend user entropy collection, rather than PBKDF2. I know the latter would improve security - but to what extent? The overall idea has not been well-tested or studied. So I don't think it's wise to rely on this as a default, when no strong RNG is available. The usage of location.href as a salt is also not very strong, and you haven't accounted for this in your security explanation. (If you think PBKDF2 of ~50 bits is good enough for older browsers, why don't we just do this for everything, and forget about seeding?)

Setting 100k for the number of PBKDF2 iterations takes about 6 seconds on my (correction: 1)-year-old laptop. I don't think this is reasonable for a web application. 4k already gave a noticeable pause, which is why I set it to 1k when strongly-seeded. It was always just meant as an opportunistic countermeasure rather than a method for strong security, so I think 4k/1k is preferable.

(Also, why did you omit a68f4806 ?)

infinity0 commented 10 years ago

Just added 16361fae which fixes the mixing logic. The previous code assumed r is a zero vector, but that no longer applies when I removed the reset behaviour in ab5a6a6d. edit: Also added 4ed507e2 since this behaviour is closer to the original randinit().

vibornoff commented 10 years ago

«How about setting it to false by default, then true in your tests?» — that's exactly what I meant.

The usage of location.href as a salt is intended to prevent rainbow-table attacks.

~50 bits isn't good enough. That's why we insist on entropy collection and seeding with it. Also, we should be as opportunistic as possible in entropy collection. If you have an idea of how to obtain more entropy at page load without making the end-user do something, I would be glad to implement it.

100k iterations of PBKDF2 takes about 250ms on my 5-year-old Core i3 laptop under Chrome. Take a look at the jsperf benchmark. I'm sure that's because you're running under Safari or you have been debugging asmcrypto — that turns the JIT optimizer off, slowing things down significantly.

Read this and beware of setting the iteration count too small.

https://github.com/vibornoff/asmcrypto.js/commit/a68f4806 isn't there because it has no effect on getValues, and there's no clarification of why its output reveals the state or why the estimated entropy is decreased by 16 bits for every random byte generated. Why not 13? Or 42?

infinity0 commented 10 years ago

Ah OK, that sounds good to me then. :)

Are you sure it takes 250ms? I am simply using <script src="asmcrypto.js"></script> in an HTML file, then asmCrypto.PBKDF2_HMAC_SHA1.bytes("ouoeuio", "oeioeuio", 100000, 1024); in the dev console in Chrome 32 (well, Chromium on Debian). I don't have any special environment set up... Note that on the jsperf page you linked, it only does 1000 iterations, not 100k.

Yes, I'm happy to set the number of iterations as high as possible, I just worry about this 6 seconds thing.

As for a68f480, it does have an effect on getValues - it means you need to seed() the thing with more entropy before getValues will treat it as "strong enough". So you can't get into the situation where you seed it with 4 bits of entropy, get random values, seed it some more, etc., until the estimate reaches 256 bits. The "16" is a hopefully-safe over-estimate here, much like the "4" is a hopefully-safe under-estimate when increasing our estimate. I acknowledge I am being super-paranoid here, but I want to discourage developers from trying to "game the system" - this is the same reason why I added the buffer=0 logic etc. (edit: I suppose this will require some ridiculous gymnastics though, so you could leave this out if you think the complexity is not worth it.)

vibornoff commented 10 years ago

I have cherry-picked https://github.com/vibornoff/asmcrypto.js/commit/4ed507e2 as https://github.com/vibornoff/asmcrypto.js/commit/ccb4ac820d417655fd15918ff7237b3f08bd6ba6 and amended it a bit.

Update: https://github.com/vibornoff/asmcrypto.js/commit/16361fae also picked as https://github.com/vibornoff/asmcrypto.js/commit/fcca4f89d8a92c68f9101321df93d6b580ae1007

vibornoff commented 10 years ago

Ouch, asmCrypto.PBKDF2_HMAC_SHA1.bytes("ouoeuio", "oeioeuio", 100000, 1024); actually means you run (edit:) 100000 * ceil(1024/20) = 5200000 iterations. So I'm not surprised it takes 6s.
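
That is, PBKDF2 pays the full iteration count once per hLen-sized output block:

var dkLen = 1024, hLen = 20, iterations = 100000; // HMAC-SHA1 output is 20 bytes
var blocks = Math.ceil(dkLen / hLen);             // 52
console.log(iterations * blocks);                 // 5200000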

vibornoff commented 10 years ago

Regarding a68f480: under the assumption of a weak seed only, we have 2 mutually exclusive possibilities. Either you believe PBKDF2 is effective against a brute-force attack, in which case you have to believe the state isn't leaked with the RNG output either; or there's no point in seed derivation at all, and we may throw away PBKDF2 and seed the RNG with the collected bytes directly.

I tend to believe PBKDF2 is effective.

Though there's not much room, I estimated the minimal entropy available at the first weak_seed call as ~50 bits. If that's wrong and there are actually, say, ~40 bits, everything is bad: we'd have to increase the iteration count to 100000 * 2^10 ≈ 10^8 to keep the RNG «$100k proof», and it would take > 4 minutes to seed it — or we end up with a «$100 proof» RNG.

So the question actually sounds like «do we have at least 50 bits of entropy?» It depends on the threat model. My threat model doesn't include a malicious web app or proactive traffic monitoring. But it does include cryptanalysis of public keys and encrypted blobs with cleartext metadata.

vibornoff commented 10 years ago

NB: the Sensors API could allow further improvements to entropy collection on mobile devices.

vibornoff commented 10 years ago

Just had a look at the browser support tables.

asmCrypto requires typed arrays so this thread is relevant for IE >= 10.0 and Safari >= 5.1.

crypto.getRandomValues() is available starting from IE 11 and is not available in the Android Browser, IE Mobile or Opera Mobile.

As for IE 10 and the rest of the mobile browsers, all of them have the High Resolution Time API.

So I'm sure that we always have at least 50 bits of entropy from weak_seed.

But I still have to check that everything is fine from inside web workers.

infinity0 commented 10 years ago

OK, everything looks good at the moment.

You should be careful if you increase these entropy estimates in the future, though. I still think 50 is too high, but at least it is only in weak_seed, so we still require 200 more bits from the user. We should aim for "worst-case" lower bounds rather than "average" estimates.

Several recent SSL attacks have been active attacks, so it's not so infeasible that the attacker can gain knowledge of the high bits of Date.now() (e.g. leaked metadata) and performance.now() (e.g. a new browser session, when opening a link from an email), and perhaps even build up a model of the lower bits (e.g. if they know roughly how long a page takes to load, or if they can observe network traffic).

You also make assumptions about the behaviour of these functions. Math.random() has no guaranteed unpredictability - an implementation can do whatever it wants - and performance.now() is only guaranteed to be millisecond-accurate: "If [..] unable to provide a time value accurate to a [microsecond], [it] can represent [..] accurate to a millisecond."

vibornoff commented 10 years ago

Ok, you're right, we must not rely on Date.now() at all since it may leak easily. I won't take Math.random() into account for now either.

Take a look at this gist. I changed the PBKDF logic a bit. It's split into 100 passes of 1000 iterations each. Looks promising, as it is able to collect at least ~60 bits even with ms accuracy.

I clearly understand that 60 bits is too small for real cryptographic applications. All I'm trying to achieve is good-enough seeding for the worst-case scenario.
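
The split-pass idea, sketched (hypothetical; assumes the PBKDF2_HMAC_SHA1.bytes interface from above, with per-pass timing jitter feeding the seed):

function timed_pbkdf_seed(password, salt) {
    var timings = new Float64Array(100);
    var key = salt;
    for (var i = 0; i < 100; i++) {
        var t0 = performance.now();
        key = asmCrypto.PBKDF2_HMAC_SHA1.bytes(password, key, 1000, 20);
        timings[i] = performance.now() - t0; // per-pass jitter is the entropy
    }
    return { key: key, extra_seed: new Uint8Array(timings.buffer) };
}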

infinity0 commented 10 years ago

Often, I get 0 with millisecond accuracy, because all the samples are either equal to the mean or outside 3 sigma. (Example timeDiff/1000: [1] + 92x[2] + [3,3,3,3,4,4].) I think this happens with faster computers, and increasing the number of iterations helps to avoid this 0 case. So perhaps you could add some code to determine the number of iterations based on how long 1000 iterations takes. (Actually, I'm not sure why you do the filtering, especially in this direction. Surely you want the extreme events?)

However, I think it's not worth spending too much effort on this, at least for the purposes of this issue - I think allow_weak being false is a good-enough protection, as it forces the developer to actually collect some entropy. Once they do this, collecting 100 bits or 300 bits shouldn't make much difference to their application, so they might as well collect ~300 bits. I'm happy if you mark this issue as solved.

Collecting entropy like this could be interesting as a separate issue, though.

infinity0 commented 10 years ago

I think some of the logic changes you made in 639d8e9e make the intent of the code less clear. It's not clear that the behaviour is the same as what we previously had. The whole point of the assume_system_rng_strong = true deal is to make the intention of the logic clear to someone reading the code, even if it is not the most efficient way of doing things.

One side effect of your changes is that:

I think it's best to treat the (estimated) seeding of ISAAC separately from the system RNG, so (as you mentioned before) the developer has a choice whether to trust the system RNG or not.

vibornoff commented 10 years ago

Oh, thanks a lot - I've fixed Random_getNumber too.