Open kspearrin opened 7 years ago
A quick look at the Chrome profiler indicates that forge is spending a lot of time in: ctx.generateSync
https://github.com/digitalbazaar/forge/blob/57ebd54a01b4c38399229a7e7cccbd9a9e73f969/lib/prng.js#L136
Which is called by forge.random.getBytesSync: https://github.com/digitalbazaar/forge/blob/57ebd54a01b4c38399229a7e7cccbd9a9e73f969/lib/random.js#L95-L110
And indeed, when the following line from your demo is moved in front of the for loop, the encryption performs as expected.
var ivBytes = forge.random.getBytesSync(16);
So, the performance difference seems to be related to the PRNG implementations.
Updated the benchmark to add decrypt as well. Forge seems to do better on decrypt, presumably because no PRNG is involved:
Forge: https://jsfiddle.net/nfwpuhdx/8/ SJCL: https://jsfiddle.net/fjmr2abq/7/
I implemented a Web Crypto RNG which should isolate the encryption functions.
Forge: https://jsfiddle.net/nfwpuhdx/9/ SJCL: https://jsfiddle.net/fjmr2abq/8/
Forge appears to be ~20% faster.
Great! Any reason that's not just baked into forge.random.getBytesSync
already?
Forge does appear to use the Web Crypto API under certain circumstances. Not sure how this fits in with the demo you've created.
Here's a further refinement of the Web Crypto RNG for Forge which also happens to be more similar to the SJCL implementation.
@mattcollier I didn't really see any difference between version 9 and version 10 as far as performance. Were you expecting there to be?
No, no performance benefit, but it is a better match to what is implemented in the SJCL demo for comparison purposes.
So, after studying the PRNG implementation in forge, I have found that on each call to forge.random.getBytesSync()
forge reseeds itself, which involves a lot of overhead, including generating 1K of entropy and distributing it into pools. This process does actually utilize the Web Crypto API, when available, to fill the entropy pool.
On my system, 10K iterations of just forge.random.getBytesSync(16)
takes 3000ms.
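A minimal harness for that kind of measurement (shape only; swap `forge.random.getBytesSync` in for the callback when running against forge in a browser, and expect your numbers to vary):

```javascript
// Time N iterations of a zero-argument function and report milliseconds.
function bench(label, fn, iterations = 10000) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  const elapsed = Date.now() - start;
  console.log(label + ': ' + elapsed + 'ms for ' + iterations + ' iterations');
  return elapsed;
}

// e.g. bench('getBytesSync(16)', () => forge.random.getBytesSync(16));
```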
I found that the reseeding on every request is occurring because of this: https://github.com/digitalbazaar/forge/blob/master/lib/prng.js#L144
Which was part of this commit: https://github.com/digitalbazaar/forge/commit/88beb1f46777bbca0d86cc7405be7ae8ec2b1593
I have found that removing that one line reduces the runtime for 10K iterations of forge.random.getBytesSync(16)
to 75ms.
By default, forge follows what the Fortuna algorithm calls for: resetting the key for every request, however small. That may be a bit too paranoid from a performance standpoint. Forge implements Fortuna at the JS level even when a secure native PRNG is available; it only uses the native PRNG to provide the seed entropy. Fortuna ensures we always have a sufficient quantity of pseudorandom data to draw from.
We could perhaps change the underlying implementation to rely more directly on secure native PRNGs, but as I recall, some browsers quickly start throwing QuotaExceeded exceptions as they don't always implement something like Fortuna to provide sufficient pseudorandom data.
In any event, thanks @mattcollier for looking into this and determining the difference in this particular benchmark was actually in the PRNG implementation. We do need to build more benchmarking tests into the test suite to keep an eye on performance.
I'm trying to understand why SJCL behaves differently, then, since its implementation appears to be based on Fortuna as well. It does mention some changes in the documentation at the top.
Would it be safe to use window.crypto.getRandomValues()
for IVs like in @mattcollier's examples for this demo implementation?
@kspearrin, yeah, I think that would be safe. Of course, every browser is different so each PRNG may have different quality, but if a browser implements that API, it is supposed to be a secure PRNG.
@kspearrin,
Looks like SJCL changes the key by using its own generated pseudorandom bytes instead of doing a full reseed.
I imagine that if we did the same it would eliminate the large difference.
Is that a reasonable optimization that could be added to forge?
@kspearrin,
I think so. When rekeying we need to produce the key material from a source that is assumed to behave randomly, so we should be able to reuse the output of the block cipher itself, provided that we still reseed every 1MiB of data. I believe reseeding even on that boundary is only open to a theoretical attack, but we should continue to do it unless we can get input from a reputable cryptographer that says it's not necessary.
So the optimization we would add would move the slowdown from every request to 1MiB boundaries.
Great. Please let me know when/if this makes its way into the base Forge library. Until then I'll continue to evaluate directly using window.crypto.getRandomValues()
in its place.
I'm writing a simple benchmark to compare the speed of Forge against our current solution, SJCL: AES-256-CBC encrypting a UTF-8 word 10K times. With SJCL this takes 195ms; with Forge, the same test takes 2500ms. Am I doing something wrong? Your benchmarks seem to indicate that Forge is much faster than SJCL.
Forge: https://jsfiddle.net/nfwpuhdx/4/ SJCL: https://jsfiddle.net/fjmr2abq/1/
(see console for output)