Open kspearrin opened 7 years ago
A quick look at the Chrome profiler indicates that forge is spending a lot of time in: ctx.generateSync
https://github.com/digitalbazaar/forge/blob/57ebd54a01b4c38399229a7e7cccbd9a9e73f969/lib/prng.js#L136
Which is called by forge.random.getBytesSync: https://github.com/digitalbazaar/forge/blob/57ebd54a01b4c38399229a7e7cccbd9a9e73f969/lib/random.js#L95-L110
And indeed, when the following line from your demo is moved in front of the for loop, the encryption performs as expected.
var ivBytes = forge.random.getBytesSync(16);
So, the performance difference seems to be related to the PRNG implementations.
Updated the benchmark to add decrypt as well. Forge seems to do better on decrypt, presumably because no PRNG is involved:
Forge: https://jsfiddle.net/nfwpuhdx/8/ SJCL: https://jsfiddle.net/fjmr2abq/7/
I implemented a Web Crypto RNG which should isolate the encryption functions.
Forge: https://jsfiddle.net/nfwpuhdx/9/ SJCL: https://jsfiddle.net/fjmr2abq/8/
Forge appears to be ~20% faster.
Great! Any reason that's not just baked into forge.random.getBytesSync
already?
Forge does appear to use the Web Crypto API under certain circumstances. Not sure how this fits in with the demo you've created.
Here's a further refinement of the Web Crypto RNG for Forge which also happens to be more similar to the SJCL implementation.
@mattcollier I didn't really see any difference between version 9 and version 10 as far as performance. Were you expecting there to be?
No, no performance benefit, but it is a better match to what is implemented in the SJCL demo for comparison purposes.
So, after studying the PRNG implementation in forge, I have found that on each call to forge.random.getBytesSync()
forge reseeds itself, which involves a lot of overhead, including generating 1K of entropy and distributing it into pools. This process does actually utilize the Web Crypto API, when available, to fill the entropy pool.
On my system, 10K iterations of just forge.random.getBytesSync(16)
takes 3000ms.
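A minimal harness for that kind of measurement (shape only; swap `forge.random.getBytesSync` in for the callback when running against forge in a browser, and expect your numbers to vary):

```javascript
// Time N iterations of a zero-argument function and report milliseconds.
function bench(label, fn, iterations = 10000) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  const elapsed = Date.now() - start;
  console.log(label + ': ' + elapsed + 'ms for ' + iterations + ' iterations');
  return elapsed;
}

// e.g. bench('getBytesSync(16)', () => forge.random.getBytesSync(16));
```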
I found that the reseeding on every request is occurring because of this: https://github.com/digitalbazaar/forge/blob/master/lib/prng.js#L144
Which was part of this commit: https://github.com/digitalbazaar/forge/commit/88beb1f46777bbca0d86cc7405be7ae8ec2b1593
I have found that removing that one line reduces the runtime for 10K iterations of forge.random.getBytesSync(16)
to 75ms.
By default, forge follows what the Fortuna algorithm calls for: resetting the key for every request, however small. That may be a bit too paranoid from a performance standpoint. Forge implements Fortuna at the JS level even when a secure native PRNG is available; it only uses the native PRNG to provide the seed entropy. Fortuna ensures we always have a sufficient quantity of pseudorandom data to draw from.
We could perhaps change the underlying implementation to rely more directly on secure native PRNGs, but as I recall, some browsers quickly start throwing QuotaExceeded exceptions as they don't always implement something like Fortuna to provide sufficient pseudorandom data.
In any event, thanks @mattcollier for looking into this and determining the difference in this particular benchmark was actually in the PRNG implementation. We do need to build more benchmarking tests into the test suite to keep an eye on performance.
I'm trying to understand why SJCL behaves differently, then, since its implementation appears to be based on Fortuna as well. It does mention some changes in the documentation at the top.
Would it be safe to use window.crypto.getRandomValues()
for IVs like in @mattcollier's examples for this demo implementation?
@kspearrin, yeah, I think that would be safe. Of course, every browser is different so each PRNG may have different quality, but if a browser implements that API, it is supposed to be a secure PRNG.
@kspearrin,
Looks like SJCL changes the key by using its own generated pseudorandom bytes instead of doing a full reseed.
I imagine that if we did the same it would eliminate the large difference.
Is that a reasonable optimization that could be added to forge?
@kspearrin,
I think so. When rekeying we need to produce the key material from a source that is assumed to behave randomly, so we should be able to reuse the output of the block cipher itself, provided that we still reseed every 1MiB of data. I believe reseeding even on that boundary is only open to a theoretical attack, but we should continue to do it unless we can get input from a reputable cryptographer that says it's not necessary.
So the optimization we would add would move the slowdown from every request to 1MiB boundaries.
Great. Please let me know when/if this makes its way into the base Forge library. Until then I'll continue to evaluate directly using window.crypto.getRandomValues()
in its place.
I'm writing a simple benchmark to compare the speed of Forge against our current solution, SJCL: AES-256-CBC encrypting a UTF-8 word 10K times. With SJCL this takes 195ms; with Forge, the same test takes 2500ms. Am I doing something wrong? Your benchmarks seem to indicate that Forge is much faster than SJCL.
Forge: https://jsfiddle.net/nfwpuhdx/4/ SJCL: https://jsfiddle.net/fjmr2abq/1/
(see console for output)