auth0 / node-jwa

JSON Web Algorithms
http://tools.ietf.org/id/draft-ietf-jose-json-web-algorithms-08.html
MIT License

perf: cache the creation of the algorithm #47

Closed: H4ad closed this 1 month ago

H4ad commented 1 year ago

By submitting a PR to this repository, you agree to the terms within the Auth0 Code of Conduct. Please see the contributing guidelines for how to create and submit a high-quality PR for this repo.

Description

This library is used by node-jws; every time someone wants to validate a token, node-jws calls jwa to create the verify function and then discards the object.

Code usage reference

Because of this, I read the code of this library and found that the object with the sign and verify functions is created on every call, and a regex also runs on every call to extract the algorithm and the bits.
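To make the hot path concrete, here is a rough sketch of the calling pattern (simplified and hypothetical, not the actual node-jws or jwa source):

```js
// Rough sketch of the per-call pattern (simplified; not the actual jwa/node-jws source).
const jwa = require('jwa');

function verifyToken(algorithm, securedInput, signature, secretOrKey) {
  // Each jwa(algorithm) call runs the regex that splits e.g. "HS256" into "HS" + "256"
  // and builds a brand-new { sign, verify } object, which is thrown away
  // as soon as this verification returns.
  return jwa(algorithm).verify(securedInput, signature, secretOrKey);
}
```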

The current performance is:

```
jwa(RS256) x 5,825,423 ops/sec ±1.12% (92 runs sampled)
jwa(RS384) x 6,216,852 ops/sec ±0.47% (94 runs sampled)
jwa(RS512) x 6,046,150 ops/sec ±1.25% (89 runs sampled)
jwa(PS256) x 4,306,111 ops/sec ±1.14% (93 runs sampled)
jwa(PS384) x 4,260,252 ops/sec ±1.14% (92 runs sampled)
jwa(PS512) x 3,976,296 ops/sec ±4.58% (87 runs sampled)
jwa(HS256) x 4,295,952 ops/sec ±0.87% (93 runs sampled)
jwa(HS384) x 4,225,687 ops/sec ±1.10% (89 runs sampled)
jwa(HS512) x 4,314,741 ops/sec ±1.32% (91 runs sampled)
jwa(ES256) x 4,166,067 ops/sec ±1.03% (89 runs sampled)
jwa(ES384) x 4,157,053 ops/sec ±1.17% (91 runs sampled)
jwa(ES512) x 4,167,795 ops/sec ±0.91% (90 runs sampled)
```

So, instead of creating the object every time, I cache the created objects, keyed by the algorithm string, and freeze them with Object.freeze to prevent modification (a sketch of the approach follows the numbers below). Now the performance is:

```
jwa(RS256) x 1,044,750,439 ops/sec ±1.77% (90 runs sampled)
jwa(RS384) x 46,073,595 ops/sec ±2.94% (87 runs sampled)
jwa(RS512) x 48,740,542 ops/sec ±2.92% (87 runs sampled)
jwa(PS256) x 50,445,379 ops/sec ±2.11% (85 runs sampled)
jwa(PS384) x 50,930,005 ops/sec ±5.51% (85 runs sampled)
jwa(PS512) x 55,984,858 ops/sec ±1.34% (93 runs sampled)
jwa(HS256) x 59,485,338 ops/sec ±0.88% (93 runs sampled)
jwa(HS384) x 61,521,893 ops/sec ±0.90% (88 runs sampled)
jwa(HS512) x 62,314,092 ops/sec ±2.71% (86 runs sampled)
jwa(ES256) x 42,380,646 ops/sec ±3.55% (61 runs sampled)
jwa(ES384) x 40,491,232 ops/sec ±1.25% (90 runs sampled)
jwa(ES512) x 42,010,686 ops/sec ±1.52% (91 runs sampled)
```

This is an increase in performance of almost 10x for all cases. We also reduce garbage collection to zero by reusing the cached objects instead of allocating new ones.
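To illustrate the idea, here is a minimal, self-contained sketch of the cache-and-freeze pattern. It only sketches the HMAC family, and the names (`createAlgorithm`, `cache`) are hypothetical stand-ins, not necessarily what the PR uses; the real PR caches the same objects the library already builds for every supported algorithm.

```js
const crypto = require('crypto');

// Hypothetical minimal factory: only the HS (HMAC) family is sketched here,
// just enough to demonstrate the cache + Object.freeze pattern.
function createAlgorithm(algo, bits) {
  if (algo !== 'hs') throw new Error('only the HMAC family is sketched here');
  const hash = 'sha' + bits;
  return Object.freeze({
    sign: (input, secret) =>
      crypto.createHmac(hash, secret).update(input).digest('base64url'),
    // Plain string comparison for brevity; a real implementation should use
    // a constant-time comparison.
    verify: (input, signature, secret) =>
      crypto.createHmac(hash, secret).update(input).digest('base64url') === signature,
  });
}

// Cache keyed by the algorithm string ("HS256", "HS384", ...); the frozen
// object is created on the first call and reused afterwards.
const cache = Object.create(null);

function jwa(algorithm) {
  if (cache[algorithm]) return cache[algorithm];

  // Same regex work as before, but now it only runs once per algorithm string.
  const match = algorithm.match(/^(RS|PS|ES|HS)(256|384|512)$/);
  if (!match) throw new TypeError('"' + algorithm + '" is not a valid algorithm.');

  cache[algorithm] = createAlgorithm(match[1].toLowerCase(), match[2]);
  return cache[algorithm];
}

// Repeated calls now return the same frozen object instead of allocating a new one.
console.log(jwa('HS256') === jwa('HS256')); // true
```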

More about memory usage

I'm using [isitfast](https://github.com/yamiteru/isitfast); ignore the op/s numbers, which are not currently stable.

Before:

```
jwa(RS512) 1,845,018 op/s (542 ns) ±1% x2,500 | 248 kB ±2% x25
jwa(RS384) 6,211,180 op/s (161 ns) ±1% x2,500 | 232 kB ±2% x25
jwa(RS256) 12,345,679 op/s (81 ns) ±1% x2,500 | 232 kB ±2% x25
jwa(PS512) 1,173,709 op/s (852 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(PS384) 4,524,887 op/s (221 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(PS256) 2,314,815 op/s (432 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(HS512) 4,761,905 op/s (210 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(HS384) 2,169,197 op/s (461 ns) ±1% x2,500 | 360 kB ±1% x25
jwa(HS256) 6,211,180 op/s (161 ns) ±1% x2,500 | 320 kB ±2% x25
jwa(ES512) 5,000,000 op/s (200 ns) ±1% x2,500 | 544 kB ±0.9% x25
jwa(ES384) 4,975,124 op/s (201 ns) ±1% x2,500 | 544 kB ±0.9% x25
jwa(ES256) 6,211,180 op/s (161 ns) ±1% x2,500 | 544 kB ±0.9% x25
```

Now:

```
jwa(RS512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(RS384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(RS256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(PS512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(PS384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(PS256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(HS512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(HS384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(HS256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(ES512) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(ES384) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
jwa(ES256) ∞ op/s (0 ns) ±0% x2,500 | 0 kB ±0% x25
```

The 1B op/s figure for RS256 is probably caused by a V8 optimization that later bails out once it discovers the function can receive values other than RS256.

Testing

I didn't change the behavior; I only introduced the cache and froze the cached objects.

Object.freeze has been available since Node.js v0.10.0, so I don't think we will have any compatibility issues.
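For reference, a quick illustration of what freezing buys us (standard Object.freeze semantics, not code from this PR): once the shared cached object is frozen, one consumer cannot swap out sign or verify and affect another.

```js
'use strict';

const algo = Object.freeze({
  sign: () => 'signature',
  verify: () => true,
});

try {
  // In strict mode, writing to a property of a frozen object throws a TypeError;
  // in sloppy mode the write is silently ignored. Either way the cached object
  // keeps its original functions.
  algo.verify = () => false;
} catch (err) {
  console.log(err instanceof TypeError); // true
}
console.log(algo.verify()); // true: still the original function
```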

Checklist