cloudflare / circl

CIRCL: Cloudflare Interoperable Reusable Cryptographic Library
http://blog.cloudflare.com/introducing-circl

sha3: prevent state from escaping to heap #282

Closed bwesterb closed 2 years ago

bwesterb commented 3 years ago

Two issues prevented the sha3/shake state from staying on the stack:

1. xorIn and copyOut were function variables, so calls through them were indirect
2. State contains a pointer (buf) into itself

The solution is simple.

For (1) we fix xorIn and copyOut at compile time. Their values were already constant in practice, but the compiler didn't notice.
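To illustrate (1), here is a minimal sketch, not the code from this PR. The names `state`, `xorInGeneric`, `xorInVar`, `absorbIndirect` and `absorbDirect` are hypothetical. The assumption is that the gc compiler cannot see through a call made via a function variable, so it treats the pointer passed to it as escaping. Allocation counts can differ between Go versions, so the sketch prints them without claiming specific values.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// state is a hypothetical stand-in for the sha3 state.
type state struct {
	a [25]uint64
}

// xorInGeneric XORs the little-endian 64-bit words of buf into the state.
func xorInGeneric(d *state, buf []byte) {
	n := len(buf) / 8
	if n > len(d.a) {
		n = len(d.a)
	}
	for i := 0; i < n; i++ {
		d.a[i] ^= binary.LittleEndian.Uint64(buf[8*i:])
	}
}

// xorInVar is a function variable: calls through it are indirect.
var xorInVar = xorInGeneric

// absorbIndirect calls through the variable; the callee is unknown to
// the compiler, so &d may be treated as escaping to the heap.
//
//go:noinline
func absorbIndirect(buf []byte) uint64 {
	var d state
	xorInVar(&d, buf)
	return d.a[0]
}

// absorbDirect names the callee at compile time, so escape analysis
// can inspect it.
//
//go:noinline
func absorbDirect(buf []byte) uint64 {
	var d state
	xorInGeneric(&d, buf)
	return d.a[0]
}

func main() {
	buf := make([]byte, 8)
	buf[0] = 1
	fmt.Println(absorbIndirect(buf), absorbDirect(buf)) // 1 1
	fmt.Println("allocs/op indirect:", testing.AllocsPerRun(100, func() { absorbIndirect(buf) }))
	fmt.Println("allocs/op direct:  ", testing.AllocsPerRun(100, func() { absorbDirect(buf) }))
}
```

Building with `go build -gcflags=-m` prints the compiler's escape decisions, which is a quick way to check whether a given variable is moved to the heap.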

For (2), buf is a slice into storage. Instead of the slice itself, we now store its bounds within storage.
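The idea behind (2) can be sketched as follows. This is an illustration under assumed names (`withSlice`, `withBounds`, `bufo`, `bufe`, `write`) and a made-up 16-byte storage size, not the actual struct from the library. Keeping offsets instead of a slice makes the struct pointer-free, and the slice is rebuilt on demand.

```go
package main

import "fmt"

// withSlice shows the problematic layout for contrast: buf is a slice
// into storage, so the struct holds a pointer into itself.
type withSlice struct {
	storage [16]byte
	buf     []byte
}

// withBounds keeps only the bounds of the buffer, so the struct
// contains no pointers at all.
type withBounds struct {
	storage    [16]byte
	bufo, bufe int // the buffer is storage[bufo:bufe]
}

// buf rebuilds the slice from the stored bounds.
func (d *withBounds) buf() []byte { return d.storage[d.bufo:d.bufe] }

// write appends p to the buffer, truncating if storage is full.
func (d *withBounds) write(p []byte) {
	d.bufe += copy(d.storage[d.bufe:], p)
}

func main() {
	var s withBounds
	s.write([]byte("abc"))

	c := s // plain value copy: c.buf() refers to c.storage, not s.storage
	c.write([]byte("def"))

	fmt.Printf("%s %s\n", s.buf(), c.buf()) // abc abcdef
}
```

A side effect of this layout is that copying the struct by value gives an independent buffer, whereas a copied `withSlice` would still point into the original's storage.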

The actual performance benefit doesn't show up properly in this microbenchmark, because it mainly appears when there is a lot of pressure on the GC:

name                   old time/op   new time/op   delta
PermutationFunction-8    375ns ± 7%    375ns ± 9%    ~     (p=0.730 n=5+5)
Sha3_512_MTU-8          8.68µs ± 0%   8.10µs ± 3%  -6.74%  (p=0.008 n=5+5)
Sha3_384_MTU-8          6.27µs ± 1%   6.04µs ± 2%  -3.51%  (p=0.008 n=5+5)
Sha3_256_MTU-8          4.99µs ± 0%   4.72µs ± 4%  -5.56%  (p=0.008 n=5+5)
Sha3_224_MTU-8          4.78µs ± 0%   4.36µs ± 2%  -8.73%  (p=0.016 n=4+5)
Shake128_MTU-8          3.62µs ± 4%   3.59µs ± 3%    ~     (p=0.690 n=5+5)
Shake256_MTU-8          3.86µs ± 2%   3.86µs ± 3%    ~     (p=0.690 n=5+5)
Shake256_16x-8          53.9µs ± 2%   56.3µs ± 5%  +4.51%  (p=0.032 n=5+5)
Shake256_1MiB-8         2.96ms ± 2%   2.94ms ± 1%    ~     (p=0.421 n=5+5)
Sha3_512_1MiB-8         5.57ms ± 3%   5.45ms ± 1%    ~     (p=0.056 n=5+5)

name                   old speed     new speed     delta
PermutationFunction-8  533MB/s ± 7%  534MB/s ± 9%    ~     (p=0.841 n=5+5)
Sha3_512_MTU-8         155MB/s ± 0%  167MB/s ± 3%  +7.27%  (p=0.008 n=5+5)
Sha3_384_MTU-8         215MB/s ± 1%  223MB/s ± 2%  +3.66%  (p=0.008 n=5+5)
Sha3_256_MTU-8         270MB/s ± 0%  286MB/s ± 4%  +5.95%  (p=0.008 n=5+5)
Sha3_224_MTU-8         283MB/s ± 0%  310MB/s ± 2%  +9.59%  (p=0.016 n=4+5)
Shake128_MTU-8         373MB/s ± 4%  376MB/s ± 3%    ~     (p=0.690 n=5+5)
Shake256_MTU-8         350MB/s ± 2%  350MB/s ± 3%    ~     (p=0.690 n=5+5)
Shake256_16x-8         304MB/s ± 2%  291MB/s ± 5%  -4.25%  (p=0.032 n=5+5)
Shake256_1MiB-8        354MB/s ± 2%  357MB/s ± 1%    ~     (p=0.421 n=5+5)
Sha3_512_1MiB-8        188MB/s ± 3%  192MB/s ± 1%    ~     (p=0.056 n=5+5)

Closes #281

bwesterb commented 3 years ago

For Dilithium3 (v3.1) it makes a 14% difference:

name    old time/op  new time/op  delta
Sign-8   193µs ± 1%   166µs ± 7%  -13.96%  (p=0.008 n=5+5)

bwesterb commented 3 years ago

Approximately the same improvement for Kyber.

name                                           old time/op  new time/op  delta
GenerateKeyPair/Kyber512-8                     21.3µs ± 1%  19.1µs ± 0%  -10.39%  (p=0.008 n=5+5)
GenerateKeyPair/Kyber768-8                     34.1µs ± 1%  31.3µs ± 0%   -8.29%  (p=0.008 n=5+5)
GenerateKeyPair/Kyber1024-8                    49.5µs ± 1%  46.3µs ± 3%   -6.57%  (p=0.008 n=5+5)
GenerateKeyPair/SIKEp434-8                     2.10ms ± 5%  2.13ms ± 3%     ~     (p=0.421 n=5+5)
GenerateKeyPair/SIKEp503-8                     2.83ms ± 3%  2.84ms ± 1%     ~     (p=1.000 n=5+5)
GenerateKeyPair/SIKEp751-8                     9.11ms ± 2%  9.01ms ± 1%     ~     (p=0.421 n=5+5)
GenerateKeyPair/Kyber512-X25519-8              47.1µs ± 2%  44.6µs ± 0%   -5.26%  (p=0.008 n=5+5)
GenerateKeyPair/Kyber768-X448-8                 132µs ± 0%   130µs ± 2%     ~     (p=0.063 n=4+5)
GenerateKeyPair/Kyber1024-X448-8                147µs ± 0%   143µs ± 1%   -2.37%  (p=0.008 n=5+5)
Encapsulate/Kyber512-8                         16.7µs ± 0%  14.4µs ± 1%  -13.65%  (p=0.008 n=5+5)
Encapsulate/Kyber768-8                         20.5µs ± 1%  18.0µs ± 0%  -12.02%  (p=0.008 n=5+5)
Encapsulate/Kyber1024-8                        25.5µs ± 1%  22.4µs ± 2%  -12.25%  (p=0.008 n=5+5)
Encapsulate/SIKEp434-8                         3.44ms ± 4%  3.42ms ± 4%     ~     (p=0.690 n=5+5)
Encapsulate/SIKEp503-8                         4.59ms ± 3%  4.56ms ± 0%     ~     (p=0.730 n=5+4)
Encapsulate/SIKEp751-8                         15.1ms ± 3%  14.9ms ± 4%     ~     (p=0.548 n=5+5)
Encapsulate/Kyber512-X25519-8                  86.0µs ± 1%  83.7µs ± 0%   -2.65%  (p=0.008 n=5+5)
Encapsulate/Kyber768-X448-8                     282µs ± 0%   279µs ± 0%   -1.07%  (p=0.008 n=5+5)
Encapsulate/Kyber1024-X448-8                    288µs ± 1%   283µs ± 0%   -1.76%  (p=0.008 n=5+5)
Decapsulate/Kyber512-8                         17.4µs ± 1%  14.6µs ± 2%  -16.00%  (p=0.008 n=5+5)
Decapsulate/Kyber768-8                         21.8µs ± 0%  18.6µs ± 2%  -15.00%  (p=0.008 n=5+5)
Decapsulate/Kyber1024-8                        27.5µs ± 0%  22.9µs ± 2%  -16.66%  (p=0.008 n=5+5)
Decapsulate/SIKEp434-8                         5.60ms ± 4%  5.63ms ± 2%     ~     (p=0.841 n=5+5)
Decapsulate/SIKEp503-8                         7.73ms ± 1%  7.71ms ± 2%     ~     (p=1.000 n=5+5)
Decapsulate/SIKEp751-8                         24.9ms ± 1%  24.8ms ± 2%     ~     (p=0.548 n=5+5)
Decapsulate/Kyber512-X25519-8                  57.0µs ± 1%  54.4µs ± 1%   -4.50%  (p=0.008 n=5+5)
Decapsulate/Kyber768-X448-8                     183µs ± 1%   179µs ± 1%   -2.44%  (p=0.008 n=5+5)
Decapsulate/Kyber1024-X448-8                    189µs ± 1%   183µs ± 1%   -3.62%  (p=0.008 n=5+5)