Closed bwesterb closed 2 years ago
For Dilithium3 (v3.1) it makes a 14% difference:
name old time/op new time/op delta
Sign-8 193µs ± 1% 166µs ± 7% -13.96% (p=0.008 n=5+5)
Kyber improves by a similar margin (roughly 7% to 17%); the hybrid Kyber-X25519/X448 schemes gain less, and SIKE shows no significant change:
name old time/op new time/op delta
GenerateKeyPair/Kyber512-8 21.3µs ± 1% 19.1µs ± 0% -10.39% (p=0.008 n=5+5)
GenerateKeyPair/Kyber768-8 34.1µs ± 1% 31.3µs ± 0% -8.29% (p=0.008 n=5+5)
GenerateKeyPair/Kyber1024-8 49.5µs ± 1% 46.3µs ± 3% -6.57% (p=0.008 n=5+5)
GenerateKeyPair/SIKEp434-8 2.10ms ± 5% 2.13ms ± 3% ~ (p=0.421 n=5+5)
GenerateKeyPair/SIKEp503-8 2.83ms ± 3% 2.84ms ± 1% ~ (p=1.000 n=5+5)
GenerateKeyPair/SIKEp751-8 9.11ms ± 2% 9.01ms ± 1% ~ (p=0.421 n=5+5)
GenerateKeyPair/Kyber512-X25519-8 47.1µs ± 2% 44.6µs ± 0% -5.26% (p=0.008 n=5+5)
GenerateKeyPair/Kyber768-X448-8 132µs ± 0% 130µs ± 2% ~ (p=0.063 n=4+5)
GenerateKeyPair/Kyber1024-X448-8 147µs ± 0% 143µs ± 1% -2.37% (p=0.008 n=5+5)
Encapsulate/Kyber512-8 16.7µs ± 0% 14.4µs ± 1% -13.65% (p=0.008 n=5+5)
Encapsulate/Kyber768-8 20.5µs ± 1% 18.0µs ± 0% -12.02% (p=0.008 n=5+5)
Encapsulate/Kyber1024-8 25.5µs ± 1% 22.4µs ± 2% -12.25% (p=0.008 n=5+5)
Encapsulate/SIKEp434-8 3.44ms ± 4% 3.42ms ± 4% ~ (p=0.690 n=5+5)
Encapsulate/SIKEp503-8 4.59ms ± 3% 4.56ms ± 0% ~ (p=0.730 n=5+4)
Encapsulate/SIKEp751-8 15.1ms ± 3% 14.9ms ± 4% ~ (p=0.548 n=5+5)
Encapsulate/Kyber512-X25519-8 86.0µs ± 1% 83.7µs ± 0% -2.65% (p=0.008 n=5+5)
Encapsulate/Kyber768-X448-8 282µs ± 0% 279µs ± 0% -1.07% (p=0.008 n=5+5)
Encapsulate/Kyber1024-X448-8 288µs ± 1% 283µs ± 0% -1.76% (p=0.008 n=5+5)
Decapsulate/Kyber512-8 17.4µs ± 1% 14.6µs ± 2% -16.00% (p=0.008 n=5+5)
Decapsulate/Kyber768-8 21.8µs ± 0% 18.6µs ± 2% -15.00% (p=0.008 n=5+5)
Decapsulate/Kyber1024-8 27.5µs ± 0% 22.9µs ± 2% -16.66% (p=0.008 n=5+5)
Decapsulate/SIKEp434-8 5.60ms ± 4% 5.63ms ± 2% ~ (p=0.841 n=5+5)
Decapsulate/SIKEp503-8 7.73ms ± 1% 7.71ms ± 2% ~ (p=1.000 n=5+5)
Decapsulate/SIKEp751-8 24.9ms ± 1% 24.8ms ± 2% ~ (p=0.548 n=5+5)
Decapsulate/Kyber512-X25519-8 57.0µs ± 1% 54.4µs ± 1% -4.50% (p=0.008 n=5+5)
Decapsulate/Kyber768-X448-8 183µs ± 1% 179µs ± 1% -2.44% (p=0.008 n=5+5)
Decapsulate/Kyber1024-X448-8 189µs ± 1% 183µs ± 1% -3.62% (p=0.008 n=5+5)
Two issues prevented the sha3/shake state from staying on the stack:
The solution is simple.
For (1) we fix xorIn and copyOut at compile time. They effectively already were fixed, but the compiler didn't notice.
For (2) we store the bounds of buf, the slice into storage, instead of the slice itself.
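A rough sketch of the idea behind (2), using a hypothetical `sponge` type with made-up field names `start` and `end` (the real field names and buffer size in this PR may differ). Keeping two integer offsets instead of a slice header means the struct holds no pointer into itself, and the slice is rebuilt on demand.

```go
package main

import "fmt"

// sponge is a hypothetical stand-in for the hash state.
type sponge struct {
	storage    [168]byte
	start, end int // buffered data lives in storage[start:end]
}

// buf rebuilds the slice view from the stored bounds.
func (s *sponge) buf() []byte {
	return s.storage[s.start:s.end]
}

// write appends p to the buffered data and returns the number of bytes copied.
func (s *sponge) write(p []byte) int {
	n := copy(s.storage[s.end:], p)
	s.end += n
	return n
}

func main() {
	var s sponge
	s.write([]byte("abc"))

	// A plain value copy is independent: there is no slice that still
	// points into the original's storage.
	c := s
	c.write([]byte("d"))

	fmt.Printf("%s %s\n", s.buf(), c.buf())
}
```

A side effect of this layout is that copying the struct by value gives a fully independent clone, which a self-referencing slice field would not.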
The full performance benefit doesn't show up in this microbenchmark, because it is mostly seen when there is a lot of pressure on the GC:
Closes #281