cloudflare / circl

CIRCL: Cloudflare Interoperable Reusable Cryptographic Library
http://blog.cloudflare.com/introducing-circl

Kangaroo12 draft -10 #431

bwesterb closed this 1 year ago

bwesterb commented 1 year ago

On an M2 Pro:

goos: darwin
goarch: arm64
pkg: github.com/cloudflare/circl/xof/k12
BenchmarkK12_100B-12         5237684           225.1 ns/op   444.26 MB/s
BenchmarkK12_10K-12           106761         11281 ns/op     886.45 MB/s
BenchmarkK12_100K-12           26758         44659 ns/op    2239.19 MB/s
BenchmarkK12_1M-12              3723        324025 ns/op    3086.18 MB/s
BenchmarkK12_10M-12              384       3107586 ns/op    3217.93 MB/s
PASS
ok      github.com/cloudflare/circl/xof/k12 7.254s

For comparison:

goos: darwin
goarch: arm64
pkg: github.com/cloudflare/circl/internal/sha3
BenchmarkPermutationFunctionTurbo-12         7788294           145.3 ns/op  1376.86 MB/s
BenchmarkTurboShake128_1MiB-12                  1159       1027718 ns/op    1020.29 MB/s
BenchmarkTurboShake256_1MiB-12                   945       1255741 ns/op     835.03 MB/s
PASS
ok      github.com/cloudflare/circl/internal/sha3   4.068s
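As a quick sanity check on the K12 numbers above: Go's testing package reports MB/s as the bytes processed per op (set with b.SetBytes) divided by the elapsed time, in units of 10^6 bytes per second. The sketch below recomputes that column from ns/op. It assumes the size suffixes in the benchmark names are decimal (10K = 10,000 bytes), which is what the reported figures are consistent with.

```go
package main

import "fmt"

// mbPerSec converts an input size and a time per op into the MB/s figure
// Go benchmarks print: bytes / seconds / 1e6.
// bytes/ns * 1e9 (ns per s) / 1e6 (bytes per MB) = bytes/ns * 1e3.
func mbPerSec(bytes, nsPerOp float64) float64 {
	return bytes / nsPerOp * 1e3
}

func main() {
	// ns/op values copied from the M2 Pro K12 run; sizes assumed decimal.
	rows := []struct {
		name    string
		bytes   float64
		nsPerOp float64
	}{
		{"100B", 100, 225.1},
		{"10K", 10_000, 11281},
		{"100K", 100_000, 44659},
		{"1M", 1_000_000, 324025},
		{"10M", 10_000_000, 3107586},
	}
	for _, r := range rows {
		fmt.Printf("%-5s %8.2f MB/s\n", r.name, mbPerSec(r.bytes, r.nsPerOp))
	}
}
```

The recomputed values match the table up to rounding of the ns/op column (e.g. about 3217.93 MB/s for 10M).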

bwesterb commented 1 year ago

Ignoring overhead, and with the current code that still uses the slow scalar Keccak, we should be able to reach:

>>> 42 * 1024 / (145e-9 + 128*72e-9) / 1e6
4594.380942207029

Using pprof, I see that about a third of the time is spent in writeX2/LittleEndian.Uint64: interleaving the data and XORing it into the buffers isn't free.
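To make that cost concrete, here is a minimal sketch of what absorbing one block into two interleaved Keccak states involves. The helper xorInX2 and the lane-interleaved [50]uint64 layout are hypothetical illustrations, not circl's actual writeX2; the only detail taken from K12 itself is the 168-byte (21-lane) TurboSHAKE128 rate.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// rate128 is the TurboSHAKE128 / KangarooTwelve rate: 168 bytes = 21 lanes.
const rate128 = 168

// xorInX2 is a hypothetical helper (not circl's writeX2) that absorbs one
// rate-sized block from each of two inputs into a pair of Keccak states.
// Assumed layout: the two 25-lane states are interleaved lane by lane, so
// lane i of instance 0 lives at a[2*i] and lane i of instance 1 at a[2*i+1].
func xorInX2(a *[50]uint64, in0, in1 []byte) {
	_ = in0[rate128-1] // bounds-check hints: both inputs hold a full block
	_ = in1[rate128-1]
	for i := 0; i < rate128/8; i++ {
		// Per lane and per instance: one 8-byte little-endian load and one
		// XOR into the interleaved state. Cheap individually, but it adds up
		// next to a fast permutation.
		a[2*i] ^= binary.LittleEndian.Uint64(in0[8*i:])
		a[2*i+1] ^= binary.LittleEndian.Uint64(in1[8*i:])
	}
}

func main() {
	var state [50]uint64
	in0 := make([]byte, rate128)
	in1 := make([]byte, rate128)
	in0[0] = 0x01 // lane 0 of instance 0
	in1[8] = 0x02 // lane 1 of instance 1
	xorInX2(&state, in0, in1)
	fmt.Println(state[0], state[3]) // prints "1 2"
}
```

Every 168-byte block costs 21 loads and XORs per instance on top of the permutation itself, so the faster the permutation, the larger the share of a profile this step takes.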

bwesterb commented 1 year ago

goos: darwin
goarch: amd64
pkg: github.com/cloudflare/circl/xof/k12
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkK12_100B-8      3521196           348.1 ns/op   287.26 MB/s
BenchmarkK12_10K-8         63129         18746 ns/op     533.45 MB/s
BenchmarkK12_100K-8        13159         90725 ns/op    1102.23 MB/s
BenchmarkK12_1M-8           2001        591119 ns/op    1691.71 MB/s
BenchmarkK12_10M-8           206       5876949 ns/op    1701.56 MB/s
PASS
ok      github.com/cloudflare/circl/xof/k12 8.344s

The theoretical max here is 2230 MB/s. Interleaving and XORing are still expensive; they are just cheaper relative to Keccak, which is slower on this machine.
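Putting the two machines side by side: the sketch below takes the 10M results and the two theoretical maxima quoted in this thread and computes how close each run gets. The bounds are used as stated; they are not re-derived here.

```go
package main

import "fmt"

// efficiency returns measured throughput as a fraction of a theoretical bound.
func efficiency(measured, bound float64) float64 {
	return measured / bound
}

func main() {
	// 10M results and the quoted bounds, all in MB/s.
	m2 := efficiency(3217.93, 4594.38) // M2 Pro (arm64)
	intel := efficiency(1701.56, 2230)  // i5-1038NG7 (amd64)
	fmt.Printf("M2 Pro: %.0f%% of bound\n", 100*m2)
	fmt.Printf("i5:     %.0f%% of bound\n", 100*intel)
}
```

This prints about 70% for the M2 Pro and 76% for the i5: the gap to the bound is somewhat smaller on Intel, consistent with the overhead being relatively cheaper next to a slower Keccak.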

bwesterb commented 1 year ago

@armfazh I addressed all your comments. Please have another look.