jules opened 5 years ago
Following is a first benchmark of naive parallelization with 2/4/8/12 goroutines.
TL;DR: the best setup seems to be running 4 workers that perform the gfP.Add
operations concurrently.
16GiB System Memory
256KiB L1 cache
1MiB L2 cache
8MiB L3 cache
Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Sequential (baseline). Variance over 10 runs: 44.9µs ± 0%
BenchmarkAggregate100Signatures-8 30000 45001 ns/op 0 B/op 0 allocs/op

2 workers. Compared to sequential, delta: -23.60%. Variance over 10 runs: 38.9µs ± 3%
BenchmarkAggregate100Signatures-8 50000 34382 ns/op 64 B/op 3 allocs/op

4 workers. Compared to sequential, delta: -32.5%. Variance over 10 runs: 31.2µs ± 2%
BenchmarkAggregate100Signatures-8 50000 30667 ns/op 80 B/op 3 allocs/op

8 workers. Compared to sequential, delta: -13.15%. Variance over 10 runs: 38.1µs ± 1%
BenchmarkAggregate100Signatures-8 50000 39082 ns/op 240 B/op 4 allocs/op

12 workers. Compared to sequential, delta: -9.07%. Variance over 10 runs: 41.5µs ± 1%
BenchmarkAggregate100Signatures-8 30000 40919 ns/op 336 B/op 4 allocs/op
The benchmarks show a significant performance improvement (over 30% speedup) from introducing light parallelization of BLS signature aggregation. However, it would be preferable for the goroutine-based optimization to be performed by the user of the library, so as not to impose any scheduling side effects on other usage of the BLS library.
The BLS package currently only allows for the aggregation of two signatures at a time, leaving any caller to loop through an array of signatures and do so sequentially. Parallelizing this would save callers a lot of time.