dusk-network / dusk-crypto

Cryptographic primitives created for the Dusk Network ecosystem but widely applicable everywhere else
https://dusk.network

Parallelize aggregation of BLS signatures #14

Open jules opened 5 years ago

jules commented 5 years ago

The BLS package currently only allows aggregating two signatures at a time, so any caller with an array of signatures has to loop through it and aggregate sequentially. Parallelizing this would save callers a lot of time.
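To illustrate the current situation, here is a minimal sketch of the loop each caller ends up writing. `Sig`, `Aggregate`, and `AggregateAll` are hypothetical names, not the package's actual API, and modular integer addition stands in for the real elliptic-curve point addition.

```go
package main

import "fmt"

// Sig is a hypothetical stand-in for a BLS signature. A real signature is an
// elliptic-curve point; here it is just an integer modulo a prime.
type Sig uint64

// p is the Mersenne prime 2^61 - 1, used only as a toy modulus.
const p = 1<<61 - 1

// Aggregate is a hypothetical pairwise API: it combines exactly two
// signatures. Inputs are assumed to be already reduced modulo p.
func Aggregate(a, b Sig) Sig {
	return (a + b) % p
}

// AggregateAll is the sequential fold a caller has to write on top of the
// pairwise operation.
func AggregateAll(sigs []Sig) Sig {
	var acc Sig // 0 is the identity for this toy operation
	for _, s := range sigs {
		acc = Aggregate(acc, s)
	}
	return acc
}

func main() {
	sigs := []Sig{1, 2, 3, 4}
	fmt.Println(AggregateAll(sigs)) // prints 10
}
```

Each step depends on the previous accumulator, so the loop as written cannot use more than one core.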

autholykos commented 5 years ago

The following is a first benchmark of naive parallelization with 2, 4, 8, and 12 goroutines. TL;DR: the best setup seems to be running 4 workers that perform the gfP.Add operations concurrently.
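A rough sketch of what such a naive parallelization could look like is shown below. It is not the benchmarked code: `Point`, `add`, and `AggregateConcurrent` are hypothetical names, and modular integer addition stands in for the real group addition. The approach only works because the operation is associative and commutative, so partial results can be combined in any order.

```go
package main

import (
	"fmt"
	"sync"
)

// Point is a hypothetical stand-in for a signature (a curve point in the
// real library); here it is an integer modulo a prime.
type Point uint64

// modulus is the Mersenne prime 2^61 - 1, used only as a toy modulus.
const modulus = 1<<61 - 1

// add is the stand-in for the pairwise group addition.
func add(a, b Point) Point { return (a + b) % modulus }

// AggregateConcurrent splits sigs into at most `workers` contiguous chunks,
// folds each chunk in its own goroutine, and then folds the partial results
// sequentially.
func AggregateConcurrent(sigs []Point, workers int) Point {
	if workers > len(sigs) {
		workers = len(sigs)
	}
	if workers < 1 {
		workers = 1
	}
	chunk := (len(sigs) + workers - 1) / workers // ceiling division
	partials := make([]Point, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo > len(sigs) {
			lo = len(sigs)
		}
		hi := lo + chunk
		if hi > len(sigs) {
			hi = len(sigs)
		}
		wg.Add(1)
		go func(w int, part []Point) {
			defer wg.Done()
			var acc Point
			for _, s := range part {
				acc = add(acc, s)
			}
			partials[w] = acc // each goroutine writes only its own slot
		}(w, sigs[lo:hi])
	}
	wg.Wait()

	var total Point
	for _, pt := range partials {
		total = add(total, pt)
	}
	return total
}

func main() {
	sigs := make([]Point, 100)
	for i := range sigs {
		sigs[i] = Point(i + 1)
	}
	fmt.Println(AggregateConcurrent(sigs, 4)) // prints 5050
}
```

Each goroutine adds scheduling and allocation overhead, so with only 100 cheap additions per call, more workers do not necessarily mean a faster result.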

HW specs

16GiB System Memory
256KiB L1 cache
1MiB L2 cache
8MiB L3 cache
Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

Sequential signature aggregation

Variance over 10 runs: 44.9µs ± 0%

BenchmarkAggregate100Signatures-8          30000             45001 ns/op               0 B/op          0 allocs/op

Concurrent Signature aggregation with 2 workers

Compared to sequential, delta: -23.60%. Variance over 10 runs: 38.9µs ± 3%

BenchmarkAggregate100Signatures-8          50000             34382 ns/op              64 B/op          3 allocs/op

Concurrent Signature aggregation with 4 workers

Compared to sequential, delta: -32.5%. Variance over 10 runs: 31.2µs ± 2%

BenchmarkAggregate100Signatures-8          50000             30667 ns/op              80 B/op          3 allocs/op

Concurrent Signature aggregation with 8 workers

Compared to sequential, delta: -13.15%. Variance over 10 runs: 38.1µs ± 1%

BenchmarkAggregate100Signatures-8          50000             39082 ns/op             240 B/op          4 allocs/op

Concurrent Signature aggregation with 12 workers

Compared to sequential, delta: -9.07%. Variance over 10 runs: 41.5µs ± 1%

BenchmarkAggregate100Signatures-8          30000             40919 ns/op             336 B/op          4 allocs/op
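
The delta figures appear to be the relative change in ns/op against the sequential run. Assuming that reading, a minimal sketch of the calculation follows; `Delta` is a hypothetical helper, not part of the library or of the benchmark tooling.

```go
package main

import "fmt"

// Delta returns the relative change of a concurrent timing against the
// sequential baseline, in percent. Negative means faster than sequential.
func Delta(sequentialNs, concurrentNs float64) float64 {
	return (concurrentNs - sequentialNs) / sequentialNs * 100
}

func main() {
	// ns/op values taken from the sequential and 12-worker benchmark lines.
	fmt.Printf("%.2f%%\n", Delta(45001, 40919)) // prints -9.07%
}
```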
autholykos commented 5 years ago

The benchmarks show a significant performance improvement (over 30% speedup) from introducing light parallelization of BLS signature aggregation. However, it would be preferable for any goroutine-based optimization to live in the code that uses the library, so that the BLS library does not impose scheduling side effects on its callers.