crate-crypto / go-ipa

A Go implementation of cryptographic primitives for Verkle Trees
https://verkle.dev
Apache License 2.0
32 stars 14 forks source link

Avoid allocs improving ~15% performance and using ~97% less memory #18

Closed jsign closed 2 years ago

jsign commented 2 years ago

This PR contains a set of optimizations that improves the performance of the library significantly.

Using the reference benchmarks in the repo, here's a comparison in performance between master and this PR:

name                           old time/op    new time/op    delta
pkg:github.com/crate-crypto/go-ipa/bandersnatch goos:linux goarch:amd64
MultiExpG1/32_points-16           318µs ± 6%     355µs ±11%   +11.81%  (p=0.000 n=10+8)
MultiExpG1/64_points-16           354µs ± 5%     287µs ± 4%   -18.69%  (p=0.000 n=10+7)
MultiExpG1/128_points-16          450µs ± 5%     381µs ± 6%   -15.31%  (p=0.000 n=10+8)
MultiExpG1/256_points-16          649µs ± 4%     556µs ± 5%   -14.31%  (p=0.000 n=10+8)
MultiExpG1/512_points-16         1.04ms ± 4%    0.87ms ± 2%   -16.34%  (p=0.000 n=9+8)
MultiExpG1/1024_points-16        1.63ms ± 7%    1.30ms ± 3%   -20.39%  (p=0.000 n=10+8)
MultiExpG1/2048_points-16        2.85ms ± 3%    2.40ms ± 1%   -15.69%  (p=0.000 n=10+7)
MultiExpG1/4096_points-16        4.98ms ± 2%    4.17ms ± 1%   -16.37%  (p=0.000 n=10+8)
MultiExpG1/8192_points-16        9.49ms ± 0%    8.06ms ± 0%   -15.09%  (p=0.000 n=10+7)
MultiExpG1/16384_points-16       18.7ms ± 1%    15.7ms ± 1%   -15.83%  (p=0.000 n=10+8)
MultiExpG1/32768_points-16       36.0ms ± 2%    30.8ms ± 2%   -14.45%  (p=0.000 n=10+8)
MultiExpG1/65536_points-16       64.1ms ± 3%    54.3ms ± 2%   -15.25%  (p=0.000 n=10+7)
MultiExpG1/131072_points-16       109ms ± 3%      93ms ± 2%   -14.43%  (p=0.000 n=10+8)
MultiExpG1/262144_points-16       192ms ± 2%     164ms ± 7%   -14.70%  (p=0.000 n=10+8)
MultiExpG1/524288_points-16       381ms ± 2%     321ms ± 2%   -15.75%  (p=0.000 n=10+7)
MultiExpG1/1048576_points-16      722ms ± 3%     605ms ± 2%   -16.21%  (p=0.000 n=10+8)
MultiExpG1/2097152_points-16      1.41s ± 2%     1.19s ± 1%   -15.71%  (p=0.000 n=10+8)
MultiExpG1/4194304_points-16      2.82s ± 1%     2.38s ± 1%   -15.57%  (p=0.000 n=10+8)
MultiExpG1/8388608_points-16      5.69s ± 0%     5.04s ± 1%   -11.41%  (p=0.000 n=9+7)
MultiExpG1/16777216_points-16     10.6s ± 1%      9.2s ± 0%   -12.93%  (p=0.000 n=10+8)
MultiExpG1Reference-16            741ms ± 4%     608ms ± 0%   -18.04%  (p=0.000 n=9+6)
ManyMultiExpG1Reference-16        2.18s ± 2%     1.81s ± 1%   -17.23%  (p=0.000 n=10+8)

name                           old alloc/op   new alloc/op   delta
pkg:github.com/crate-crypto/go-ipa/bandersnatch goos:linux goarch:amd64
MultiExpG1/32_points-16           231kB ± 0%      18kB ± 0%   -92.38%  (p=0.000 n=9+8)
MultiExpG1/64_points-16           347kB ± 0%      21kB ± 0%   -94.06%  (p=0.000 n=8+8)
MultiExpG1/128_points-16          551kB ± 0%      23kB ± 0%   -95.89%  (p=0.000 n=9+8)
MultiExpG1/256_points-16          905kB ± 0%      24kB ± 0%   -97.37%  (p=0.000 n=10+8)
MultiExpG1/512_points-16         1.54MB ± 0%    0.03MB ± 0%   -98.06%  (p=0.000 n=10+8)
MultiExpG1/1024_points-16        2.67MB ± 0%    0.04MB ± 0%   -98.33%  (p=0.000 n=9+8)
MultiExpG1/2048_points-16        4.77MB ± 0%    0.08MB ± 0%   -98.40%  (p=0.000 n=10+8)
MultiExpG1/4096_points-16        8.59MB ± 0%    0.14MB ± 0%   -98.35%  (p=0.000 n=8+8)
MultiExpG1/8192_points-16        15.5MB ± 0%     0.3MB ± 0%   -98.25%  (p=0.000 n=10+8)
MultiExpG1/16384_points-16       28.1MB ± 0%     0.5MB ± 0%   -98.10%  (p=0.000 n=9+8)
MultiExpG1/32768_points-16       52.1MB ± 0%     1.1MB ± 0%   -97.97%  (p=0.000 n=10+8)
MultiExpG1/65536_points-16       96.0MB ± 0%     2.1MB ± 0%   -97.81%  (p=0.000 n=10+8)
MultiExpG1/131072_points-16       180MB ± 0%       4MB ± 0%   -97.67%  (p=0.000 n=10+8)
MultiExpG1/262144_points-16       329MB ± 0%       8MB ± 0%   -97.45%  (p=0.000 n=10+7)
MultiExpG1/524288_points-16       621MB ± 0%      17MB ± 0%   -97.30%  (p=0.000 n=10+8)
MultiExpG1/1048576_points-16     1.17GB ± 0%    0.03GB ± 0%   -97.14%  (p=0.000 n=10+8)
MultiExpG1/2097152_points-16     2.28GB ± 0%    0.07GB ± 0%   -97.06%  (p=0.000 n=10+8)
MultiExpG1/4194304_points-16     4.56GB ± 0%    0.13GB ± 0%   -97.06%  (p=0.000 n=8+8)
MultiExpG1/8388608_points-16     10.1GB ± 0%     1.5GB ± 0%   -85.35%  (p=0.000 n=9+8)
MultiExpG1/16777216_points-16    17.3GB ± 0%     1.7GB ± 0%   -89.93%  (p=0.000 n=10+8)
MultiExpG1Reference-16           1.17GB ± 0%    0.03GB ± 0%   -97.14%  (p=0.000 n=10+7)
ManyMultiExpG1Reference-16       3.52GB ± 0%    0.10GB ± 0%   -97.14%  (p=0.000 n=8+8)

name                           old allocs/op  new allocs/op  delta
pkg:github.com/crate-crypto/go-ipa/bandersnatch goos:linux goarch:amd64
MultiExpG1/32_points-16           3.39k ± 0%     0.09k ± 0%   -97.38%  (p=0.000 n=10+8)
MultiExpG1/64_points-16           5.22k ± 0%     0.13k ± 0%   -97.51%  (p=0.000 n=10+8)
MultiExpG1/128_points-16          8.38k ± 0%     0.13k ± 0%   -98.45%  (p=0.000 n=10+8)
MultiExpG1/256_points-16          13.9k ± 0%      0.1k ± 0%   -99.19%  (p=0.000 n=10+8)
MultiExpG1/512_points-16          23.7k ± 0%      0.1k ± 0%   -99.58%  (p=0.000 n=10+8)
MultiExpG1/1024_points-16         41.2k ± 0%      0.1k ± 0%   -99.78%  (p=0.000 n=10+8)
MultiExpG1/2048_points-16         73.4k ± 0%      0.1k ± 0%   -99.89%  (p=0.000 n=10+8)
MultiExpG1/4096_points-16          132k ± 0%        0k ± 0%   -99.94%  (p=0.000 n=10+8)
MultiExpG1/8192_points-16          238k ± 0%        0k ± 0%   -99.97%  (p=0.000 n=10+8)
MultiExpG1/16384_points-16         431k ± 0%        0k ± 0%   -99.98%  (p=0.000 n=10+8)
MultiExpG1/32768_points-16         798k ± 0%        0k ± 0%   -99.99%  (p=0.000 n=10+8)
MultiExpG1/65536_points-16        1.47M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/131072_points-16       2.75M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/262144_points-16       5.01M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/524288_points-16       9.44M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/1048576_points-16      17.8M ± 0%      0.0M ± 0%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/2097152_points-16      34.6M ± 0%      0.0M ± 1%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/4194304_points-16      69.2M ± 0%      0.0M ± 1%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/8388608_points-16       134M ± 0%        0M ± 1%  -100.00%  (p=0.000 n=10+8)
MultiExpG1/16777216_points-16      243M ± 0%        0M ± 3%  -100.00%  (p=0.000 n=10+7)
MultiExpG1Reference-16            17.8M ± 0%      0.0M ± 0%  -100.00%  (p=0.000 n=10+8)
ManyMultiExpG1Reference-16        53.5M ± 0%      0.0M ± 2%  -100.00%  (p=0.000 n=9+8)

A TL;DR:

The original focus to guide the optimizations was making go-verkle faster for generating and verifying proofs.

To understand the impact of these optimizations in go-verkle see this PR.

I'm not entirely sure if the repo is open for PRs from external contributors; if this PR is looking good to consider for merging I can add some comments on the changes.

Let me know!

kevaundray commented 2 years ago

These are amazing results! Thank you very much, I'll check out the PR now :)

kevaundray commented 2 years ago

Oh RE your last sentence, yes please add comments on the changes!

kevaundray commented 2 years ago

I should also note that #19 might regress some of the changes made here. Instead of PointAffine, we use a wrapper around PointProj with special methods, so MultiExp etc now also use that wrapper

jsign commented 2 years ago

These are amazing results! Thank you very much, I'll check out the PR now :)

You're welcome!

I should also note that https://github.com/crate-crypto/go-ipa/pull/19 might regress some of the changes made here. Instead of PointAffine, we use a wrapper around PointProj with special methods, so MultiExp etc now also use that wrapper

Sounds good. Maybe after that gets merged I can take some other pass with optimization-eyes to see if there's something extra that we can gain. Let me know how you want to proceed

kevaundray commented 2 years ago

Alright, I'll merge this first and then resolve conflicts on that branch, aiming to preserve the optimisations you have done