Closed emmorais closed 3 months ago
Review notes:
- Just using the new wrapper may affect performance, since we have better degree management.
- Examine in detail how poly-split functions are implemented, again rust knowledge is helpful here, because we would like to reinterpret slices of the polynomials as polynomials themselves.
I think some of these errors will go away once the conflicts in this PR are resolved. That will pick up some missing implementations.
It also looks like we’ll need to merge PR #33, then use some of its methods here.
If this PR is too big, you could split the changes to the recursive and flat implementations. But that's totally optional and your decision. Whatever works best for you.
- Examine in detail how poly-split functions are implemented, again rust knowledge is helpful here, because we would like to reinterpret slices of the polynomials as polynomials themselves.
This isn't possible, because the DensePolynomial
type requires its inner data to be a Vec
(not a slice or an array).
But memory copies are cheap and sometimes skipped by the Rust compiler, if it's clear they aren't needed. We can make it clear by creating small functions with simple code.
@emmorais feel free to fix, review and merge this PR, your parts look good to me.
Well, that bug-hunting was not fun. There were a bunch of untested bugs in my PR #33 that only showed up in this one.
I want to make smaller PRs going forward, so it's easier to find bugs.
It looks like the rewritten recursive implementation is much faster, so I've switched the Poly
type over to it.
On my M1 (ARM) Mac, stable Rust:
Benchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N: WBenchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N: Collecting 50 samCyclotomic multiplication: polynomial/2 random polys of degree N
time: [54.442 ms 54.572 ms 54.714 ms]
change: [-2.5709% -1.5972% -0.7933%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 50 measurements (20.00%)
4 (8.00%) high mild
6 (12.00%) high severe
Benchmarking Karatsuba multiplication: polynomial/2 random polys of degree N: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 8.7s, enable flat sampling, or reduce sample count to 20.
Karatsuba multiplication: polynomial/2 random polys of degree N
time: [6.7853 ms 6.8005 ms 6.8192 ms]
change: [-9.5455% -9.2081% -8.8366%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
4 (8.00%) high mild
1 (2.00%) high severe
Flat Karatsuba multiplication: polynomial/2 random polys of degree N
time: [27.659 ms 27.709 ms 27.770 ms]
change: [-3.5473% -2.6540% -2.0606%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
2 (4.00%) high mild
3 (6.00%) high severe
On my Intel Linux, stable Rust:
Cyclotomic multiplication: polynomial/2 random polys of degree N
time: [54.231 ms 54.341 ms 54.515 ms]
Found 4 outliers among 50 measurements (8.00%)
4 (8.00%) high severe
Karatsuba multiplication: polynomial/2 random polys of degree N
time: [8.2924 ms 8.3006 ms 8.3102 ms]
Found 11 outliers among 50 measurements (22.00%)
5 (10.00%) high mild
6 (12.00%) high severe
Flat Karatsuba multiplication: polynomial/2 random polys of degree N
time: [27.975 ms 28.026 ms 28.098 ms]
Found 2 outliers among 50 measurements (4.00%)
2 (4.00%) high severe
On GitHub CI, stable Rust (unreliable timings due to resource sharing):
On my Intel Linux, nightly Rust, with extra debugging on:
Now we have a flat implementation of the recursive version of Karatsuba, which allows us to parallelize each layer computation.
Closes #9