Inversed-Tech / eyelid

Private iris matching
Apache License 2.0
0 stars 0 forks source link

Flat Karatsuba optimization #32

Closed emmorais closed 3 months ago

emmorais commented 3 months ago

Now we have a flat implementation of the recursive version of Karatsuba, which allows us to parallelize each layer computation.

Closes #9

teor2345 commented 3 months ago

Review notes:

  • Just using the new wrapper may affect performance, since we have better degree management.
  • Examine in detail how poly-split functions are implemented, again rust knowledge is helpful here, because we would like to reinterpret slices of the polynomials as polynomials themselves.
teor2345 commented 3 months ago

I think some of these errors will go away once the conflicts in this PR are resolved. That will pick up some missing implementations.

teor2345 commented 3 months ago

It also looks like we’ll need to merge PR #33, then use some of its methods here.

teor2345 commented 3 months ago

If this PR is too big, you could split the changes to the recursive and flat implementations. But that's totally optional and your decision. Whatever works best for you.

teor2345 commented 3 months ago
  • Examine in detail how poly-split functions are implemented, again rust knowledge is helpful here, because we would like to reinterpret slices of the polynomials as polynomials themselves.

This isn't possible, because the DensePolynomial type requires its inner data to be a Vec (not a slice or an array).

But memory copies are cheap and sometimes skipped by the Rust compiler, if it's clear they aren't needed. We can make it clear by creating small functions with simple code.

teor2345 commented 3 months ago

@emmorais feel free to fix, review and merge this PR, your parts look good to me.

teor2345 commented 3 months ago

Well, that bug-hunting was not fun. There were a bunch of untested bugs in my PR #33 that only showed up in this one.

I want to make smaller PRs going forward, so it's easier to find bugs.

teor2345 commented 3 months ago

It looks like the rewritten recursive implementation is much faster, so I've switched the Poly type over to it.

On my M1 (ARM) Mac, stable Rust:

Benchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N: WBenchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N: Collecting 50 samCyclotomic multiplication: polynomial/2 random polys of degree N
                        time:   [54.442 ms 54.572 ms 54.714 ms]
                        change: [-2.5709% -1.5972% -0.7933%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 50 measurements (20.00%)
  4 (8.00%) high mild
  6 (12.00%) high severe

Benchmarking Karatsuba multiplication: polynomial/2 random polys of degree N: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 8.7s, enable flat sampling, or reduce sample count to 20.
Karatsuba multiplication: polynomial/2 random polys of degree N
                        time:   [6.7853 ms 6.8005 ms 6.8192 ms]
                        change: [-9.5455% -9.2081% -8.8366%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  4 (8.00%) high mild
  1 (2.00%) high severe

Flat Karatsuba multiplication: polynomial/2 random polys of degree N
                        time:   [27.659 ms 27.709 ms 27.770 ms]
                        change: [-3.5473% -2.6540% -2.0606%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  2 (4.00%) high mild
  3 (6.00%) high severe

On my Intel Linux, stable Rust:

Cyclotomic multiplication: polynomial/2 random polys of degree N
                        time:   [54.231 ms 54.341 ms 54.515 ms]
Found 4 outliers among 50 measurements (8.00%)
  4 (8.00%) high severe

Karatsuba multiplication: polynomial/2 random polys of degree N
                        time:   [8.2924 ms 8.3006 ms 8.3102 ms]
Found 11 outliers among 50 measurements (22.00%)
  5 (10.00%) high mild
  6 (12.00%) high severe

Flat Karatsuba multiplication: polynomial/2 random polys of degree N
                        time:   [27.975 ms 28.026 ms 28.098 ms]
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high severe

On GitHub CI, stable Rust (unreliable timings due to resource sharing):

``` Benchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N Benchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N: Warming up for 3.0000 s Benchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N: Collecting 50 samples in estimated 6.5552 s (100 iterations) Benchmarking Cyclotomic multiplication: polynomial/2 random polys of degree N: Analyzing Cyclotomic multiplication: polynomial/2 random polys of degree N time: [65.501 ms 65.697 ms 65.993 ms] Found 4 outliers among 50 measurements (8.00%) 1 (2.00%) low mild 1 (2.00%) high mild 2 (4.00%) high severe Benchmarking Karatsuba multiplication: polynomial/2 random polys of degree N Benchmarking Karatsuba multiplication: polynomial/2 random polys of degree N: Warming up for 3.0000 s Benchmarking Karatsuba multiplication: polynomial/2 random polys of degree N: Collecting 50 samples in estimated 5.2531 s (550 iterations) Benchmarking Karatsuba multiplication: polynomial/2 random polys of degree N: Analyzing Karatsuba multiplication: polynomial/2 random polys of degree N time: [9.5365 ms 9.5386 ms 9.5409 ms] Found 3 outliers among 50 measurements (6.00%) 3 (6.00%) high mild Benchmarking Flat Karatsuba multiplication: polynomial/2 random polys of degree N Benchmarking Flat Karatsuba multiplication: polynomial/2 random polys of degree N: Warming up for 3.0000 s Benchmarking Flat Karatsuba multiplication: polynomial/2 random polys of degree N: Collecting 50 samples in estimated 5.0290 s (150 iterations) Benchmarking Flat Karatsuba multiplication: polynomial/2 random polys of degree N: Analyzing Flat Karatsuba multiplication: polynomial/2 random polys of degree N time: [[33](https://github.com/Inversed-Tech/eyelid/actions/runs/8682777692/job/23807800235?pr=32#step:5:34).510 ms 33.518 ms 33.526 ms] Found 2 outliers among 50 measurements (4.00%) ```

On my Intel Linux, nightly Rust, with extra debugging on:

``` Cyclotomic multiplication: polynomial/2 random polys of degree N time: [82.030 ms 82.168 ms 82.400 ms] Found 3 outliers among 50 measurements (6.00%) 1 (2.00%) high mild 2 (4.00%) high severe Benchmarking Karatsuba multiplication: polynomial/2 random polys of degree N: Warming up for 3.0000 s Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 11.0s, or reduce sample count to 20. Karatsuba multiplication: polynomial/2 random polys of degree N time: [219.55 ms 219.62 ms 219.70 ms] Found 2 outliers among 50 measurements (4.00%) 1 (2.00%) high mild 1 (2.00%) high severe Benchmarking Flat Karatsuba multiplication: polynomial/2 random polys of degree N: Warming up for 3.0000 s Flat Karatsuba multiplication: polynomial/2 random polys of degree N time: [25.571 ms 25.616 ms 25.691 ms] Found 2 outliers among 50 measurements (4.00%) 2 (4.00%) high severe ```