Witness multiplication optimization

Description

Currently, the witness polynomial multiplication in the second round of the AHP requires performing an FFT with a domain of size 4 |H|. This change distributes the multiplication so that it only requires an FFT with a domain of size 2 |H| along with some cheap multiplications/additions.

Running marlin-benches on a fairly powerful machine from master:

per-constraint proving time for Bls12_381: 51662 ns/constraint
per-constraint proving time for MNT4_298: 45649 ns/constraint
per-constraint proving time for MNT6_298: 55951 ns/constraint
per-constraint proving time for MNT4_753: 321626 ns/constraint

Running marlin-benches on the same machine from this PR:

per-constraint proving time for Bls12_381: 50834 ns/constraint
per-constraint proving time for MNT4_298: 46081 ns/constraint
per-constraint proving time for MNT6_298: 54172 ns/constraint
per-constraint proving time for MNT4_753: 320387 ns/constraint

So it seems to give an overall slight latency improvement and reduces the memory overhead of proving. I would guess that performance gains would be more noticeable on weaker machines.

This PR depends on https://github.com/arkworks-rs/algebra/pull/258

Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why.

[x] Targeted PR against correct branch (master)
[x] Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
[ ] Wrote unit tests
[x] Updated relevant documentation in the code
[ ] Added a relevant changelog entry to the Pending section in CHANGELOG.md
[ ] Re-reviewed Files changed in the Github PR explorer

arkworks-rs / marlin

Witness multiplication optimization #70

Description