Open DmytroTym opened 2 years ago
This seems to go in the opposite direction of how Plonk and TurboPlonk achieves a better performance---by using more FFTs, people try to reduce MSMs, and it has been the reason why Plonk/TurboPlonk is commonly faster.
So I have concerns that you want to add more FFTs :)
This one would be relevant: #846
@weikengchen really interesting PR, haven't seen it before. Is it going to get merged? Regardless, I don't think it changes parts of the prover relevant to this PR and the PR still applies IMO. To be clear, I'm not doing any performance trade-offs, just removing unnecessary FFTs or factoring them out into the indexer. All the committed polynomials stay exactly the same as before, the resulting proof is identical and is getting approved by the verifier without any changes to its code
💥 Proposal
I think there are some FFTs in the Marlin prover that can be removed (almost) for free. Namely:
b(X)=alpha*beta-alpha*row(X)-beta*col(X)+rowcol(X)
on a "multiplication" domain to then multiplyb(X)
andf(X)
. Currently, this is done by computing the evaluations ofb
on the non-zero domain, then iFFTing it into the coefficient form and FFTing back to the evaluations on a larger "multiplication" domain. An easy optimization is computing the values ofrow
,col
, androwcol
polynomials on a "multiplication" domain during indexing to begin with. This saves 2 FFTs for each of 3 matrices, so 6 FFTs in total;w
which is the difference of LDEs of private and public variables. Currently, we iFFT public variable polynomialx
into the evaluation form, subtract it from the public polynomial there and FFT the result back to get coefficients of the resultw
. Instead, we could just do a subtraction in coefficient form, which saves us 1 FFT;Overall, 7 out of 27 FFTs are removed. See the PoC here: https://github.com/ingonyama-zk/snarkVM-1
@Pratyush do you agree? If so, I can make a PR
Benchmarking
snark_prove
with a more realistic number of variables and constraints of 32k (I'm not sure how to bench mining circuit now since snarkVM-dpc is gone): Before: [749.96 ms 760.54 ms 771.26 ms] After: [739.15 ms 744.35 ms 749.20 ms] [-3.6737% -2.1287% -0.6064%] (p = 0.02 < 0.05) Much less speedup than I expected, but it should get more significant for larger circuits and/or if MSM is hw-accelerated