Open storojs72 opened 1 year ago
In https://github.com/lurk-lab/solidity-verifier/pull/31, there is a useful link to the gist, dedicated to gas cost reduction:
https://gist.github.com/grGred/9bab8b9bad0cd42fc23d4e31e7347144#for-loops-improvement
Duplicating here to keep it in mind
As this issue was kind of opened last week, Are these two contracts made gas redundant or not?
Although, I noticed that a lot of changes can be made inside the function verifyStep3PrecomputePrimary
of NovaVerifierContractPasta.sol
The primary goal was providing compatibility with reference Rust implementation of the verifier, so for now there are various options for gas cost optimisation indeed. The systematic approach would imply evaluating current libraries (building blocks of the end-to-end verification flow) and identify what parts of the code are most / least expensive in order to prioritise the optimisation efforts. At the same time, any contribution that reduces overall gas cast is welcomed!
Current gas cost profile (for each individual step):
Pasta | Grumpkin | Grumpkin (assembly) | |
---|---|---|---|
Transcripts initialization | 341,639 | 341,703 | - |
Step 1 | 1,016,352 | 1,016,500 | 146 |
Step 2 | 15,028,491 | 52,162,145 | - |
Step 3 | 63,218,699 | 67,437,489 | - |
Step 4 | 108,660 | 102,491 | - |
Step 5 | 103,402,132 | 97,268,900 | - |
Step 6 | 282,456 | 364,568 | - |
Step 7 | 40,351,919 | 40,377,190 | - |
Total | 223,750,348 | 259,070,986 | - |
cast estimate |
223,811,814 | 259,132,452 | - |
Contracts from this branch were used: https://github.com/lurk-lab/solidity-verifier/commits/gas-optimizing
Poseidon hashing
In https://github.com/lurk-lab/solidity-verifier/commit/ba0975489c29ffb6ec78aad4a24f5ce961b02dde there is an assembly implementation of Poseidon hashing. Comparing to previous non-assembly one, which takes 1,839,096 gas, assembly implementation takes 533,652 which is ~3.5x improvement. In current reference implementation of Nova e2e verification, Poseidon is used 3 times (twice in Step 2 and once in Step 3)
Note: one can further reduce gas cost if we avoid extracting limbs from r_U_primary
/ r_U_secondary
(https://github.com/lurk-lab/Nova/blob/solidity-verifier-pp-spartan/src/r1cs.rs#L554). This potentially allows reducing Poseidon arity to U21.
KeccakTranscript
In https://github.com/lurk-lab/solidity-verifier/commit/9e9d7eb8b29e2524d3522519472ef4203fe341ee there is an assembly implementation of KeccakTranscript. Comparing to previous non-assembly implementation, which takes 443,399 gas, assembly implementation takes 19814 (for a 32 bytes input processing), which is ~22x improvement. In current reference implementation of Nova e2e verification, KeccakTranscript is being used more than 150 times while sumcheck verification of Spartan (more precise number can be obtained later, after implementing steps in assembly).
Note: one can further reduce gas cost if we use single keccak256 hashing inside the transcript and reduce the state to 32 bytes. In such case we will need to perform only single Montgomery reduction stage while squeezing.
EqPolynomial (evaluate)
In https://github.com/lurk-lab/solidity-verifier/commit/ca4cfe2079c62245d691d90fe6f5527a3367ac0a there is an assembly implementation of EqPolynomial (evaluate). Comparing to previous non-assembly implementation, which takes 14968 gas, assembly implementation takes 5114 gas (for r
/ rx
size = 17), which is ~3x improvement.
IdentityPolynomial
In https://github.com/lurk-lab/solidity-verifier/commit/024abb61b37cb7a25bef76d8622423f3fbfe0517 there is an assembly implementation of IdentityPolynomial building block. Comparing to previous non-assembly implementation, which takes 17092 gas, assembly implementation takes 2690 gas (for r
with size = 17), which is ~6x improvement
SparsePolynomial
In https://github.com/lurk-lab/solidity-verifier/commit/c7336f30b0b7a37e3f41051c9dea16fee0200ba7 there is an assembly implementation of SparsePolynomial building block. Comparing to previous non-assembly implementation, which takes 27703 gas, assembly implementation takes 8784 gas (for num_vars = 14
, poly_X
size = 3), which is ~3x improvement
PolyEvalInstance
In https://github.com/lurk-lab/solidity-verifier/commit/1b8b38c1f305a949f3f3cbb424eebc54d3e56f78 there is an assembly implementation of PolyEvalInstance building block. Comparing to previous, Grumpkin-based non-assembly (or partially assembly) implementation, which takes 21,871,794 gas, assembly implementation takes 12,208,521 gas (for comm_vec
/ eval_vec
with size = 9). Regarding comparison of Bn256-based implementations: non-assembly one, takes 121,422 gas, while pure assembly one takes 77,181. Scalar multiplication and points addition precompiles for BN256 do their job!
Sumcheck verification
In https://github.com/lurk-lab/solidity-verifier/commit/eebac3175d3178c71f5ba4f8dc1dea546dd91f22 there is an assembly implementation of Sumcheck building block. Comparing to previous, non-assembly implementation, which takes 7,346,586 gas, assembly implementation takes 320,499 (for sumcheck proof that holds 17 polynomials, with 2 coefficients each), which is ~22x improvement (essentially due to utilising assembly KeccakTranscript).
Step 1
In https://github.com/lurk-lab/solidity-verifier/commit/fd1989dd263a8f647e940d4f1652349cf87eb69c there is an assembly implementation of Step1 library as a one-line assembly function. Comparing to previous non-assembly implementation, which takes 57448 gas, assembly implementation takes only 262 gas.
Step 2
In https://github.com/lurk-lab/solidity-verifier/commit/fd1989dd263a8f647e940d4f1652349cf87eb69c there is an assembly implementation of Step2 as well. Comparing to previous non-assembly implementation, which takes 34,193,872 gas, assembly implementation takes 11,656,035 gas. Important note is that for a given specific proving example (that uses TrivialTestCircuit
and CubicCircuit
) where Poseidon hashing is invoked only 2 times (1 for primary check and 1 for secondary check), This number can be further reduced to actually pure hashing cost (with small overhead required for composing input), which is ~534,106. Rest of the gas currently is spent on: 1) constants preprocessing; 2) computing pValue
(service value required for domain separation, which occupies 0th place at Poseidon's input); 3) decompressing points.
hey @storojs72, great work here! is this still being worked on? would love to be able to use this verifier on ethereum.
Hi, @wd021! This is not a priority for ACC at the moment, however I believe this work can be continued when we find customer interesting in landing Nova proofs on chain
This is the meta-issue, dedicated specifically to reducing the gas cost of our e2e verification.
So far we have two contracts based on a different curve cycle:
Below is basic estimations of gas cost for each contract (a0ed4b74c1f89248e53789cfee26b317b020e11f) using
cast estimate
utility:Important note: these estimations DO NOT include final step of verification (IPA / Zeromorph).
While our internal discussions, @johnchandlerburnham mentioned that the desired gas cost of the verifier should not exceed 500k of gas, so we need some strategy for gas cost reduction.
Here is a short-term roadmap for this development direction:
1) The first obvious step is estimating the gas cost using some alternative tools. Having 2-3 estimations produced by different independent tools should give us some average point, that we can enforce it by CI and prevent committing any code that increases this point.
2) We need to have more fine-grained estimations of every sequential step of e2e verification and gas cost for every building block execution. Ideally we should come up to understanding of the cost of every primitive operation (addition, multiplication, points addition, negation, etc.), then the cost of every building block can be expressed as a number of primitive operations multiplied on the cost of one. And, finally, cost of every sequential step of e2e verification can be expressed in terms of building blocks involved. For now we have 7 sequential steps in our flow, so tentatively this can visualised as following:
Every contract consists from set of steps, where each step has its own cost:
Every single step consist from set of operations which include either building blocks or more primitive operations. So we should 1 following table for each step:
And finally every building block always consists from primitive operations. So we should have 1 following table for each building block:
Once we build mentioned tables, we will have clear picture - what is most / less expensive step / building block / primitive operation and we can effectively split and prioritise the optimisation of this or that part of the codebase.
3) Apply following possible engineering approaches to gas cost optimisations (this doesn't include algorithmic optimisations, where we change the logic of the verification), focusing on most expensive steps / building blocks: