filecoin-project / FIPs

The Filecoin Improvement Proposal repository

Reduce congestion via Aggregating ProveCommits via Inner Product Pairing #50

Closed nicola closed 3 years ago

nicola commented 3 years ago

Problem

(same as #49 )

ProveCommits and PreCommits are creating network congestion, driving up the base fee and, in turn, the cost of SubmitWindowPoSt and PublishStorageDeal.

(however, differently from #49)

ProveCommit messages, and the gas they use, scale linearly with network growth.

Proposed solution

(similar to #49)

Processing multiple ProveCommits at the same time can drastically reduce the gas used per sector, leading to much lower overall gas usage from ProveCommits. We propose to add a new method: "ProveCommitAggregated".

(differently from #49)

The ProveCommitAggregated method allows gas used in the network to scale sub-linearly with network growth. A miner can submit a single short proof for many ProveCommits, and the gas used for verification is sublinear in the number of ProveCommits aggregated. In other words, miners could make a single ProveCommitAggregated transaction per day, instead of one per sector or one per batch of sectors.

Today ~6M individual SNARKs are published on chain per day; the largest miner publishes ~600k of them. With this solution, at least in theory, we would need a single SNARK per miner per day: ~700 in total. If we instead assume that miners only batch 1,000 proofs on average, the SNARKs published per day would be ~6,000.

There are two parts that can be amortized:

Context on Groth16 aggregation

The Protocol Labs team, in collaboration with external researchers and engineers, has improved the performance of the Rust implementation of the IPP protocol (see ripp). The inner product pairing protocol for aggregating Groth16 proofs is described in this paper.

At a high level, the idea is the following: given X Groth16 proofs, we can generate a single short proof that all X proofs are valid.

Our preliminary results show that a prover can aggregate up to ~65,000 SNARKs (~6,500 ProveCommits) into a proof of ~78kB with a verification time of ~150ms.

Note that this type of aggregation sits on top of the existing SNARKs we already produce. In other words, there is no need for a new trusted setup.

Comparison with #49 (Batching ProveCommits)

The size of an aggregated proof grows logarithmically in the number of proofs aggregated, unlike batching, where the proof size scales linearly.

In other words, proposal #49 (ProveCommitBatched) has a limit on the number of proofs that can be batched, while ProveCommitAggregatedIPP does not. This opens up the possibility for miners to submit a single daily proof of new storage being added.

Outline

With this mechanism, miners should always prefer to aggregate multiple proofs together, since doing so would substantially reduce their costs.

Discussion

This issue is intended for discussion. There are a number of details to work out before drafting a FIP.

Aggregation parameters

This is a test that aggregates 65536 SNARKs (~6,500 PoReps) with a proof size of 78kB and a verification time of 197ms:

```
Proof aggregation finished in 41685ms
65536 proofs, each having 328 public inputs...
Verification aggregated finished in 197ms (Proof Size: 78336 bytes)
```

Open questions:

Note there is a separate line of investigation to propose ProveCommitAggregated with Halo instead of IPP, however it seems that IPP would be ready to be used faster than Halo.

nicola commented 3 years ago

Note that this should be done in conjunction with #25

nicola commented 3 years ago

Update:

nicola commented 3 years ago

Update:

Reach out if you are interested in helping!

nicola commented 3 years ago

Update:

Questions to answer:

ZenGround0 commented 3 years ago

For the aggregation versioning question, I am currently convinced that the correct thing to do is the solution described in the second paragraph: introduce an independent abi.RegisteredAggregationProof type and include it as an argument to ProveCommitAggregated.

ZenGround0 commented 3 years ago

We've used the specs-actors prototype in 1381 to compare gas usage between the current prove commit entrypoint and the proposed aggregate entrypoint. Reposting this summary from Slack:

PR 1381 contains two tests which measure the gas cost of prove committing x sectors using the current strategy (x invocations of ProveCommit) and the new prototype ProveCommitAggregate. The highlights from the results:

Biggest risks:

Biggest opportunities:

Aggregate PoRep vs Current Porep - Sheet1.pdf

nikkolasg commented 3 years ago

I have changed the benchmark so it repeats the operations a few times to smooth out the results. Here is the resulting CSV file. TL;DR: these results are a bit faster than the previous ones, also thanks to the latest optimizations, and they show a larger difference versus "multiple batches of 10 proofs", which is the current Lotus behavior. So we should expect larger gas reductions from these numbers. Note that these were run on a 32c/64t Threadripper machine, so parallelism plays a big role here.

| nproofs | aggregate_create_ms | aggregate_verify_ms | batch_verify_ms | batch_all_ms | aggregate_size_bytes | batch_size_bytes |
|---------|---------------------|---------------------|-----------------|--------------|----------------------|------------------|
| 8       | 82                  | 14                  | 6               | 5            | 21328                | 1536             |
| 16      | 123                 | 15                  | 6               | 5            | 27184                | 3072             |
| 32      | 197                 | 15                  | 9               | 8            | 33040                | 6144             |
| 64      | 317                 | 15                  | 12              | 12           | 38896                | 12288            |
| 128     | 493                 | 18                  | 17              | 19           | 44752                | 24576            |
| 256     | 840                 | 21                  | 27              | 32           | 50608                | 49152            |
| 512     | 1292                | 21                  | 50              | 46           | 56464                | 98304            |
| 1024    | 2362                | 23                  | 87              | 77           | 62320                | 196608           |
| 2048    | 4309                | 28                  | 165             | 147          | 68176                | 393216           |
| 4096    | 8020                | 28                  | 321             | 248          | 74032                | 786432           |
| 8192    | 15395               | 31                  | 656             | 491          | 79888                | 1572864          |

nicola commented 3 years ago

Updated the benchmark notebook: https://observablehq.com/@protocol/provecommitaggregate-notebook, hopefully this should give much better results!

Pegasus-starry commented 3 years ago

Hi, I have run into a panic where commit aggregation fails. Can anyone help me solve it? Thanks.

```
{"level":"warn","ts":"2021-07-02T12:08:48.101+0800","logger":"sectors","caller":"storage-sealing/fsm.go:627","msg":"sector 2393 got error event sealing.SectorCommitFailed: aggregate error: aggregating proofs: Rust panic: Once instance has previously been poisoned"}

{"level":"warn","ts":"2021-07-02T12:04:52.703+0800","logger":"sectors","caller":"storage-sealing/fsm.go:627","msg":"sector 2396 got error event sealing.SectorCommitFailed: aggregate error: aggregating proofs: Rust panic: no unwind information"}
```

kaitlin-beegle commented 3 years ago

Marking as inactive due to the passage of FIP-0013