Streamline Proofs - Githubissues

gligneul commented 1 year ago

📚 Context

The output proofs in the Cartesi rollups are too complex to understand, so I'm proposing a significant simplification.

A single tree to rule them all

The Cartesi rollups should have a single Merkle tree that contains all the DApp outputs. The leaves of this tree will have 32 bytes and will be the output hash. This tree starts empty when the DApp starts. As the DApp advances, the rollups will add the outputs to the tree, one after another. There will be no separation per input or epoch. Instead, the tree will hold all outputs since genesis.

Inside the Cartesi machine, the Rollups C library will maintain an optimized version of this tree. This optimized version can compute the tree root hash as new outputs are added. It will only require log n space and time complexity over the size of the tree to compute the root hash. The Cartesi machine should provide a way to obtain the tree's root hash.

The rollups reader node will maintain a full version of the tree. The reader will update this tree alongside the one in the Cartesi Machine, which should yield the same root hash. Every time a new output is added to the tree, the root hash changes, so keeping the full tree to compute the proof of a given output is necessary. We can optimize this tree by assuming that all empty leaves are zero. However, storing the tree will take O(n) space if it is full.

No more epochs

The rollups claim should be the hash of the Cartesi Machine plus the hash of the outputs Merkle tree. The claim can be done at any input (I think?). Once a claim is accepted after the dispute period, the previous one can be discarded because validating any previous output with the latest claim is possible. Remember, the output Merkle tree contains all the outputs. The reader node should be aware of the current claim in the blockchain to provide the correct proof for the outputs. So, we need to identify the claim by the input index.

gligneul commented 1 year ago

@diegonehab @GCdePaula @guidanoli

GCdePaula commented 1 year ago

Nice writeup! I think I just disagree with the last part. In any case, I'll list all changes, let me know if I got anything wrong.

I believe there are four main changes compared with the current implementation after unifying the outputs:

The outputs tree accumulates forever, no longer being zeroed after each epoch. This means closing epochs has no side-effects, and that that claims are path-independent. This is a great property to have in general, making it easier to reason about the rollups. We can claim whenever we want. Furthermore, it simplifies parts of the node, server-manager, onchain (can keep just the latest claims).
As a consequence, we can move the outputs tree inside the machine. Rather than building this tree outside the emulator, it's now the responsibility of application code to build this tree. This is an amazing simplification for arbitration. It's also easier to reason about the rollups. We also gain the freedom to more easily change the tree structure (dynamic, Patricia, sideways...). This change means the application has to:
- Store an auxiliary array of merkle "uncles", which grows logarithmically on the number of outputs.
- After each input, right before yield, compute the merkle root hash of all outputs, "completing" the tree with zeroes.
- Store this merkle root hash at a specific known address in the machine.
As a consequence, the claim becomes just the machine state hash. There's no longer an external tree, just machine state.
Simplify the outputs tree to no longer be this nested, two-tiered structure. Instead it's a normal tree where the leaves are sequential outputs. Also, the leafs is just the hash of the output, and not the merkle hash of the output hash split into words.

@gligneul I think (3) is the only disagreement. Did I get the rest correctly?

@augustoteixeira I think you may find this interesting.

gligneul commented 1 year ago

@GCdePaula it is possible for the claim to be just the machine hash. However, if we do that, when proving an output, we also need the proof that the output tree root is inside the machine. I don't think this is a problem.

GCdePaula commented 1 year ago

@gligneul Yeah, it would need an extra proof.

Like, I guess with the current epoch also kinda needs a proof, which is just two hashes (output trees hash and machine state hash).

With this change, you'd need a much larger merkle proof. This would suffer of the "32-bytes hash split in four words and merkleized" thing. Suppose the location of the output hash is well aligned. The function would receive the outputs tree hash (TH) and a merkle proof of a memory range of four words. We first merkleize TH in the granularity of words (MTH), then use the merkle proof to prove that MTH is inside the machine.

As an optimization, since the proof and outputs tree is constant, when a claim is made, we could pass the proof along with the claim and have the smart contract validate the proof and save the extracted outputs tree.

augustoteixeira commented 1 year ago

Great, @gligneul!

I think that this will indeed simplify a lot all the pieces of the architecture. From the nodes, to the disputes.

felipeargento commented 1 year ago

Great job! Super interesting proposal. This would significantly enhance the current system.

However, I'd like to offer a different perspective on the part where you mentioned, "Once a claim is accepted after the dispute period, the previous one can be discarded because validating any previous output with the latest claim is possible."

My main concern is that this approach would require users to continuously update their proofs for every output they wish to execute. For instance, if a user intends to execute an old output for which they already have a proof, they would need to resync their entire dapp to the most updated state. This could be cumbersome and might deter seamless interactions. Moreover, even if your node is already up to date, there's a potential risk of getting frontrunned by a fresh claim, leading to a transaction getting reverted.

A potential remedy I'd suggest is to retain all claims rather than overwriting them. In this way, proofs could target any valid historical claim, offering more flexibility and less dependency on the latest claims. The executeOutput function signature could be something like: executeOutput(claim, proof, output), where the system would merely verify if the claim exists within the storedall_claims.

This might seem like a slight optimization on the surface, but the added convenience of executing old outputs without necessitating a full dapp sync could be invaluable for end-users.

Again, great work on the proposal! I'm eager to see how this evolves.

guidanoli commented 1 year ago

I agree with @felipeargento that we should allow users to prove the validity of outputs with any previously submitted claim, for usability reasons.

augustoteixeira commented 1 year ago

By the way, for anyone who might be interested in the details of the discussion and wants to see more material.

The idea detailed by @gligneul has inspiration in one from @GCdePaula, described in this Research Meeting video:

https://drive.google.com/file/d/1jG-o8I8vwDw3ka6wYrISj4SuQIpX6Sbj/view?usp=sharing

More precisely at 26:40

cartesi / rollups-contracts

Streamline Proofs #86

📚 Context

A single tree to rule them all

No more epochs