Support inner nodes proving/verification

rach-id commented 6 months ago

As part of https://github.com/celestiaorg/celestia-node/issues/3361, where we will need to support proving and verifying share commitment subtree roots, the NMT needs to support proving and verifying inner nodes.

rach-id commented 6 months ago

@staheri14 proposed the following:

Good news: I figured out a way to create proofs with a similar/identical structure as the current nmt proofs, without any need to incorporate index and height into proof’s nodes (zero metadata). The high level idea is that:
For the proof generation, we follow the exact same range proof generation for the range of indices that the share commitment is calculated for.
On the verification side, instead of passing leaf hashes, we pass the subtree roots, ordered according to the in-order traversal of the tree.
During the verification process, when we  recursively walk the tree top-down from the tree root (checking left then right subtrees), we need to check if we hit a subtree root that covers a range of indices that match the share commitment subtree roots, and then we pop the corresponding subtree root.
The key point here is that each subtree root (of the share commitment) covers a specific index range, and that range can be calculated by knowing the total range of the proof.

Which is apparently the direction we will be following.

This is a draft of how this implementation would look like: https://github.com/celestiaorg/nmt/pull/258/files

Currently, even if the provided coordinates are not consecutive, we will create a consecutive range of them and prove the whole thing. For example, if the coordinates cover the leaves [1, 3] and [5, 10], we will be proving [1, 10]. This is sufficient for share commitment proving/verification, since the subtree roots cover adjacent ranges. However, we might want to create an issue to further extend the library to support arbitrary ranges.

Is this a valid compromise, or we want the full feature to be supported, aka, ranges that are not adjacent?

cc @staheri14 @adlerjohn @evan-forbes

evan-forbes commented 6 months ago

sgtm!

staheri14 commented 6 months ago

The question has been raised as to how we can calculate the range of each subtree root. Below is the explanation for that.

Assumptions:

Tree size is a power of 2
Proof range is [x, y) (y is not inclusive)
The index of the starting share is a power of 2 (according to blob share commitment rule)

Algorithms:

Proof: As in a normal Merkle range proof, the final proof will consist of a range [x, y) and the normal Merkle range proof.
Verification: In the verification, we have subtree roots instead of leaf hashes. The subtree roots are ordered according to the in-order traversal of the tree. Suppose the subtree roots are denoted as [S_1, S_2, ...].

Following the proposal I initially made (available in the GH description), the range of each subtree root can be calculated as follows (below we are relying on the fact the subtree roots are calculated using the Mountain Merkle Range algorithm):

Let d be y - x (the range of the proof).
i is the index of the next subtree root.
While d != 0:
- Let z be the largest power of 2 that fits in d; here we are finding the range for the next subtree root.
- The range for the next subtree root is [x, x + z), i.e., S_i is the subtree root of leaves at indices [x, x + z).
- d = d - z (move past the first subtree root and its range).
- i = i + 1.
- Go back to the loop condition.

The rest would be according to the initial proposal.

rach-id commented 6 months ago

@staheri14 Thanks a lot for your explanation. The algorithm looks good :+1:

The thing is, for the share commitment subtree roots, we assume the tree size to be a power of 2. But in general, the nmt can be any size. Can this algorithm be tweaked to support any tree size?

Otherwise, we can first ship your algorithm in a method like: VerifyPowerOfTwoTreeInnerNodes(...) and document that this method can only be used for trees that have a power of 2 size. Then move on to the InnerProof (created in https://github.com/celestiaorg/nmt/pull/258/files) and discuss if it makes sense to have it, or we can just wait until we implement the full-fledged inner nodes' verification?

staheri14 commented 6 months ago

@rach-id The NMT size being a power of two is an assumption based on the celestia design and specifications (the whole system is based on this afaik): The square size is a power of two, so is the nmt size. Are we thinking about a different use-case?

Update: missed this part of your comment But in general, the nmt can be any size. Can this algorithm be tweaked to support any tree size? well, with a second thought, I think this algorithm is applicable to any nmt size actually. I'll check it further, but cannot see any immediate issue for its usecase in general size nmts.

staheri14 commented 6 months ago

Note that in my algorithm, there is no assumption about the number of shares that the subtree roots represent. Hope that clarifies the confusion.

rach-id commented 6 months ago

Are we thinking about a different use-case?

The NMT implementation follows the RFC 6962 standard. So, it supports any tree size. That's why I asked aboht ir.

For the general use case, it's gonna allow users to verify inner nodes in NMT proofs. I think it would be ideal if we kept the NMT implementation general, and not Celestia specific.

rach-id commented 6 months ago

@staheri14 Concerning the algorithm, I am applying it here:

x = 4, y = 14

d = 10
i = 0
z = 8
I0 = [4, 4 + 8) = [4, 12) which is not the case. I0 is only covering [4, 8)

Am I missing something in the application of this algorithm?

staheri14 commented 6 months ago

@rach-id We can approach this from a different angle with an entirely new and much simpler solution. While the underlying logic is similar, instead of calculating the ranges of the subtree roots, we select subtree roots as we traverse the tree from top to bottom.

As we traverse the tree from top to bottom (as implemented in this function: computeRoot), we keep track of the range of leaves that the computeRoot function is called for (this is already part of the function's signature). For each range, we compare it against [x, y). If a range perfectly fits into [x, y), we pop one of the subtree roots and return from the recursive function (this line specifically). The rest of the logic in that function should remain the same. This works because:

The recursive tree traversal visits the left and then the right subtrees, i.e., inorder traversal.
The subtree roots are also determined using the same tree traversal method.

rach-id commented 6 months ago

We select subtree roots as we traverse the tree from top to bottom.

How would we select them? You mean we should use coordinates? Like in VerifyInnerNodes in https://github.com/celestiaorg/nmt/pull/258/files?

staheri14 commented 6 months ago

No, without coordinates, in your example, the subtree roots you pass to the function are I0, I1, and I2 (instead of leafhashes), with this exact order. So when traversing the tree, the first range that you hit that falls under the proof range is exactly I0, which has the range [4,8), so you pick I0 up from the supplied subtree roots. And you go on like this.

rach-id commented 6 months ago

I see, but how we know that the subtree root's range is [4,8)?

staheri14 commented 6 months ago

The reason can be found in the two statements within my previous comment

The recursive tree traversal visits the left and then the right subtrees, i.e., inorder traversal. The subtree roots are also determined using the same tree traversal method.

staheri14 commented 6 months ago

If I want to put it differently, I'd say the subtree roots are ordered based on the range of leaf indices they cover. And that is why you can pick them up as you traverse the tree inorder (left then right).

rach-id commented 6 months ago

How would you do that? For example, all the following can be subtree roots for the same range of leaves:

The subtree roots are the nodes in blue.

The height of the trees depend on the subtree root threshold used. And NMT shouldn't be aware of that as that is Celestia-app specific.

Am I missing something?

staheri14 commented 6 months ago

Are all the subtree roots at the same height? Asking as in the previous example in https://github.com/celestiaorg/nmt/issues/256#issuecomment-2131198436 it was not the case?

rach-id commented 6 months ago

These are all possible cases for the subtree roots

rach-id commented 6 months ago

As per sync discussions with Sanaz, we have the following two propositions:

Implement the subtree root verification that is specific to Celestia. This means that the algorithm would assume a power of 2 number of leaves, the proved range needs to start at a power of 2 (not always, but depends on the subtree root threshold), and would take as argument a subtree root threshold.
- Pros: Simple, less changes needed, cheaper verification cost on EVM chains, less metadata to send on EVM chains
- Cons: Celestia specific. So, if we decide to go down this route, we would need to discuss whether we want to this code to live in the NMT repo, or put it in a downstream repo.

Now that I am thinking about it, we only need this to be on EVM chains. I am not aware of a place where we would verify the inclusion of subtree roots using a proof without having the actual data.

Implement a generalised algorithm in the NMT repository that can prove any inner nodes combination, but uses coordinates and is more complex/expensive to run.
- Pros: General purpose
- Cons:
  - Would take more time to implement correctly with all the test cases etc
  - We don't exactly need it for Blobstream, since as I am mentioned in (1.b.i), we can have the simple algorithm on chain since it's cheaper, faster, and easier to audit.

Decision: Go with the Celestia specific approach.

For reference, a draft of the general purpose approach can be found in this PR https://github.com/celestiaorg/nmt/pull/258

staheri14 commented 6 months ago

These are all possible cases for the subtree roots

Here is how we can accomodate the subtree root threshold.

As we traverse the tree from top to bottom (as implemented in this function: computeRoot), we keep track of the range of leaves that the computeRoot function is called for (this is already part of the function's signature). For each range, we compare it against [x, y). If a range perfectly fits into [x, y) AND if the range (of leaves that the computeRoot function is called for) is less than or equal to SubtreeRootThreshold, we pop one of the subtree roots and return from the recursive function (this line specifically). The rest of the logic in that function should remain the same. This works because:

The recursive tree traversal visits the left and then the right subtrees, i.e., inorder traversal.
The subtree roots are also determined using the same tree traversal method.

cc: @rach-id

rach-id commented 6 months ago

This is how I did it: https://github.com/celestiaorg/nmt/pull/260, is this what you're referring to or I am missing something?

celestiaorg / nmt

Support inner nodes proving/verification #256

Assumptions:

Algorithms: