staheri14 commented 1 year ago

Problem

The issue at hand pertains to the generation of NMTs from sets of leaves that are not ordered lexicographically based on their namespaces. The behavior of NMT proofs generated from such malformed trees is unclear and needs to be addressed. The main objectives of this issue are to:

Determine whether the current NMT proof verification system is able to detect out-of-order leaves and respond accordingly.
If the current system is not capable, then develop and implement measures to invalidate NMT proofs generated from malformed NMTs.

evan-forbes commented 1 year ago

just leaving a thought here: I think we'll have the functionality to verify an out of order nmt inclusion proof, but also have some way of notifying the user that one or more nodes was out of lexicographical order. This way, we could prove that an invalid order was committed to.

staheri14 commented 1 year ago

Per conversation with @evan-forbes, I'd like to provide more detail on the attack scenario that prompted the opening of this issue:

Malicious consensus nodes could produce and commit a block, denoted as B, in which the shares are not ordered lexicographically based on their namespace IDs. A DA (data availability) full node obtains the block header, including the data root denoted as dataRoot, and the underlying shares, and will attempt to construct the block. However, if the node detects that the shares used to construct dataRoot are unordered, it must provide a proof of bad encoding to the light clients. Specifically, it must indicate that there are out-of-order leaves in the tree represented by dataRoot.

The problem is that the current NMT (namespace Merkle tree) implementation does not support this kind of bad-encoding proof. Furthermore, the library does not even allow for the construction of a tree with unordered leaves, since it is protected through the Push method, which checks the namespace ID of the inserted data items. Therefore, the issue at hand is to implement new logic in the NMT library that would allow a full node to provide such bad-encoding proof to the light clients.

staheri14 commented 1 year ago

It might be already covered by the existing bad encoding fraud proof https://github.com/celestiaorg/celestia-node/blob/main/share/eds/byzantine/bad_encoding.go, we need to check.

Update: The current bad encoding fraud proof logic does not capture this issue, the NMT construction errors out on a set of unordered leaves, disallowing the full node from calculating the row and column root. This means the full node cannot generate a bad encoding proof if shares are not ordered in a row or column.

staheri14 commented 1 year ago

During my call with @evan-forbes, we discussed the potential benefits of enabling light nodes to identify out-of-order leaves when sampling and verifying their inclusion proof, in addition to the full nodes. The adversarial setting is similar to the one described in this comment, but with the added possibility of the full node being malicious and not providing fraud proofs for unordered leaves.

During sampling, a light client may encounter proof of inclusion of shares that are out-of-order (for which the proof verification results in false). While light clients reject such a proof, we would like to enable them to craft a bad encoding proof and propagate it in the network. This may require some changes in the verification methods currently available in NMT, such as the ability to calculate the root even if a violation is detected in the proof.

celestiaorg / nmt

Defining and Clarifying the Verification Process for NMT Proofs Generated from Non-Lexicographically Ordered Trees #97

Problem