celestiaorg / celestia-node

Celestia Data Availability Nodes
Apache License 2.0
918 stars 915 forks source link

Update BEFP creation and validation to accomodate more types of validator fraud #2377

Open evan-forbes opened 1 year ago

evan-forbes commented 1 year ago

Per this comment in nmt defining the attack scenario:

Malicious consensus nodes could produce and commit a block, denoted as B, in which the shares are not ordered lexicographically based on their namespace IDs. A DA (data availability) full node obtains the block header, including the data root denoted as dataRoot, and the underlying shares, and will attempt to construct the block. However, if the node detects that the shares used to construct dataRoot are unordered, it must provide a proof of bad encoding to the light clients. Specifically, it must indicate that there are out-of-order leaves in the tree represented by dataRoot.

per the LL white paper section 5.2

An adversarial consensus node may attempt to produce a block that contains a Merkle tree with children that are not ordered correctly. To prevent this, we can set a condition in nsHash such that there is no valid hash when leftMaxNs ≥ rightMinNs, and thus there would be no valid Merkle root for incorrectly ordered children. Therefore blockValid(hi) would return false in the simplistic and probabilistic validity rules as there is no possible Mi where root(Mi) = mRooti

addionally, 2/3s of the voting power could commit to shares that not sized properly and would result in an inability to decode the row or col.

We need to have a fraud proof that can handle out of order namespaces and arbitary things that would cause the erasure decoding to fail. BEFPs (perhaps need to be renamed after this), can and should handle these types of fraud.

During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof to half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.

For the second new type of fraud, if there is something wrong with the shares that would cause the row or col to not be decodable, then if the shares are committed to (step 1 below), then that means that the BEFP is valid.

To accomodate this, the verification logic must be altered to encode the following logic:

  1. Check inclusion: if inclusion fails, proof is invalid
  2. Reconstruction of row of col: if fails, proof is valid
  3. Create merkel root: if creation of root fails, proof is valid
  4. Compare existing root: if different, the proof is valid

currently, we are not following steps two and three of the validation process: https://github.com/celestiaorg/celestia-node/blob/6654cdf4994dbd381efd0d6a29688c731177c855/share/eds/byzantine/bad_encoding.go#L162-L184

cc @renaynay @Wondertan please feel free to add issues, edit, or convert this to an EPIC as needed :slightly_smiling_face:

vgonkivs commented 1 year ago

🤔 IIRC, nmt tree panics in the previous versions if shares were not lexicographically ordered.

Wondertan commented 1 year ago

@evan-forbes, thank you for the detailed issue! Do I understand correctly that the actionable here is to ensure that steps 2 and 3 are followed(and tested) in the implementation?

evan-forbes commented 1 year ago

Do I understand correctly that the actionable here is to ensure that steps 2 and 3 are followed(and tested) in the implementation?

yeah in celestia-node, I believe so yeah. I'm still unsure exactly on what (if anything) needs to be changed on the reconstruction side.

Wondertan commented 1 year ago

Likely relevant https://github.com/celestiaorg/rsmt2d/issues/178

vgonkivs commented 1 year ago

During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof for half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.

The problem here is that if two shares of the row will be out of order, then we can't compute the Merkle root of this row, because of this. This means that computeSharesRoot will also return an error that is different from ErrByzantineData. To handle this case, we need:

I guess for this case we will need a different type of fraud proof because validation will defer from BEFP validation.

@adlerjohn could you please confirm that my understanding is correct? Also, I think we will need additional implementation from the rsmt2d side.

vgonkivs commented 1 year ago

In case we can't build Merkle roots for both row and col, then the entire data is not available

evan-forbes commented 1 year ago

I guess for this case we will need a different type of fraud proof because validation will defer from BEFP validation

is there any way that the BEFP struct that light clients get can be the same? or perhaps we can modify the existing one to handle both cases? In theory, a BEFP just needs any inclusion proof (to row or col roots) for half the shares in a row, correct?

vgonkivs commented 1 year ago

Yes. But with the current implementation, we are verifying inclusion against a single row/col root. New validation requires multiple roots(each share will have its own root to verify from).

vgonkivs commented 1 year ago

What we can do is to have a slice of BadEncodingProof structs with a single share and proof inside, iterate through this slice, and verify inclusion.

adlerjohn commented 1 year ago

The problem here is that if two shares of the row will be out of order, then we can't compute the Merkle root of this row, because of this. This means that computeSharesRoot will also return an error that is different from ErrByzantineData. To handle this case, we need:

Create an inclusion proof for half of the shares against the col;
Verify the inclusion against Merkle root of the columns;
Try to reconstruct and build a tree using shares that were provided - try to build a row.

Yes, that's correct.

I guess for this case we will need a different type of fraud proof because validation will defer from BEFP validation.

We don't need a different data structure, we just need additional codepaths in the verification logic to handle the two additional cases described in OP. The data structure of the proof can remain the same.

musalbas commented 1 year ago

Regarding the above proposed way to handle that case, what if both the col and row has shares out of order?

Isn't the solution simply for the BEFP verifier to try and recompute the row/col root, which will error because one of the shares is out of order, and then fraud proof passes?

evan-forbes commented 1 year ago

if both the col and row has shares out of order

Isn't the solution simply for the BEFP verifier to try and recompute the row/col root

I believe that since both the row and col are out of order, then when the verifier attempts to first prove inclusion to those shares, it will fail since both are out of order.

vgonkivs commented 1 year ago

I believe that since both the row and col are out of order, then when the verifier attempts to first prove inclusion to those shares, it will fail since both are out of order.

But it will mean that the BEFP is invalid.

evan-forbes commented 1 year ago

But it will mean that the BEFP is invalid.

yeah currently that is the case and we will fix to close this PR, correct?

I believe that since both the row and col are out of order, then when the verifier attempts to first prove inclusion to those shares, it will fail since both are out of order.

to clarify this with the above statement in the OP

During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof to half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.

afaiu, we don't have to do anything for the case where both the row and col are out of order, since either the square is available and there exists row/col where the shares can be proven using the opposite col/row roots, or the entire square is not available (sample verification cannot occur because inclusion proofs fail due to nodes in the tree being out of order)

musalbas commented 1 year ago

During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof to half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.

Let's think about the end-to-end flow of sampling and reconstruction. Note that shares being out of order is not the only case where you cannot recompute a row or column root; this would also be the case if any reconstructed shares do not match the row or column root. We can conceptually treat an out-of-order share as the same as a recomputed share that does not match the row or column root.

Firstly, it should not be possible to sample a share that has any sibling nodes that are out of order, because the proof NMT verification will fail when it tries to reconstruct the root (due to a mismatch when calculating min and max nid).

Next, we should consider this rule:

Basically, I believe the rule for choosing what proof should be used for a BEFP share should be:

  1. If a share is inside a fully and validly reconstructed row or column, then you can generate an inclusion proof from that row or column.

  2. Otherwise, you need to use Merkle inclusion proofs that the full node received from other nodes via sampling. Example: consider a square where all erasure coding quadrants (quadrants 2, 3, 4) contain garbage data that does not match the actual row or column roots.

From the above, it doesn't seem to me like we need to create inclusion proofs ourselves in rows/cols with shares out of order, due to (2). Does that seem right @adlerjohn @evan-forbes?

adlerjohn commented 1 year ago

it should not be possible to sample a share that has any sibling nodes that are out of order, because the proof NMT verification will fail when it tries to reconstruct the root

I'm thinking it might be possible if an attacker modifies the NMT code to simply process out-of-order leaves, and the Merkle proof is for a leaf whose neighbor isn't out of order. But this is fine because the case will be caught when the full row/col is reconstructed anyways.

it doesn't seem to me like we need to create inclusion proofs ourselves in rows/cols with shares out of order, due to (2).

That's correct, yes.

musalbas commented 1 year ago

I'm thinking it might be possible if an attacker modifies the NMT code to simply process out-of-order leaves, and the Merkle proof is for a leaf whose neighbor isn't out of order. But this is fine because the case will be caught when the full row/col is reconstructed anyways.

Exactly.