Open evan-forbes opened 1 year ago
🤔 IIRC, nmt tree panics in the previous versions if shares were not lexicographically ordered.
@evan-forbes, thank you for the detailed issue! Do I understand correctly that the actionable here is to ensure that steps 2 and 3 are followed(and tested) in the implementation?
Do I understand correctly that the actionable here is to ensure that steps 2 and 3 are followed(and tested) in the implementation?
yeah in celestia-node, I believe so yeah. I'm still unsure exactly on what (if anything) needs to be changed on the reconstruction side.
Likely relevant https://github.com/celestiaorg/rsmt2d/issues/178
During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof for half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.
The problem here is that if two shares of the row will be out of order, then we can't compute the Merkle root of this row, because of this. This means that computeSharesRoot will also return an error that is different from ErrByzantineData. To handle this case, we need:
I guess for this case we will need a different type of fraud proof because validation will defer from BEFP validation.
@adlerjohn could you please confirm that my understanding is correct? Also, I think we will need additional implementation from the rsmt2d side.
In case we can't build Merkle roots for both row and col, then the entire data is not available
I guess for this case we will need a different type of fraud proof because validation will defer from BEFP validation
is there any way that the BEFP struct that light clients get can be the same? or perhaps we can modify the existing one to handle both cases? In theory, a BEFP just needs any inclusion proof (to row or col roots) for half the shares in a row, correct?
Yes. But with the current implementation, we are verifying inclusion against a single row/col root. New validation requires multiple roots(each share will have its own root to verify from).
What we can do is to have a slice of BadEncodingProof
structs with a single share and proof inside, iterate through this slice, and verify inclusion.
The problem here is that if two shares of the row will be out of order, then we can't compute the Merkle root of this row, because of this. This means that computeSharesRoot will also return an error that is different from ErrByzantineData. To handle this case, we need:
Create an inclusion proof for half of the shares against the col; Verify the inclusion against Merkle root of the columns; Try to reconstruct and build a tree using shares that were provided - try to build a row.
Yes, that's correct.
I guess for this case we will need a different type of fraud proof because validation will defer from BEFP validation.
We don't need a different data structure, we just need additional codepaths in the verification logic to handle the two additional cases described in OP. The data structure of the proof can remain the same.
Regarding the above proposed way to handle that case, what if both the col and row has shares out of order?
Isn't the solution simply for the BEFP verifier to try and recompute the row/col root, which will error because one of the shares is out of order, and then fraud proof passes?
if both the col and row has shares out of order
Isn't the solution simply for the BEFP verifier to try and recompute the row/col root
I believe that since both the row and col are out of order, then when the verifier attempts to first prove inclusion to those shares, it will fail since both are out of order.
I believe that since both the row and col are out of order, then when the verifier attempts to first prove inclusion to those shares, it will fail since both are out of order.
But it will mean that the BEFP is invalid.
But it will mean that the BEFP is invalid.
yeah currently that is the case and we will fix to close this PR, correct?
I believe that since both the row and col are out of order, then when the verifier attempts to first prove inclusion to those shares, it will fail since both are out of order.
to clarify this with the above statement in the OP
During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof to half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.
afaiu, we don't have to do anything for the case where both the row and col are out of order, since either the square is available and there exists row/col where the shares can be proven using the opposite col/row roots, or the entire square is not available (sample verification cannot occur because inclusion proofs fail due to nodes in the tree being out of order)
During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof to half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.
Let's think about the end-to-end flow of sampling and reconstruction. Note that shares being out of order is not the only case where you cannot recompute a row or column root; this would also be the case if any reconstructed shares do not match the row or column root. We can conceptually treat an out-of-order share as the same as a recomputed share that does not match the row or column root.
Firstly, it should not be possible to sample a share that has any sibling nodes that are out of order, because the proof NMT verification will fail when it tries to reconstruct the root (due to a mismatch when calculating min and max nid).
Next, we should consider this rule:
Basically, I believe the rule for choosing what proof should be used for a BEFP share should be:
If a share is inside a fully and validly reconstructed row or column, then you can generate an inclusion proof from that row or column.
Otherwise, you need to use Merkle inclusion proofs that the full node received from other nodes via sampling. Example: consider a square where all erasure coding quadrants (quadrants 2, 3, 4) contain garbage data that does not match the actual row or column roots.
From the above, it doesn't seem to me like we need to create inclusion proofs ourselves in rows/cols with shares out of order, due to (2). Does that seem right @adlerjohn @evan-forbes?
it should not be possible to sample a share that has any sibling nodes that are out of order, because the proof NMT verification will fail when it tries to reconstruct the root
I'm thinking it might be possible if an attacker modifies the NMT code to simply process out-of-order leaves, and the Merkle proof is for a leaf whose neighbor isn't out of order. But this is fine because the case will be caught when the full row/col is reconstructed anyways.
it doesn't seem to me like we need to create inclusion proofs ourselves in rows/cols with shares out of order, due to (2).
That's correct, yes.
I'm thinking it might be possible if an attacker modifies the NMT code to simply process out-of-order leaves, and the Merkle proof is for a leaf whose neighbor isn't out of order. But this is fine because the case will be caught when the full row/col is reconstructed anyways.
Exactly.
Per this comment in nmt defining the attack scenario:
per the LL white paper section 5.2
addionally, 2/3s of the voting power could commit to shares that not sized properly and would result in an inability to decode the row or col.
We need to have a fraud proof that can handle out of order namespaces and arbitary things that would cause the erasure decoding to fail. BEFPs (perhaps need to be renamed after this), can and should handle these types of fraud.
During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof to half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.
For the second new type of fraud, if there is something wrong with the shares that would cause the row or col to not be decodable, then if the shares are committed to (step 1 below), then that means that the BEFP is valid.
To accomodate this, the verification logic must be altered to encode the following logic:
currently, we are not following steps two and three of the validation process: https://github.com/celestiaorg/celestia-node/blob/6654cdf4994dbd381efd0d6a29688c731177c855/share/eds/byzantine/bad_encoding.go#L162-L184
cc @renaynay @Wondertan please feel free to add issues, edit, or convert this to an EPIC as needed :slightly_smiling_face: