celestiaorg / rsmt2d

Go implementation of two dimensional Reed-Solomon merkle tree data availability scheme.
Apache License 2.0
161 stars 79 forks source link

ErrByzantineData.Shares are not filled #178

Closed andrijamitrovic23 closed 1 year ago

andrijamitrovic23 commented 1 year ago

ErrByzantineData.Shares are not set when verification of newly completed orthogonal vectors returns a ErrByzantineData error (first place and second place). On the contrary, these shares are set if a verification of newly completed row or column returns ErrByzantineData error.

When ErrByznatineData is raised it is propagated through solveCrossword, Repair, Reconstruct, Retrieve, where a new ErrByzantine is created. If the shares within are not filled, GetProofsForShares will return an empty sequence of sharesWithProof and ErrByzantine is created with that empty sequence sharesWithProof. This is further propagated through GetEDS, SharesAvailable, sample; until a new BadEncodingProof is created, and which will end up propagated (the byzantine error is captured in SharesAvailable) to other nodes, with the field Shares being a sequence of nil values. Finally, when such a proof is received by other nodes, they call Validate on it. Since the values in it are nil, this loop is a no-op. Consequently, decoding of shares should fail, resulting in an error.

We suggest filling in the shares with corresponding column or row data if the verification of that newly completed orthogonal column or row returns an ErrByznatineData.

rootulp commented 1 year ago

Thanks for identifying!

I'm not sure I fully understand the Shares field of ErrByzantineData. It claims to be "Pre-repaired shares" but if the crossword is solved iteratively then I would expect it to be possible for the Shares inside ErrByzantineData to contain repaired shares from previous iterations of the solveCrosswordRow and solveCrosswordCol so does "Pre-repaired shares" in this context just refer to one iteration?

Wondertan commented 1 year ago

Might be related https://github.com/celestiaorg/rsmt2d/issues/112

andrijamitrovic23 commented 1 year ago

I'm not sure I fully understand the Shares field of ErrByzantineData. It claims to be "Pre-repaired shares" but if the crossword is solved iteratively then I would expect it to be possible for the Shares inside ErrByzantineData to contain repaired shares from previous iterations of the solveCrosswordRow and solveCrosswordCol so does "Pre-repaired shares" in this context just refer to one iteration?

I've realized that as well and understood that Shares could contain repaired cells from previous iteration.

adlerjohn commented 1 year ago

does "Pre-repaired shares" in this context just refer to one iteration?

Yes, that's correct. In other words, it contains shares whose individual inclusion is guaranteed to be provable by the full node (i.e. shares usable in a BEFP).

ivan-gavran commented 1 year ago

In the description of the issue there is a small mistake, which is not affecting the fix. Nonetheless, let me bring it up: The p.Shares is not going to be an array of nils, but rather an empty array. Thus, when a bad encoding proof is received by other nodes and they call the function Validate on it, an error will be raised upon checking the size of p.Shares, here.

musalbas commented 1 year ago

Isn't the error returned by verifyAgainstColRoots, not ErrByzantineData, but an err returned by the Merkle tree implementation (which in the case of an NMT, would be e.g. an unordered share error)? The error is emitted here.

This looks like potentially the same issue as https://github.com/celestiaorg/rsmt2d/issues/191#issuecomment-1613790233:

Implementation wise, the only thing I'm unsure of is if err is be bubbled up in getRowRoot, and how solveCrossword behaves if there's an error, which should be reviewed. Seems like the tree's err is just returned, when the desired behavior should actually be to return ErrByzantineData, so that should be discussed. Either we change the err returned to be ErrByzantineData (which might not be clean, not sure we can assume that is good behaviour for all trees, or if NMT should actually return nil on unordered shares).

FYI: I'm moving forward with the implementation of the following solution in rsmt2d: if the tree construction fails for a row or column, during the repair process (this includes any inner function calls as well), an ErrByzantineData should be populated, and returned.

ivan-gavran commented 1 year ago

Isn't the error returned by verifyAgainstColRoots, not ErrByzantineData, but an err returned by the Merkle tree implementation (which in the case of an NMT, would be e.g. an unordered share error)? The error is emitted here.

In the scenario where the column root can be successfully computed, but it does not equal the expected root colRoots[c] it would return ErrByzantineData, here.

(Unless I am missing some reason for why those lines could not be reached.)

musalbas commented 1 year ago

I see, yeah that's also an issue.