Closed andrijamitrovic23 closed 1 year ago
Thanks for identifying!
I'm not sure I fully understand the Shares
field of ErrByzantineData
. It claims to be "Pre-repaired shares" but if the crossword is solved iteratively then I would expect it to be possible for the Shares
inside ErrByzantineData
to contain repaired shares from previous iterations of the solveCrosswordRow
and solveCrosswordCol
so does "Pre-repaired shares" in this context just refer to one iteration?
Might be related https://github.com/celestiaorg/rsmt2d/issues/112
I'm not sure I fully understand the
Shares
field ofErrByzantineData
. It claims to be "Pre-repaired shares" but if the crossword is solved iteratively then I would expect it to be possible for theShares
insideErrByzantineData
to contain repaired shares from previous iterations of thesolveCrosswordRow
andsolveCrosswordCol
so does "Pre-repaired shares" in this context just refer to one iteration?
I've realized that as well and understood that Shares
could contain repaired cells from previous iteration.
does "Pre-repaired shares" in this context just refer to one iteration?
Yes, that's correct. In other words, it contains shares whose individual inclusion is guaranteed to be provable by the full node (i.e. shares usable in a BEFP).
In the description of the issue there is a small mistake, which is not affecting the fix. Nonetheless, let me bring it up: The p.Shares
is not going to be an array of nil
s, but rather an empty array. Thus, when a bad encoding proof is received by other nodes and they call the function Validate
on it, an error will be raised upon checking the size of p.Shares, here.
Isn't the error returned by verifyAgainstColRoots
, not ErrByzantineData
, but an err returned by the Merkle tree implementation (which in the case of an NMT, would be e.g. an unordered share error)? The error is emitted here.
This looks like potentially the same issue as https://github.com/celestiaorg/rsmt2d/issues/191#issuecomment-1613790233:
Implementation wise, the only thing I'm unsure of is if err is be bubbled up in getRowRoot, and how solveCrossword behaves if there's an error, which should be reviewed. Seems like the tree's err is just returned, when the desired behavior should actually be to return ErrByzantineData, so that should be discussed. Either we change the err returned to be ErrByzantineData (which might not be clean, not sure we can assume that is good behaviour for all trees, or if NMT should actually return nil on unordered shares).
FYI: I'm moving forward with the implementation of the following solution in rsmt2d: if the tree construction fails for a row or column, during the repair process (this includes any inner function calls as well), an ErrByzantineData should be populated, and returned.
Isn't the error returned by
verifyAgainstColRoots
, notErrByzantineData
, but an err returned by the Merkle tree implementation (which in the case of an NMT, would be e.g. an unordered share error)? The error is emitted here.
In the scenario where the column root can be successfully computed, but it does not equal the expected root colRoots[c]
it would return ErrByzantineData
, here.
(Unless I am missing some reason for why those lines could not be reached.)
I see, yeah that's also an issue.
ErrByzantineData.Shares
are not set when verification of newly completed orthogonal vectors returns aErrByzantineData
error (first place and second place). On the contrary, these shares are set if a verification of newly completed row or column returnsErrByzantineData
error.When
ErrByznatineData
is raised it is propagated throughsolveCrossword
,Repair
,Reconstruct
,Retrieve
, where a newErrByzantine
is created. If the shares within are not filled,GetProofsForShares
will return an empty sequence ofsharesWithProof
andErrByzantine
is created with that empty sequencesharesWithProof
. This is further propagated through GetEDS, SharesAvailable, sample; until a new BadEncodingProof is created, and which will end up propagated (the byzantine error is captured in SharesAvailable) to other nodes, with the fieldShares
being a sequence ofnil
values. Finally, when such a proof is received by other nodes, they callValidate
on it. Since the values in it arenil
, this loop is a no-op. Consequently, decoding of shares should fail, resulting in an error.We suggest filling in the shares with corresponding column or row data if the verification of that newly completed orthogonal column or row returns an
ErrByznatineData
.