celestiaorg / nmt

Namespaced Merkle Tree
Apache License 2.0
113 stars 39 forks source link

The updateMinMaxID method disregards the `IgnoreMaxNamespace` semantic #159

Closed staheri14 closed 1 year ago

staheri14 commented 1 year ago

Problem

The utility method updateMinMaxID in nmt.go does not consider the IgnoreMaxNamespace semantic when updating the maxNID field of the NamespacedMerkleTree after each leaf is pushed using Push. This leads to inconsistency between the name ID range stored by NamespacedMerkleTree.minID and NamespacedMerkleTree.maxID and the name ID range extracted from the NMT via its MaxNamespace() method, which looks at the NMT's root name ID range affected by the ignoreMaxNamespace logic. Note that the values NamespacedMerkleTree.minID and NamespacedMerkleTree.maxID are used to compare queried NIDs when creating namespace proofs, as shown in this part of the code. This means that queries for the max namespace ID (i.e., parity share name ID), wont be responded by an empty proof. This is against the original purpose of the IgnoreMaxNamespace flag.

Below is a sample test to demonstrate the inconsistency set out in this issue:

nidSize := 1
data := [][]byte{
    append(namespace.ID{1}, []byte("leaf_0")...),
    append(namespace.ID{2}, []byte("leaf_1")...),
    append(namespace.ID{3}, []byte("leaf_2")...),
    append(namespace.ID{4}, []byte("leaf_3")...),
    append(namespace.ID{math.MaxUint8}, []byte("leaf_4")...),
    append(namespace.ID{math.MaxUint8}, []byte("leaf_5")...),
    append(namespace.ID{math.MaxUint8}, []byte("leaf_6")...),
    append(namespace.ID{math.MaxUint8}, []byte("leaf_7")...),
}

tree := New(sha256.New(), NamespaceIDSize(nidSize), IgnoreMaxNamespace(true))
for _, d := range data {
    if err := tree.Push(d); err != nil {
        panic(fmt.Sprintf("unexpected error: %v", err))
    }
}
rootMin, err := tree.MinNamespace()
assert.NoError(t, err)
rootMax, err := tree.MaxNamespace()
assert.NoError(t, err)
assert.True(t, rootMin.Equal(tree.minNID))
assert.True(t, rootMax.Equal(tree.maxNID))                     // mismatch
assert.Equal(t, tree.treeHasher.precomputedMaxNs, tree.maxNID) // the tree.maxNID equals to the maximum namespace (so the Ignore max namespace flag is not effective)

proof, err := tree.ProveNamespace(namespace.ID{math.MaxUint8})
assert.NoError(t, err)
assert.Equal(t, len(proof.nodes), 0) // proof is not empty, while it should be
assert.Equal(t, proof.start, 0)      // proof range is not empty, it outputs 4 however it should be 0
assert.Equal(t, proof.end, 0)        // proof range is not empty, it outputs 8 however it should be 0

Acceptance Criteria

This issue is to decide on the correct behavior (is this inconsistency intentional or not) and make the necessary changes to resolve this inconsistency (or document the rationale of this inconsistency).

staheri14 commented 1 year ago

@liamsi I would appreciate your thoughts on this 🙏

liamsi commented 1 year ago

Ok, so this is not intentional and should be easy to fix: the private fields tree.minNID and tree.maxNID were only introduced to keep track of the latest pushed min/max namespace and accidentally started to be used in namespaced proofs computation (as you explained above). An easy fix would be to use n.MinNamespace() and n.MaxNamespace() in ProveNamespace respectively. We should also update the comment on the private fields such that it is clear they are not supposed to be used for anything else.

@Wondertan @vgonkivs does this impact Celestia node's logic in any way? tldr are the proofs for parity namespaces expected to be non-empty in node?

staheri14 commented 1 year ago

Ok, so this is not intentional and should be easy to fix: the private fields tree.minNID and tree.maxNID were only introduced to keep track of the latest pushed min/max namespace and accidentally started to be used in namespaced proofs computation (as you explained above). An easy fix would be to use n.MinNamespace() and n.MaxNamespace() in ProveNamespace respectively. We should also update the comment on the private fields such that it is clear they are not supposed to be used for anything else.

Thanks @liamsi for your input and clarification. So, given that this is has not been an intential behaviour of the nmt, I am going to implement the fix.

@Wondertan @vgonkivs does this impact Celestia node's logic in any way? tldr are the proofs for parity namespaces expected to be non-empty in node?

Agree with this, it is important to understand the implications of this fix on celestia-node.

staheri14 commented 1 year ago

@liamsi Another question regarding the IgnoreMaxNamespace flag is whether it is necessary to have a method to return the actual namespace range of the tree when this flag is turned on. Currently, all leaves with the maximum NID are somewhat concealed within the tree, giving the appearance that they do not exist. The reason is that the NMT struct lacks a method to return the actual range of the namespace tree without considering the IgnoreMaxNamespace flag. Should we consider adding such a method? would it be useful?

liamsi commented 1 year ago

Another question regarding the IgnoreMaxNamespace flag is whether it is necessary to have a method to return the actual namespace range of the tree when this flag is turned on.

I am not sure why that would that be needed? In Celestia: for the first half of the tree, the namespace range is reflected in the root. For the second half, the namespace is always the same and also known: it's the parity/max namespace.

liamsi commented 1 year ago

It should be possible to prove that a parity leaf is part of the tree though 🤔

staheri14 commented 1 year ago

Reposting my comment from Slack in here:

I am not sure why that would that be needed? In Celestia: for the first half of the tree, the namespace range is reflected in the root. For the second half, the namespace is always the same and also known: it's the parity/max namespace.

I understand your point about Celestia's data square rows and columns. When all shares are pushed to the NMT, the tree structure has a specific structure known to the application. However, if one wants to determine the NMT's namespace range at an arbitrary point in the code (not only when all the shares are pushed), the tree alone does not provide information about the actual NID range. Therefore, the application must keep track of the data that has been added to the tree to determine the NID range. However, it seems there is no practical use case for this. So, thank you, I got my answer. :+1:

It should be possible to prove that a parity leaf is part of the tree though 🤔

Right, and it is possible using the index of a parity share i.e., the index of the leaf in the NMT.

Wondertan commented 1 year ago

Celestia-node uses IgnoreMaxNamespace, but I looked through the code to understand all possible implications precisely. In short, this bug does not affect celestia-node because our tests would fail with false negatives.

The maxNID is only used by ProveNamespace which is untouched by Celestia-node. Instead, we build inclusion proofs with the lower level NewInclusionProof, by reading out proofs from disk.