Closed liamsi closed 4 years ago
Things to add:
The annotated Merkle tree abstraction isn't suitable for prefixing to differentiate between internal and leaf nodes. Need to make that an explicit feature.
Independent of the prefixing, I'm not sure the annotated and the namespaced merkle tree hashes the correct thing.
Looking at how @musalbas defined the hashing in his paper:
We define a wrapper function nsHash(x) that produces hashes prefixed with namespace identifiers. A namespaced hash has the format
minNs||maxNs||hash(x)
, whereminNs
is the lowest namespace identifier found in all the children of the node that the hash represents, andmaxNs
is the highest.
OK, and:
If x is a leaf, then minNs = maxNs = ns(x), as the hash contains only one leaf with a single namespace
Using the notation in the spec, my understanding is that this would result in:
n_min = m_1_l(m) = m.namespace_id
n_max = m_2_l(m) = m.namespace_id
v = n_min || n_max || h(serialize(m))
but we have:
n_min = m_1_l(m) = m.namespace_id
n_max = m_2_l(m) = m.namespace_id
v = h(serialize(m))
Similarly, for the inner nodes, I think the hash of should be something like:
v = nsh(l, r) = n_min || n_max || h( l.v, r.v )
instead of currently:
v = h(l, r) = h(l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
I know the spec's version is just a logical abstraction but at least it isn't obvious how both are compatible.
Ah, while writing this up, I think I see how they can be seen as equivalent.
The two ways of computing node values aren't identical but should provide the same guarantees maybe? I need to double-check, but if not the annotated Merkle tree abstraction can be changed to reflect the way NMTs are described in the paper.
I think it's actually closer to the paper as it seem on 1st sight, with the difference that leaf nodes are "namespace-hashed" as n_min || n_max || h(m)
in the paper and as h(serialize(m))
in the spec.
Something else I've noted: the spec makes the assumption that each leaf message m, somehow contains the namespace id via: m.namespace_id
. Later, we speak of shares that are committed to the NMT, which do not explicitly have such a field: https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#share
My understanding is that a share implicitly belongs to a namespace but reading through it again, the connection between share and namespace doesn't seem clear enough to me.
I would suggest to remove the "Annotated Merkle Tree" section as it is not really needed as part of the LL specification and directly introduce the concrete and simpler special case we need, namely the NMT. Unless we foresee any use for the generalisation (the annotated tree), which I currently don't see.
The min and max values of a node should be outside of the hash of the children, so that a client that wants to verify that they have received all the messages for their namespace should not need to download more data than they need to (specifically, the values of nodes that they aren't interested in).
https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#sparse-merkle-tree
Better to use a different word than message here (otherwise this might be conflated with the other use of the word messages)
Also, it would help getting a quick understanding, if there was 2-3 simple to follow examples how an smt and its will look like (including trivial edge-case like all-empty tree, only one entry, and one realistic example with a few entries).