The root calculated by an empty Sparse Merkle Tree does not match the expected output defined by the specification

bvrooman commented 2 years ago

The Celestia specification of the Sparse Merkle Tree defines the root of an empty tree with the following:

The base case (an empty tree) is defined as the hash of the empty string:

node.v = 0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

However, the implementation, as currently written, yields a different result. Calling for the root of an empty SMT gives the 32 byte 0 array, i.e.:

smt.Root() = 0x0000000000000000000000000000000000000000000000000000000000000000

This is because the default root of an SMT is a placeholder, where placeholders are simply the 32 byte 0 array. The root is not the hash of the of the placeholder.

I verified this behaviour with the following test code:

package smt

import (
    "crypto/sha256"
    "encoding/binary"
    "testing"
)

func TestEmptyRoot(t *testing.T) {
    smt := NewSparseMerkleTree(NewSimpleMap(), NewSimpleMap(), sha256.New())
    t.Logf("ROOT: %x", smt.Root())
}

This outputs the following:

=== RUN   TestEmptyRoot
    brandon_test.go:11: ROOT: 0000000000000000000000000000000000000000000000000000000000000000
--- PASS: TestEmptyRoot (0.00s)
PASS

Process finished with the exit code 0

It's not immediately clear which value is, in fact, the "correct" value, whether that be the value defined in the spec, or the value outputed by the test.

adlerjohn commented 2 years ago

Okay, so I think I figured it out:

If a tree is empty (i.e. has zero leaves), then its root should be the hash of nothing, i.e. 0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855. The subtlety is that an SMT with only default leaves isn't empty! It still contains 2^256 leaves, they simply have the default value. In that case, the rule of percolating up the default value should be used, which means the root of an SMT with only default leaves should be 0x0000000000000000000000000000000000000000000000000000000000000000.

Both of these cases should be clarified in the specs, but aren't mutually exclusive.

bvrooman commented 2 years ago

To summarize your comment:

An empty tree (a tree with zero leaves) has the hash of nothing 0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 as its root
A tree with all default value leaves has the default value 0x0000000000000000000000000000000000000000000000000000000000000000 as its root

This makes sense, and I agree on both fronts.

However, I would contend that having an empty tree is not possible in actuality.

A sparse Merkle tree operates on a set of key-value elements. In spirit, it is sparse because it contains non-default values for some keys and the default value for the remaining keys in the set. The fact that it can be "empty" is actually an implementation detail that happens completely under the hood. This is made possible with "Merkle tree identities" that allow us to optimize the storage of tree branches containing default values:

sparse_merkle_tree_identity

Nodes with default value children collapse into a single default value leaf. This demonstrates that an "empty node" is tantamount to a "default value branch" of any arbitrary depth that has default value leaves. By extension, an empty root node is tantamount to a full tree with entirely default value leaves. Using the rule of percolating up the default value, the root can only ever be 0x0000000000000000000000000000000000000000000000000000000000000000. We would never have an "empty tree" in a way that is visible to the end user - again, emptiness is an internal implementation detail. I do not think we will ever find a scenario where the the hash of nothing is a valid root! My suggestion, therefore, is to simply clarify in the spec that an "empty tree" root is really the default value.

Edit: I realized after the fact that the original diagram was an incorrect oversimplification - I updated it to reflect the internal structure more accurately.

adlerjohn commented 2 years ago

Oh we're in agreement 😂

When I said

Both of these cases should be clarified in the specs

I meant exactly that a zero-leaf SMT isn't actually possible in practice, and that's what should be clarified. Note that it's possible to construct a zero-leaf SMT by setting the key length to zero, which the library should support (if it supports setting the key length) and should not panic on. If the library uses 32-byte keys always, then it's impossible to have a zero-leaf SMT.

bvrooman commented 2 years ago

Perfect! 🚀 I'd be happy to put up a PR for the spec clarification changes if there isn't someone assigned to do it already.

adlerjohn commented 2 years ago

Please do

celestiaorg / smt

The root calculated by an empty Sparse Merkle Tree does not match the expected output defined by the specification #67