A sketch of a potential implementation for crypto::smt::get that uses the mtree.get operation and is compatible with the u64 SparseMerkleTree advice set is below. Note that many stack manipulation operations are missing from the first two procedures.

The idea is to iterate over each field element limb in KEY and verify the provided paths for each of those limbs. Extension nodes are used to reduce the number of legs that need to be iterated over when verifying a Merkle path. This decreases the number of rows in the hasher chiplet at the cost of increasing the overall cycle count. It could be a good idea to quantify these numbers to better understand whether the tradeoff makes sense.

Within a limb, the mtree.get operation may be called multiple times to skip a compacted path embedded in an extension node. An accumulated key and depth are constructed for each limb to ensure that the appended sequence of non-deterministically provided keys corresponds to KEY.

A new 256-bit sparse Merkle trie advice set would be added that extends the existing u64 advice set with extension nodes.
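For concreteness, a minimal Rust sketch of what the node type of such an advice set could look like. The names are hypothetical, and the commitment scheme for extension nodes is taken from the decode_extension_node sketch further below:

// Hypothetical node type for an extension-node-aware advice set.
enum TrieNode {
    // an ordinary internal node committing to two child digests
    Internal { left: [u64; 4], right: [u64; 4] },
    // a compacted edge: `depth` path bits stored in `key`, committed
    // as hash([0, 0, depth, key, child]) per decode_extension_node below
    Extension { depth: u64, key: u64, child: [u64; 4] },
    // a key-value leaf
    Leaf { value: [u64; 4] },
}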
# Push the leaf value at K to the stack
#
# vars:
#   K: trie key
#   R: trie root
#   V: leaf value
#
# in:  [K, R, ...]
# out: [V, K, R, ...]
proc.get
    # Iterate over each limb of K
    swap
    repeat.4
        exec.get_limb
    end

    # Assert that the accumulated key (a0, a1, a2, a3) matches K
    movupw.3
    dup.1
    eqw
    assert
    dropw
    dropw
end
# vars:
#   R: trie root
#   a: accumulated key
#   V: leaf value or decoded extension lookup hash
#
# in:  [R, ...]
# out: [V, R, a, ...]
proc.get_limb
    while.true
        # Load path data from the advice provider
        #   w: flag set to 1 if retrieved node V is an extension node
        #   f: flag set to 1 if no more key paths exist for the limb
        #   d: path depth
        #   i: path key
        # Stack: [w, f, d, i, R]
        loadw.adv

        # Accumulate path and store on stack
        exec.accumulate_path

        # Retrieve node V
        # Stack: [V, R, ...]
        mtree.get

        # If V is an extension node, decode lookup hash and accumulate path data
        # Stack: [R', ...]
        movup.x
        if.true
            exec.decode_extension_node
            exec.accumulate_path
        end

        # Break if no more key paths exist for the limb (f == 1)
        movup.x
        not
    end

    # Assert that accumulated depth equals limb bit length
end
The decode_extension_node and accumulate_path procedures can look something like this:
# Decode extension node V = hash([0, 0, d, k, R])
#
# vars:
#   d: decoded depth
#   k: decoded key
#   R: decoded lookup hash
#   V: extension node value
#
# in:  [V, ...]
# out: [d, k, R, V, ...]
proc.decode_extension_node
    loadw.adv
    loadw.adv
    dupw.1
    dupw.1
    rphash
    dupw.2
    eqw
    assert
    drop
    dropw
    dropw
    drop
    drop
end
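As a cross-check, here is a plain-Rust model of what this procedure verifies. hash_words is a stand-in for the VM's native hash (rphash), and the (d, k, R) arguments model the two words read from the advice tape; names are hypothetical:

// Model of decode_extension_node: the prover supplies (d, k, R)
// nondeterministically, and the VM re-hashes them to confirm they are
// the preimage of the extension node value V.
fn decode_extension_node(
    v: [u64; 4],
    d: u64,
    k: u64,
    r: [u64; 4],
    hash_words: impl Fn(&[u64; 8]) -> [u64; 4],
) -> (u64, u64, [u64; 4]) {
    let preimage = [0, 0, d, k, r[0], r[1], r[2], r[3]];
    assert_eq!(hash_words(&preimage), v, "advice is not a valid preimage of V");
    (d, k, r)
}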
# Accumulate depth and key of the current path
#
# vars:
#   a: accumulated depth
#   b: accumulated key
#   a': a + d
#   b': b + i * 2^(a + d)
#
# in:  [d, i, a, b, ...]
# out: [a', b', ...]
proc.accumulate_path
    movup.2
    add
    dup
    movdn.3
    pow2
    mul
    add
    swap
end
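A plain-Rust model of the same computation, matching the stack trace above (u128 stands in for field arithmetic, and the bit-order convention is an assumption):

// accumulate_path: a' = a + d, b' = b + i * 2^(a + d)
fn accumulate_path(d: u32, i: u64, a: u32, b: u128) -> (u32, u128) {
    let a_next = a + d;
    let b_next = b + ((i as u128) << a_next);
    (a_next, b_next)
}

// e.g. hops (d=1, i=0) then (d=2, i=1):
// (a, b) goes (0, 0) -> (1, 0) -> (3, 1 << 3)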
To clarify the above, extension nodes can represent either a compacted leaf or branch node, similar to Ethereum's MPT.
For example, a sparse Merkle tree of depth 3 with a single leaf node hash h1 at key 001 (where $e_i$ denotes an empty hash at depth $i$) would be represented by an equivalent tree of depth 1, where the extension node is a single leaf node hash H(2, 01, h1) of the form H(depth, key, leaf_hash):
To represent a tree with leaves at keys 0010 and 0011, we would use an extension node of the form H(depth, key, branch_hash), with branch_hash set to ROOT2 = H(h2, h1), and key set to 01 (the path skipped between layers 1 and 3), resulting in two subtrees with roots ROOT1 and ROOT2:
In the case of a tree storing a single leaf node hash h1 at a 256-bit key (not shown here), there would be four subtrees, each of which contains a single extension node. The first three nodes map each limb key to the root hash of the next limb, and the final node maps the final limb key to h1.
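To make the limb chaining concrete, a small Rust sketch (ext_hash models the H(depth, key, child) commitment; names are hypothetical):

// Root of a tree holding a single leaf at a 256-bit key: fold from the
// last limb back to the first, with each extension node covering one
// full 64-bit limb and pointing at the next subtree's root.
fn single_leaf_root(
    key_limbs: [u64; 4],
    leaf_hash: [u64; 4],
    ext_hash: impl Fn(u64, u64, [u64; 4]) -> [u64; 4], // H(depth, key, child)
) -> [u64; 4] {
    key_limbs
        .iter()
        .rev()
        .fold(leaf_hash, |child, &limb| ext_hash(64, limb, child))
}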
The above does not describe the procedure for crypto::smt::set, which will need to handle splitting of extension nodes, for example:
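As a rough illustration of the kind of splitting involved, a Rust sketch under an assumed LSB-first path encoding (all names are hypothetical, and the real procedure would also recompute the affected hashes):

type Digest = [u64; 4];

// a compacted edge: `depth` path bits stored in `key`, ending at `child`
struct ExtensionNode { depth: u32, key: u64, child: Digest }

// Split `ext` where `new_path` (same bit order and length) diverges,
// returning the shared-prefix edge (if any) plus the two suffix edges
// that would hang off the new branch node.
fn split(
    ext: &ExtensionNode,
    new_path: u64,
    new_child: Digest,
) -> (Option<ExtensionNode>, ExtensionNode, ExtensionNode) {
    assert!(ext.depth < 64, "sketch assumes paths shorter than a full limb");
    // first bit where the two paths diverge (LSB-first encoding assumed)
    let diff = (ext.key ^ new_path).trailing_zeros();
    assert!(diff < ext.depth, "paths are identical; nothing to split");

    // the shared prefix becomes a shorter extension pointing at a new branch
    let prefix = (diff > 0).then(|| ExtensionNode {
        depth: diff,
        key: ext.key & ((1u64 << diff) - 1),
        child: [0; 4], // hash of the new branch node would go here
    });
    // the branch node itself consumes one bit of each path
    let suffix_depth = ext.depth - diff - 1;
    let old_suffix = ExtensionNode {
        depth: suffix_depth,
        key: ext.key >> (diff + 1),
        child: ext.child,
    };
    let new_suffix = ExtensionNode {
        depth: suffix_depth,
        key: new_path >> (diff + 1),
        child: new_child,
    };
    (prefix, old_suffix, new_suffix)
}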
Thank you for the explanation! I think I understand the overall construction. A couple of questions:

- By looking at a node, how can we tell if this is an extension node or a leaf node? For example, in your second example, could the prover claim that the tree contains a single node with value ROOT2?
- If I'm understanding things correctly, it seems that the number of extension nodes which need to be decoded increases as the sparsity of the tree decreases. I wonder if it is possible to estimate how many extension nodes would need to be decoded, on average, for a path in a tree of a few billion entries.

Overall, so far, we have a few different use cases for SMTs: account vaults and account/nullifier databases. The underlying assumptions for these are slightly different. I wonder if the same construction will work equally well in all cases - or whether we should try to specialize.
Another potential approach could be to define a leaf node as: hash(value, remaining_path) and to keep all internal nodes as regular nodes. With this approach, a tree of depth 3 with 2 nodes could be compressed like so:

And if we added another node, it could be compressed like so:

I'm probably missing a few details on how to make it work, but if the approach works, I think we'll never have to decode more than one node. I think the drawback would be that if two nodes happen to share a long common sub-path, the path length could be quite a bit larger.
One other thought regarding handling variability: maybe for things like account/nullifier databases we could have a layered approach:
With the above structure, compressed trees are unlikely to have more than 1K - 10K items in most cases. So, maybe we can use this number of items as the target under which our compacted SMT construction performs the best.
The obvious drawback of the above approach is that even while the number of items in a tree is small, Merkle paths will be at least 32 nodes long. But that might not be a bad thing: it would mean that updating account/nullifier databases would always have a relatively consistent cost, regardless of which account is being updated.
By the way, I noticed that there's an error in my second and third examples: the depth in the extension node (orange box) should be 2, not 3.
- By looking at a node, how can we tell if this is an extension node or a leaf node? For example, in your second example, could the prover claim that the tree contains a single node with value ROOT2?

The prover will nondeterministically provide this information, but it should be impossible to lie about. We can keep track of the total depth when working through a Merkle proof, which in this example would be 1+2+1=4, and verify that it matches the tree's depth (which in this example is 4). The same can be done for the accumulated key and claimed key. If in this example the prover claimed that ROOT2 was a leaf node value, then the accumulated depth and key would not match the provided ones. Note that we can't only track the remaining path, because then there would be no way to differentiate between a key of 000 and a key of 00. Let me know if my explanation is unclear.
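To make the check concrete, a small Rust model of the accumulated-depth/key verification (a sketch of mine, using the same accumulation convention as accumulate_path above):

// Whatever (depth, key) hops the prover supplies, their accumulated
// depth must equal the tree depth and the accumulated key must equal
// the claimed key - so ROOT2 cannot be passed off as a leaf value.
fn check_hops(hops: &[(u32, u64)], tree_depth: u32, claimed_key: u128) {
    let (mut depth, mut key) = (0u32, 0u128);
    for &(d, k) in hops {
        depth += d;
        key += (k as u128) << depth;
    }
    // in the example above, hop depths 1 + 2 + 1 must sum to 4
    assert_eq!(depth, tree_depth);
    assert_eq!(key, claimed_key);
}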
- If I'm understanding things correctly, it seems that the number of extension nodes which need to be decoded increases as the sparsity of the tree decreases. I wonder if it is possible to estimate how many extension nodes would need to be decoded, on average, for a path in a tree of a few billion entries.
Yep, I agree that there should be some quantitative numbers here. I'll work on that. For reference in the meantime, Ethereum's MPT for accounts has an average of ~7 nodes (regular or extension) for a given path in a tree storing ~175 million accounts. Of course those keys are composed of nibbles, not bits (i.e. key length of 64 vs 256), so this is just a lower bound.
I wonder if the same construction will work equally well in all cases - or whether we should try to specialize.
Agreed, there needs to be more thinking about this. One point to mention is that there will likely be many more Merkle proofs proving vault/code/storage contents than getting or setting an account hash (at least in the transaction kernel -- I suppose there will be many account proofs processed in the block kernel).
Another potential approach could be to define a leaf node as: hash(value, remaining_path) and to keep all internal nodes as regular nodes.
Yes, I had considered something like this. In the Jellyfish paper they suggest using only this type of compaction and doing away with extension nodes. It feels like this is more appropriate when the tree is storing a very large number of items; for trees storing fewer items, there are still likely to be significant savings from shared key prefixes. I'll have more insight into this soon from the extension node analysis.
A related point to mention is that such an approach will require working with a remaining_path that no longer fits in a single field element, and might occupy four field elements. I believe we still need to hash the depth along with the value and remaining_path, so the leaf node might need to look like hash(value, hash(remaining_depth, remaining_path)). One could also use a U256 representation to represent internal shortcut paths to cut down on the number of extension nodes needed in a path in my original proposal.
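A sketch of what such a terminal-compaction leaf could look like, assuming the 256-bit remaining_path occupies four field elements (hash_words again stands in for the VM's native hash; names are hypothetical):

// leaf = hash(value, hash(remaining_depth, remaining_path))
fn compacted_leaf_hash(
    value: [u64; 4],
    remaining_depth: u64,
    remaining_path: [u64; 4],
    hash_words: impl Fn(&[u64; 8]) -> [u64; 4],
) -> [u64; 4] {
    let (v, p) = (value, remaining_path);
    // inner hash binds the remaining depth to the remaining path
    let pd = hash_words(&[0, 0, 0, remaining_depth, p[0], p[1], p[2], p[3]]);
    hash_words(&[v[0], v[1], v[2], v[3], pd[0], pd[1], pd[2], pd[3]])
}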
The probability that, given an internal node exists in an SMT at the specified layer depth, only a single branch extends from it (as opposed to two branches) is plotted below for varying key counts:
It appears that internal extension nodes would only be useful for compressing up to ~8 layers or so before a terminal extension node would likely be used. This suggests to me that the overhead of managing internal extension nodes is probably not worth it, and we should only compact the final remaining path, but I'd be interested to hear what others have to say.
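For reference, the plotted quantity can be approximated in closed form: with n uniformly random keys, the number of keys under a node at depth d is roughly Poisson with mean l = n / 2^d, and all of them fall into the same half with probability 2^(1-k), giving P = 2(e^(-l/2) - e^(-l)) / (1 - e^(-l)). This derivation is an assumption about what the plot shows, not taken from it:

// Poisson approximation of P(single branch | node exists at depth d)
fn p_single_branch(n: f64, depth: u32) -> f64 {
    let l = n / 2f64.powi(depth as i32);
    2.0 * ((-l / 2.0).exp() - (-l).exp()) / (1.0 - (-l).exp())
}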
To summarize the previous comments, assuming we make all internal limb nodes be regular nodes, these are the ideas so far whose stack/hasher row tradeoffs need to be analyzed:
Thank you! Personally, I would favor simplicity of the algorithm over the amount of hashing that needs to be done by the hasher. Here is how I'm thinking about it:
Assume that our target use cases are data sets with between 1K and 1M items. On average, this means that Merkle path length should be somewhere between 10 and 20 (roughly log2 of the item count). Each hash takes 8 rows in the hasher table - so, verifying such paths translates to 80 - 160 rows.
It would be good for the number of cycles on the stack side to be roughly half this number (this is just a rule of thumb). So, if we can decode all relevant nodes in 40 - 80 cycles, it would be ideal - and I'm not sure this is achievable with more complicated decoding logic.
I guess to summarize: in my mind, an approach where reading from the tree takes 200 rows in the hasher and 50 cycles in the stack is preferable to the approach which takes 100 rows in the hasher and 100 cycles in the stack.
Closed by #1046, #1048, #1036 and other related PRs.
Assuming we go with the state model similar to the one described in #222, we need to be able to handle sparse Merkle trees (or variations thereof) in the VM. Specifically, we need this to represent account and nullifier databases.
These Merkle trees will act as maps which map ~32 byte keys to ~32 byte values (in our case, both keys and values would actually be 4 field elements). Supporting these trees requires the following:
A few considerations about each of these points.
Miden assembly procedures
We probably need the following procedures:
- crypto::smt::get - this procedure should push the value contained at the specified key to the stack. Stack transition for this procedure could look as follows, where ROOT, KEY, and VALUE are represented by 4 field elements:
- crypto::smt::set - this procedure should update the value at the specified key. Stack transition for this procedure could look as follows, where, similarly to the above, ROOT, NEW_ROOT, KEY, and NEW_VALUE are represented by 4 field elements:

We have MPVERIFY and MRUPDATE operations in the VM which can be executed in a single cycle. I wonder if these can be used in these procedures.

Advice set
To support the above procedures, we need to implement a new advice set. Our current advice set interface expects the key to be a single field element - so, we may have to refactor the interface. For context, the advice set interface is used to provide Merkle paths for a given combination of root and node index.
Another important consideration is that we should expect these Merkle trees to contain large amounts of data (e.g., many gigabytes). So, it may be a good idea to use some in-process database to store the actual data.
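One possible shape of the refactored interface hinted at above, with the single-field-element key widened to a word (names are hypothetical and would need to be reconciled with the VM's existing advice set API):

type Word = [u64; 4];

// Hypothetical refactored advice set: keys are full words.
trait AdviceSet {
    // root of the underlying Merkle structure
    fn root(&self) -> Word;
    // Merkle path for the node addressed by `key` at the given depth
    fn get_path(&self, depth: u32, key: Word) -> Option<Vec<Word>>;
    // update the value at `key`, returning the new root
    fn update(&mut self, key: Word, value: Word) -> Word;
}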