0xPolygonMiden / crypto

Cryptographic primitives used in Polygon Miden rollup
MIT License

Design compact Sparse Merkle tree #22

Closed bobbinth closed 1 year ago

bobbinth commented 1 year ago

The original issue describing the need for compact SMTs is here and the PR which has a partially working implementation is here.

The high-level summary is that we need to support a Sparse Merkle tree which enables keys of size ~256 bits. This SMT would be used in many places in the VM. For example, account storage, account vaults, account database, nullifier database etc.

We need to balance efficiency both in Rust and in Miden assembly. There are many implementations of SMTs out there (e.g., Jellyfish Merkle Tree), however, my intuition is that implementing them will be pretty costly in Miden assembly because of the large number of conditional checks.

So, while this issue is not about Miden assembly, we should probably get a good idea of how the construction we come up with will be implemented in Miden assembly and what the associated costs would be (i.e., in terms of cycle counts for both get and set operations).

One approach is to build a "Tiered tree". In this tree, the first $n$ levels (say 21) would be a simple SMT (no compaction), and then starting at the 22nd level, we could do a more sophisticated compaction scheme. Or we could have 2 tiers of 21 levels each and then do more sophisticated compaction (we should probably benchmark these and other approaches).

I think the performance targets could be something like (in an average case):

Al-Kindi-0 commented 1 year ago

To start the discussion, here are some of my thoughts/observations attempting to describe the status quo. In what follows, I will refer to the proposed solution here as the compact Miden sparse Merkle tree (cmsmt). The cmsmt proposal, in the spirit of e.g. this and this, allows the simulation of intractably large sparse Merkle trees in time that is logarithmic in the actual number of entries, as opposed to the maximum number of entries, assuming the keys are uniformly distributed. The cmsmt proposal, in its current non-optimized form, achieves the following approximate cycle counts:

  1. Getting a node: ~70 cycles.
  2. Setting a node: ~120 cycles for an update and ~500 cycles for an insertion.

The above uses the first element of the key-digest as a 63-bit key and thus makes, I believe, the following simplifying assumptions:

  1. The maximal number of leaves that the SMT can support is ~ $2^{63}$.
  2. Long prefixes that extend beyond the first element of the key-digest are not possible/very unlikely.

The first assumption, in my opinion, is very reasonable for most of our applications. In fact, it might even be overkill in some cases, where we could use a non-compacted SMT to achieve better Miden assembly performance without sacrificing too much Rust performance. The second assumption is, however, problematic. Even if we assume that we use key-digests for indexing (i.e., we hash the keys before using them as indexes), a malicious actor might grind in order to find keys that result in common prefixes longer than 63 bits. This is feasible, even when accounting for the fact that algebraic hash functions are slow in Rust.

Provided we can work with or around the above assumptions, I think that the above costs are within the ranges that @bobbinth referenced above. The main costs are due to bit-wise manipulations; the costs due to hashing are comparatively small, since the above implementation makes heavy use of the mtree_get and mtree_set native instructions. Moreover, as far as I can tell, as long as both of the above assumptions hold, the costs are nearly constant as a function of the number of nodes.

bobbinth commented 1 year ago

Thank you, @Al-Kindi-0! I agree with your assessment of the assumptions. I guess the next questions are:

  1. How can we adapt the structure to support the worst-case scenario of keys having common prefixes of more than 63 bits (hopefully, without sacrificing too much in the average case)?
  2. Can we somehow improve the performance of insertions?
  3. Is the construction sound? And does it meet all the usual requirements of an SMT (i.e., order-independence of inserts)?
bobbinth commented 1 year ago

Some more thoughts:

Another thing, we should be able to use non-deterministic branching to handle tree depth greater than 63. For example, in case of leaf retrievals, it could work something like this:

# get a value from the advice tape indicating whether depth is greater than 63
adv_push

if.true
    # tree depth is greater than 63. get the next non-deterministic input to determine
    # if the depth is greater than 126
    adv_push
    if.true
        # tree depth is greater than 126
    else
        # push the depth of the node
        push.63

        # then read the node at this depth and treat the result as the root of
        # another tree of depth less than 64.
    end
else
   # tree depth is 63 or less; use regular procedure to retrieve a leaf node
   # based on the first element of the key.
end
Al-Kindi-0 commented 1 year ago
1. How can we adapt the structure to support the worst case scenario of keys having common prefixes of more than 63 bits?  (hopefully, without sacrificing too much in the average case).

I think that the construction, using non-determinism (ND), that you mentioned above should work. We should get the node at depth 63 and then treat that as a new root for the base procedure.

  2. Can we somehow improve the performance of insertions?

I believe yes. First of all, the existing code uses a lot of unnecessary stack manipulations which can be optimized out. Second, the bit-wise manipulations can also (possibly) be avoided using division by powers of 2. I think this should be cheaper(?).
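To illustrate the second point, here is a minimal Python sketch (the helper name is hypothetical) of how a key prefix can be split off with a single division by a power of 2 instead of per-bit operations:

```python
def split_key(key: int, n: int) -> tuple:
    # divmod by 2**n yields the top bits and the low n bits in one step,
    # replacing n per-bit extractions with a single division
    hi, lo = divmod(key, 2 ** n)
    return hi, lo

# example: peel the low 16 bits off a 64-bit key
hi, lo = split_key(0x0123456789ABCDEF, 16)
```

In Miden assembly, the analogous step would be a u32checked_divmod against a power-of-2 constant rather than a loop over bits.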

  3. Is the construction sound? And does it meet all the usual requirements of an SMT (i.e., order-independence of inserts)?

To provide a definitive answer, we will have to dig a bit deeper. First of all, there are some mistakes and omissions in the CMSMT implementation, but these are not very serious in my opinion. Second, there is an inconsistency in the JSMT paper, or in my own understanding of it. Specifically, on page 3, optimization number 2 states that it "replaces subtrees consisting of exactly one leaf with a single node". However, on page 7 the sentence "...cascading internal nodes will be created one by one..." seems to contradict the previous optimization. In my mind, a CSMT should have two types of nodes: either a leaf node, or an internal/branch node that has exactly two children. So, inserting a single new (key, value) pair shouldn't create a cascading series of internal nodes that each have only one child (except for the last one). This seems to be the same thing happening in the CMSMT here. This might be one of the quirks of the higher-than-2 arity of the implementation, but I am doubtful of this.
Al-Kindi-0 commented 1 year ago

Some more thoughts:

* Making insertions efficient is pretty important. One of the use cases for these is building unified transaction vaults during transaction prologue and epilogue. Another use case is insertion of nullifiers.

I agree, and as stated above, we should be able to improve the cycle count already with only some better stack management.

* I think we should be able to do updates in the same (or very similar) number of cycles as node retrievals. Number of cycles here refers to the number of cycles for the stack. Updates will require 2x the number of rows in the hasher.

This sounds logical, an update is a get plus double the amount of hashing.

* One change that I'd prefer to make to the existing construction is to validate nodes on retrievals too. This could take some number of extra cycles, but I think as long as it is not too many cycles, it should be OK.

What is the meaning of "validate" in the above sentence?

* One potential way to reduce the amount of binary operations is to use tiered trees (i.e., in increments of $2^{16}$). I think this way, the logic for computing common prefixes should be much simpler.

Wouldn't it make sense to use a non-compact SMT for such shallow trees?

Another thing, we should be able to use non-deterministic branching to handle tree depth greater than 63. For example, in case of leaf retrievals, it could work something like this:

# get a value from the advice tape indicating whether depth is greater than 63
adv_push

if.true
    # tree depth is greater than 63. get the next non-deterministic input to determine
    # if the depth is greater than 126
    adv_push
    if.true
        # tree depth is greater than 126
    else
        # push the depth of the node
        push.63

        # then read the node at this depth and treat the result as the root of
        # another tree of depth less than 64.
    end
else
   # tree depth is 63 or less; use regular procedure to retrieve a leaf node
   # based on the first element of the key.
end

Yes, this should work, and we probably need only worry (from a cost perspective) about the case of depth greater than 63, provided the keys are themselves digests. The costs, I believe, should be those of a get, plus the costs of the main operation, plus some hashing and stack manipulations.

bobbinth commented 1 year ago

What is the meaning of "validate" in the above sentence?

I think that rather than using a full key, we should hash the remaining key with the node value for leaf nodes. This way, even when retrieving a node, we can be sure that the node was retrieved from the right location.

For example, we could define leaf hash as:

hash(leaf_value, remaining_path)
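As a sketch of this idea (using SHA-256 as a stand-in for the VM's native hash function, and an assumed byte encoding), the leaf digest binds the value to the unused portion of the key:

```python
import hashlib

def leaf_hash(leaf_value: bytes, remaining_path: bytes) -> bytes:
    # hash(leaf_value, remaining_path): because the digest commits to the
    # remaining key bits, a Merkle proof for this leaf also proves that the
    # value sits under the claimed key, not merely at some tree position
    return hashlib.sha256(leaf_value + b"|" + remaining_path).digest()
```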
bobbinth commented 1 year ago

One thing I was thinking about is whether we actually need to use trees beyond depth 63. An alternative approach could be to use a simple sorted list where leaves are sorted by their keys. Basically, we still have a compact SMT for the first 63 levels, but if a leaf ends up on the 64th level, we use a sorted list to store all such leaves. Modifying this list may require re-sorting, but it may be simpler than dealing with a potential forest of trees.

Al-Kindi-0 commented 1 year ago

We start first by describing a data structure implementing the key-value map in the most compact way possible. In this data structure,

  1. A tree is either Empty or Node
  2. A Node is either Leaf or Inner
  3. A Leaf has two fields: Key, Value
  4. An Inner has the following fields: Prefix, Left(Node), Right(Node) where Prefix represents the longest common prefix of all children in the sub-tree.

In this construction, it is easy to see that each internal node has exactly two children.

Operations on this key-value map can be all done using the following core routine:

descend(tree, prefix):
    current = tree
    if current.prefix.len() <= prefix.len() and prefix.starts_with(current.prefix):
        if prefix[current.prefix.len()]:
            # the next bit is 1: descend right and yield the left sibling
            yield from descend(current.right, prefix)
            yield current.left
        else:
            # the next bit is 0: descend left and yield the right sibling
            yield from descend(current.left, prefix)
            yield current.right
    else:
        yield current

Using descend we can express e.g. get and put as:

get(tree, key):
    siblings = descend(tree, key)
    closest = siblings.next()
    return closest

put(tree, key, value):
    node = node(key, value)
    siblings = descend(tree, key)

    closest = siblings.next()
    if closest == Empty:
        # new tip
        return node
    else:
        new_tree = Node::new_inner_node(closest, node) # which one is left or right is decided by the constructor based on the prefixes
        for sibling in siblings:
            new_tree = Node::new_inner_node(sibling, new_tree)
        return new_tree

or, to emphasize the recursive structure of the problem, as

put(tree, key, value):
    node = node(key, value)
    siblings = descend(tree, key)
    tree = ascend(siblings, node)
    return tree

The same goes for the remove procedure i.e. one just finds the closest sibling to the node in question and replaces the common parent with the closest sibling.
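To make the recursive structure concrete, here is a minimal runnable Python sketch of this key-value map, including remove via closest-sibling replacement. Keys are fixed-length bit-tuples; the branch convention (bit 0 selects the left child) and the in-place-update handling are assumptions filled in for the sketch:

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple, Union

Bits = Tuple[int, ...]  # a key is a tuple of 0/1 bits; all keys share one length

@dataclass
class Leaf:
    key: Bits
    value: object

    @property
    def prefix(self) -> Bits:
        return self.key

@dataclass
class Inner:
    prefix: Bits  # longest common prefix of all keys in this sub-tree
    left: "Node"
    right: "Node"

Node = Union[Leaf, Inner, None]  # None plays the role of Empty

def common_prefix(a: Bits, b: Bits) -> Bits:
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return tuple(out)

def new_inner(a: Node, b: Node) -> Inner:
    # which child goes left/right is decided by the bit after the shared prefix
    p = common_prefix(a.prefix, b.prefix)
    if a.prefix[len(p)]:
        a, b = b, a
    return Inner(p, a, b)

def descend(tree: Node, key: Bits) -> Iterator[Node]:
    # yields the closest node first, then the siblings on the way back up
    cur = tree
    if isinstance(cur, Inner) and key[: len(cur.prefix)] == cur.prefix:
        if key[len(cur.prefix)]:  # next bit 1: go right, yield left sibling
            yield from descend(cur.right, key)
            yield cur.left
        else:                     # next bit 0: go left, yield right sibling
            yield from descend(cur.left, key)
            yield cur.right
    else:
        yield cur

def get(tree: Node, key: Bits) -> Optional[object]:
    closest = next(descend(tree, key))
    if isinstance(closest, Leaf) and closest.key == key:
        return closest.value
    return None  # non-membership

def put(tree: Node, key: Bits, value: object) -> Node:
    node = Leaf(key, value)
    siblings = descend(tree, key)
    closest = next(siblings)
    if closest is None:                                     # empty tree: new tip
        new_tree = node
    elif isinstance(closest, Leaf) and closest.key == key:  # update in place
        new_tree = node
    else:
        new_tree = new_inner(closest, node)
    for sib in siblings:                                    # ascend
        new_tree = new_inner(sib, new_tree)
    return new_tree

def remove(tree: Node, key: Bits) -> Node:
    siblings = descend(tree, key)
    closest = next(siblings)
    if not (isinstance(closest, Leaf) and closest.key == key):
        return tree                      # key absent: nothing to do
    new_tree = next(siblings, None)      # closest sibling replaces the parent
    for sib in siblings:
        new_tree = new_inner(sib, new_tree)
    return new_tree
```

Because new_inner orders children canonically by the divergence bit, the resulting tree is determined by the key set alone, which is exactly the order-independence-of-inserts property discussed above.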

If we forget for a moment the problem of authentication, then what we have so far is a key-value map with some methods/procedures associated with it. If we now move to the situation of a weak client interacting with a key-value map hosted by a powerful server, then what we need is some modification of these methods so that the client, starting from some root of trust, can make requests to the server that are answered with a tuple (answer, proof) such that the client can be convinced that the answer is correct by running some computations on (query, answer, proof).

To illustrate the above, take put as an example. The server will take the put procedure and transform it in such a way as to send, for every access to a (sub-)tree (prefix, left, right), a shallow copy[^1] (prefix, h(left), h(right)) to the client, while for a (in fact the) leaf the shallow copy will be (h(key), h(value)). Here h is a cryptographic hash function. The client will take all the received shallow copies and will simply run put again, using the hashes instead of the pointers to the left/right sub-trees.

More precisely, the client will start from the root of trust, namely the hash of the current tree root. It will then make sure the server executed siblings = descend(tree, key) correctly by hashing the messages it received from the server, i.e. (prefix, h(left), h(right)), and comparing the result with h(left)/h(right) depending on the key and prefix, in a manner similar to descend (the fact that, again, the client runs code very similar to the original is not a coincidence; see further down). Next, the client will make sure tree = ascend(siblings, node) was executed correctly by mimicking ascend using, again, (prefix, h(left), h(right)) as a substitute for siblings, and will finally conclude by comparing with the new tip. The same goes for get (which works both for proving membership and non-membership) as well as remove. It should be noted that the above can be improved in at least two ways:

  1. The authentications sent by the server to the client can be batched together into something akin to a proof and sent in one go.
  2. The proof can be pruned in order to remove redundancies.

The above implementation was suggested in this ethresear.ch post by Markus Knecht as a simplification of this equivalent construction by Faraz Haider. Both constructions come, however, (much) later than this beautiful implementation (which is the basis of the discussion above) by Peter Todd. This last implementation realizes, in a practical way, the idea of Authenticated Data Structures (ADS) in a generic way, as laid out in the seminal paper by Miller et al. This paper gives a systematic way of transforming any code implementing any data structure into two versions, one that can be run by the server and another by the client, with formal guarantees that the resulting protocol between server (Prover) and client (Verifier) is complete and sound (up to finding a hash collision). It is worth noting that, obviously, the idea of ADS pre-dates the aforementioned paper. However, this paper gives a general framework for designing ADS in a systematic way, whether one uses the code transformation approach at the compiler level (as the paper does) or at a library level (as @ekmett and others do in e.g. here and here). In my opinion the approach is useful even on its own, as ideas such as checkpoints/shallow copies can be used to think about and design new ADS, as can (hopefully) be seen from the current post.

In the case of Miden, the VM can be thought of as the client that gets the proof (non-deterministically) from a server and runs its version of the code on (the shallow copy of) the database. So in the example of put, the VM authenticates the closest sibling of the node we are inserting using the current root of the tree, as is done in descend, and then computes the new root using ascend (I am assuming for simplicity this is not an update). This is in fact the approach taken here, although it is a bit different: in that implementation the internal nodes don't have the prefix field, and thus the algorithms are a bit more complex. This, however, has the advantage over the implementation in this post of not needing to hash the prefixes, as the order of hashing (left, right) plays the role of prefix-hashing. Note also that in this construction some internal nodes might have one of their two children be an empty sub-tree. The optimization of using one default digest for all empty sub-trees should still work fine, I believe.

Another solution might be to do compaction only starting from a certain depth. This would mean that internal nodes up to a certain depth consist only of a left and a right child internal node (of the same type), and then, starting from the given depth, internal nodes can have the prefix property to denote the longest common prefix of the sub-tree. This would mean that the number of hashes always has a fixed cost proportional to depth, but this may (or may not!) be compensated for by the simplicity of the algorithm and by avoiding the need to hash the prefix for the first depth layers.

[^1]: This term is from http://www.cs.umd.edu/~mwh/papers/gpads.pdf.

Al-Kindi-0 commented 1 year ago

One thing I was thinking about is whether we actually need to use trees beyond depth 63. An alternative approach could be to use a simple sorted list where leaves are sorted by their keys. Basically, we still have a compact SMT for the first 63 levels, but if a leaf ends up on the 64th level, we use a sorted list to store all such leaves. Modifying this list may require re-sorting, but it may be simpler than dealing with a potential forest of trees.

I don't think this is needed given the above, but I might be wrong. Also, I think that the remainder might change after the corresponding prefix changes in the case of some insertions.

Al-Kindi-0 commented 1 year ago

One thing I was thinking about is whether we actually need to use trees beyond depth 63. An alternative approach could be to use a simple sorted list where leaves are sorted by their keys. Basically, we still have a compact SMT for the first 63 levels, but if a leaf ends up on the 64th level, we use a sorted list to store all such leaves. Modifying this list may require re-sorting, but it may be simpler than dealing with a potential forest of trees.

I really like this idea! This means that all leaves that have the same 63-bit prefix will be stored in one leaf as an ordered set (?). Then everything will work as expected except that we would need to include the logic for working with the ordered set in case of a "prefix collision". I believe also that all the members of the set will have to be included in order to authenticate any member of the set.
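A sketch of the ordered-set leaf (SHA-256 standing in for the native hash, with an assumed byte encoding): the digest commits to every member, so authenticating one member requires supplying all of them:

```python
import hashlib

def leaf_set_digest(entries):
    # entries: (key, value) pairs whose keys share the same 63-bit prefix;
    # hashing them in sorted key order makes the digest independent of the
    # order in which the members were inserted
    h = hashlib.sha256()
    for key, value in sorted(entries):
        h.update(key.to_bytes(8, "big"))
        h.update(value)
    return h.digest()
```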

bobbinth commented 1 year ago

This means that all leaves that have the same 63-bit prefix will be stored in one leaf as an ordered set (?)

Yes, exactly right!

I believe also that all the members of the set will have to be included in order to authenticate any member of the set.

This is correct - but I don't think this is a big deal. We should expect some small number of collisions after $2^{32}$ values are added to the tree - but I think it is quite unlikely that more than a handful of values end up in a single leaf.

Also, it is possible to intentionally come up with a collision doing just $2^{32}$ work, but finding a 3rd, 4th etc. value which will end up in the same leaf would require $2^{64}$ work - which is rather costly, especially given that the benefit of doing this work is rather questionable.
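The birthday-bound arithmetic behind the first point can be checked directly: with $n$ uniformly distributed keys and $m = 2^{63}$ possible 63-bit prefixes, the expected number of colliding pairs is about $n(n-1)/2m$:

```python
def expected_colliding_pairs(n: int, bits: int) -> float:
    # expected number of key pairs sharing a full `bits`-bit prefix,
    # assuming uniformly distributed (hashed) keys
    m = 2 ** bits
    return n * (n - 1) / (2 * m)

# after 2^32 insertions into a 63-bit prefix space, expect about one collision
e = expected_colliding_pairs(2 ** 32, 63)
```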

bobbinth commented 1 year ago

Wanted to summarize use cases for SMTs I see currently. I think this may help us settle on a specific construction.

  1. Account database
     a. Expected size: between $2^{20}$ and $2^{40}$.
     b. Most common operation: updating existing nodes (there will be insertions too, but probably negligible compared to the number of updates); no deletions.
  2. Nullifier database
     a. Expected size: between $2^{20}$ and $2^{40}$.
     b. Most common operation: inserting new nodes (there will be no updates or deletions).
  3. Account storage
     a. Expected size: in most cases smaller than $2^{16}$, but in some extreme cases could be larger. Unlikely to be greater than $2^{32}$ though.
     b. Most common operation: a mix of updates and insertions; probably some deletions too.
  4. Account vault
     a. Expected size: in most cases smaller than $2^{16}$, but in some extreme cases could be larger. Unlikely to be greater than $2^{32}$ though.
     b. Most common operation: a mix of updates and insertions; probably some deletions too.
  5. Unified tx vault
     a. Expected size: in most cases smaller than $2^{16}$, but in some extreme cases could be larger. Unlikely to be greater than $2^{32}$ though.
     b. Most common operation: a mix of updates and insertions; no deletions.
bobbinth commented 1 year ago

One idea that I've been thinking about is a "tiered SMT". The idea is that we allow leaves only at certain levels of the tree. An illustration (not drawn to scale) of a potential instantiation is below.

image

In the above:

Positives of this approach:

Negatives of this approach:

bobbinth commented 1 year ago

After chatting with @aszepieniec, it seems I made a mistake: the work required to find 3 or more 64-bit collisions is not $2^{64}$ but is much lower. So, for example, finding a 3-collision roughly requires $2^{41}$ work, and 4-collision requires roughly $2^{48}$ work. And we get to roughly $2^{60}$ work only for 18-collision.

Given the above, I wonder if using leaves with sorted sets at depth 64 is still a good idea (it still might be, but I think we need to do a much more careful analysis).

Al-Kindi-0 commented 1 year ago

Insertion in the construction proposed by @bobbinth can be implemented in Miden VM using non-determinism, for example in the following manner:

adv_push

if.true                 # tree depth is > 16
    adv_push
    if.true             # tree depth is > 32
        adv_push
        if.true         # tree depth is > 48
            adv_push
            if.true     # tree depth is > 64
                # TODO

            else        # tree depth is 64
                # We have to:
                # 1- Authenticate the root of T16 against the root of T0
                # 2- Assert root T16 != default
                # 3- Authenticate root of T32 against root of T16
                # 4- Assert root T32 != default
                # 5- Authenticate root of T48 against root of T32
                # 6- Assert root T48 != default
                # 7- Authenticate default leaf against root of T48
                # 8- Insert leaf at T48 and get new root of T48
                # 9- Insert new root of T48 into T32 and get new T32 root
                # 10-Insert new T32 root into T16 and get new T16 root
                # 11-Insert new T16 root into T0 and get new and resulting T0 root
                exec.insert64
            end
        else            # tree depth is 48
            # We have to:
            # 1- Authenticate the root of T16 against the root of T0
            # 2- Assert root T16 != default
            # 3- Authenticate root of T32 against root of T16
            # 4- Assert root T32 != default
            # 5- Authenticate default leaf against root T32
            # 6- Insert leaf at T32 and get new root of T32
            # 7- Insert new root of T32 into T16 and get new T16 root
            # 8- Insert new T16 root into T0 and get new and resulting T0 root
            exec.insert48
        end
    else                # tree depth is 32
        # We have to:
        # 1- Authenticate the root of T16 against the root of T0
        # 2- Assert root of T16 != default
        # 3- Authenticate default leaf against root of T16
        # 4- Insert new leaf at the correct position in T16 and get new T16 root
        # 5- Insert new T16 root into T0
        exec.insert32
    end
else                    # tree depth is 16
    # We have to:
    # 1- Authenticate the default leaf against the current root
    # 2- Insert new leaf at the correct position
    exec.insert16
end

Insertion at depth 16 can be implemented for example like this:

# Insert at depth 16 the leaf h([K,V]) in tree with root R (~135 cycles)

# Input: [R, K, V, ...]
# Output:[R', H', ...]
proc.insert16

    # Decompose k0 into 16-bit chunks and store the decomposition in memory (18 cycles)
    # [a3, a2, a1, a0, R, K, V, ...] where k0 = ((a3 * 2^16 + a2) * 2^16 + a1) * 2^16 + a0
    movup.7
    exec.u64split
    loc_storew.1

    # Get the leaf node at index a0 and check it is the default leaf node D i.e. H == D (37 cycles)
    # [H, R, K, V]
    drop drop drop
    push.16
    mtree_get
    push.0.0.0.0
    eqw
    assert
    dropw

    # Create new K' from K by mapping k0 to k0' := ((a3 * 2^16 + a2) * 2^16 + a1) * 2^16 + 16 (~80 cycles)
    # and hash to get the leaf
    # [R, a0, H', ...]
    loc_loadw.1
    push.16         #[a3, a2, a1, 16, a0, R, K, V, ...]
    movdn.3
    repeat.3
        mul.65536
        add
    end             #[k0', a0, R, K, V, ...]
    push.0.0        #[0, 0, k0', a0, R, K, V, ...] 
    movupw.2        #[K, 0, 0, k0', a0, R, V, ...]
    movup.6
    swap.4 drop     #[K', 0, 0, k0', a0, R, V, ...]
    movupw.3        #[V, K', 0, 0, k0', a0, R, ...] 
    rphash          #[H', 0, 0, k0', a0, R, ...]
    swapw.2         #[R, 0, 0, k0', a0, H', ...]
    repeat.3
        movup.4
        drop
    end             #[R, a0, H', ...]

    # Insert the leaf at index a0 
    # [R', H', ...]
    movup.4
    push.16         #[16, a0, R, H', ...]
    mtree_set       #[R', H', ...]
end
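The limb decomposition and key remapping performed by insert16 can be mirrored in Python (helper names follow the assembly above; this is a sketch of the arithmetic only): k0 is split into four 16-bit limbs, the low limb a0 becomes the leaf index, and the depth tag 16 takes its place in the stored key.

```python
def u64split(k0: int):
    # decompose k0 into four 16-bit limbs [a3, a2, a1, a0] such that
    # k0 == ((a3 * 2**16 + a2) * 2**16 + a1) * 2**16 + a0
    a0 = k0 & 0xFFFF
    a1 = (k0 >> 16) & 0xFFFF
    a2 = (k0 >> 32) & 0xFFFF
    a3 = (k0 >> 48) & 0xFFFF
    return a3, a2, a1, a0

def remap_k0(k0: int):
    # returns (k0', index): the in-tier index a0 is replaced by the depth
    # tag 16, so the stored leaf key remembers which tier it lives in
    a3, a2, a1, a0 = u64split(k0)
    k0_new = ((a3 * 2 ** 16 + a2) * 2 ** 16 + a1) * 2 ** 16 + 16
    return k0_new, a0
```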

This is relatively straightforward and can be made very efficient. As @bobbinth mentioned above, this is the most common insertion case that one would encounter, and thus optimizing it should be a priority. Insertions at deeper tiers should be uncommon, which is good since they are more complicated than the previous case. Here is a proposal for insertion at depth 32:

# Insert at depth 32 the leaf h([K,V]) in tree with root R (~211 cycles)

# Input: [R, K, V, ...]
# Output:[R',...]
proc.insert32

    # Save V in memory (9 cycles)
    # [K, R, ...]
    swapw.2
    loc_storew.3
    dropw

    # Save K in memory and leave only k0 on stack (7 cycles)
    # [k0, R, ...]
    loc_storew.1
    drop drop drop

    # Decompose k0 and save into memory (16 cycles)
    # [a3, a2, a1, a0, R, ...] where k0 = ((a3 * 2^16 + a2) * 2^16 + a1) * 2^16 + a0
    exec.u64split
    loc_storew.2

    # Get leaf from tree with root R and make sure it is not default (33 cycles)
    # [0, 0, 0, 0, R16, R, ...]
    drop drop drop
    push.16
    mtree_get
    push.0.0.0.0
    eqw
    assertz

    # Get leaf from tree with root R16 and make sure it is default (37 cycles)
    # [0, 0, 0, 0, 0, 0, 0, 0, R16, R, ...]
    loc_loadw.2
    drop drop swap drop
    push.16
    mtree_get
    push.0.0.0.0
    eqw
    assert

    # Compute new k0  (23 cycles)
    # [k3, k2, k1, k0', 0, 0, 0, 0, R16, R, ...]
    loc_loadw.2
    mul.65536
    add
    mul.4294967296
    add.32
    movdn.2
    push.0.0
    loc_loadw.1
    movup.4
    swap.4
    drop

    # Compute the leaf as H' = hash([K',V]) where K' = [k3, k2, k1, k0'] (22 cycles)
    # [R16, H', R, ...]
    swapw
    loc_loadw.3
    rphash
    swapw

    # Insert H' and get the new R16' (33 cycles)
    # [R, R16', ...]
    push.0.0.0.0
    loc_loadw.2
    drop drop swap drop
    push.32             # [32, a1, R16, H', R, ...]
    mtree_set
    swapw dropw
    swapw

    # Insert R16' and get new R' (31 cycles)
    # [R', ...]
    push.0.0.0.0
    loc_loadw.2
    drop drop drop
    push.16             # [16, a0, R, R16', ...]
    mtree_set
    swapw dropw

end

## Helper procedures

# Decomposes a u32 into two u16 limbs (4 cycles)
# In: [a, ...]
# Out:[a_hi, a_lo, ...]
proc.u16split           
    push.65536
    u32checked_divmod
    swap
end

# Decomposes a field element into four u16 limbs (12 cycles)
# In: [a, ...]
# Out:[a_hi_hi, a_hi_lo, a_lo_hi, a_lo_lo, ...]
proc.u64split
    u32split
    exec.u16split
    movup.2
    exec.u16split
    movup.3
    movup.3
end

Concerning the other operations that a key-value map should have:

  1. Updates should be (almost) exactly the same as insertions, with the main difference being that the old value is provided non-deterministically so that it can be authenticated (i.e., to make sure it exists and is stored in the right place) before updating it.

  2. Deletions are a bit more tricky in such an SMT construction, and we should discuss and think about the different possible approaches, e.g., just replacing the leaf with a default value vs. collapsing tiers in case the removed leaf is the last remaining one in the (sub-)tree.

bobbinth commented 1 year ago

@Al-Kindi-0 - this is great! Thank you! A few thoughts from my end:

First, I wonder if we can handle different depths a bit more dynamically. I think we might be able to get away with only three depth cases: < 16, 16 ... 64, and > 64. For example, we could add a decorator (or even a couple of decorators) which would look up the info of a leaf in the SMT and put it onto the advice tape. This info could include depth, prefix, and remainder. And we can then verify that the data was provided correctly.

Second, if it makes things simpler, we can modify the semantics of instructions such as mtree_set. Specifically, this instruction currently does not put the old value of the leaf onto the stack, and I think because of that, we sometimes need to do an mtree_get and then an mtree_set. But it is actually not that difficult to make it so that the old value is left on the stack after mtree_set is executed.

I would also probably change insert into set so that we could handle both updates and insertions. This could work pretty well for depth 16, and maybe for other depths too. So, for depth 16, the procedure could look like this:

# Set value of leaf located at index K to V for a tree with root R (the leaf is expected to be at depth 16).
# we also refer to individual elements of a word by their index (i.e., K is [k0, k1, k2, k3]).

# Input: [K, V, R, ...];
# Output:[V_old, R', ...]
proc.set16
  # split k3 into top 16 bits and low 48 bits: 8 cycles
  # [p, r, k2, k1, k0, V, R, ... ]
  u32split
  push.2^16
  u32div
  push.2^32
  mul
  movup.2
  add
  swap

  # compute the new leaf value H = hash([k0, k1, k2, r + 16 * 2^48], V): 19 cycles
  # [p, H, R, ... ]
  movup.8
  push.2^52
  add
  push.0.0.0.0
  rphash
  dropw
  swapw
  dropw
  movup.4

  # update the leaf at index p and depth 16 to H: 15 cycles
  # this assumes that mtree_set semantics have been modified
  # [V_old, R', ... ]
  push.16
  mtree_set

  # make sure that V_old was either an internal node or a hash of a leaf
  # [V_old, R', ... ]
  adv_push
  if.true
    # V_old is an internal node: 19 cycles
    push.0xabcde # this is the value of an internal node at depth 16
    movup.4
    assert_eq
    movup.3
    assert_eq
    movup.2
    assert_eq
    assert_eq
    push.0.0.0.0    
  else
    # V_old is a leaf: 45 cycles
    adv_keyval
    push.0.0.0.0
    adv_push.8

    # make sure the upper 16 bits of k3 are set to 2^52; this could also be done differently
    dup.4
    u32split
    push.2^16
    u32div
    drop
    push.2^52
    assert_eq
    drop

    # compute the hash of the read-in values and make sure it is equal to V_old
    rpperm
    dropw
    swapw
    dropw
    repeat.4
      dup.4
      assert_eq
    end
  end
end

I didn't actually debug the above procedure to make sure it works - but hopefully, I didn't miss anything major. If it does work, the number of cycles would be 62 in case of an insert and 88 in case of an update.

bobbinth commented 1 year ago

One of the things we need to decide is what is the best way to differentiate between a leaf node and an internal node. I am not sure that the scheme I proposed above works. But here is another potential approach:

  1. Internal nodes are hashed as usual $hash(left, right)$. We need it to be this way to make sure we can use mtree_get and mtree_set instructions.
  2. Denoting $r$ as the remaining leaf path from a given depth, and $v$ as node value, leaf nodes could be hashed as $hash(r, v)$ but we also set one of the capacity registers to the value of node's depth. Since the depth can have only 1 of 4 possible values, I don't think this reduces security too much (maybe by a couple of bits?) - but this guarantees that every node will have a unique hash.

This approach should also be slightly more efficient as it gets rid of needing to calculate $d \cdot 2^{48} + r$.
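To illustrate the intent of point 2, here is a toy Rust sketch where a standard hasher stands in for the Rescue Prime permutation (illustrative only; the real construction would place the depth into a capacity register of the permutation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for hash(r, v) with the node depth mixed in as a capacity
// register; DefaultHasher substitutes for the Rescue Prime permutation
// purely for illustration.
fn leaf_hash(r: u64, v: [u64; 4], depth: u8) -> u64 {
    assert!(matches!(depth, 16 | 32 | 48 | 64));
    let mut h = DefaultHasher::new();
    depth.hash(&mut h); // "capacity register": domain-separates the four tiers
    r.hash(&mut h);
    v.hash(&mut h);
    h.finish()
}

fn main() {
    let (r, v) = (42u64, [1, 2, 3, 4]);
    // the same remaining path and value commit differently at different depths
    assert_ne!(leaf_hash(r, v, 16), leaf_hash(r, v, 32));
    assert_eq!(leaf_hash(r, v, 16), leaf_hash(r, v, 16));
}
```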

bobbinth commented 1 year ago

A few more thoughts:

Al-Kindi-0 commented 1 year ago

This is great, @bobbinth !

A few comments:

  1. It seems that you are reading the key starting from k3 while I started from k0. I don't see any reason to prefer one over the other, but I am curious to know what you think.
  2. Regarding the default values, I made, without thinking too much, the simplifying assumption that the root of any empty sub-tree is [0,0,0,0], but am I understanding it correctly that you are using the roots of empty fully materialized sub-trees? I actually like this latter approach better, and I don't see any immediate benefits of the former in our current tiered construction.
  3. As for domain separation, I think your proposal should be sufficient, and it is indeed important to create a separation between leaf and internal nodes.
Al-Kindi-0 commented 1 year ago

A leaf node. The insertion here would be more complicated because we also need to move the conflicting leaf to the next level.

I actually missed this case in my pseudo-code above. This would indeed be complicated but should be feasible; I am thinking of the following:

  1. Insert the already-existing node into a totally empty sub-tree. I think this maneuver is going to be so common and useful that we might need a specialized routine for it.
  2. Take the resulting Root and insert the leaf that initiated the action into the tree of depth 16 with Root.
  3. Take the resulting Root' and insert it in the tier above.
bobbinth commented 1 year ago
  1. It seems that you are reading the key starting from k3 while I started from k0. I don't see any reason to prefer one over the other, but I am curious to know what you think.

My reasoning for this is that when we look at a single element, the most significant bits are the ones which correspond to the top segment of the path. I applied the same logic to the entire word, assuming that the most significant element of the word is the one which is "closer to the top of the tree".

  1. Regarding the default values, I made, without thinking too much, the simplifying assumption that the root of any empty sub-tree is [0,0,0,0] but am I understanding it correctly that you are using the roots of empty fully materialized sub-trees?

Yes, this is correct. The reason I did it this way is because that's how the simple SMT we have implemented works and it also makes it indistinguishable from a fully-materialized tree with all ZERO leaves (which could be convenient).

This would indeed be complicated but should be feasible; I am thinking of the following:

This would work, but there is an extra complication: we don't know how many levels we need to move down. For example, if we are trying to insert at level 32 and the relevant node at level 16 is occupied by a leaf, it may be that we actually need to move to level 48 or 64 because both leaves share a long key prefix. My guess is that this type of insertion is going to be by far the most complex one. I'm hoping that the tiered tree approach will reduce the probability of these cases.
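For intuition, the "how many levels to move down" computation can be sketched in Rust (an illustrative helper over a single 64-bit key limb; not part of any existing API):

```rust
/// Tier depth (16, 32, 48, or 64) at which two distinct key limbs can live
/// side by side, i.e. the first tier boundary past their common bit prefix.
fn divergence_tier(a: u64, b: u64) -> u32 {
    assert_ne!(a, b, "equal limbs: the divergence lies below depth 64");
    let common = (a ^ b).leading_zeros(); // common prefix length in bits
    ((common / 16) + 1) * 16              // next tier boundary: 16..=64
}

fn main() {
    // limbs share exactly their top 16 bits -> both leaves fit at depth 32
    assert_eq!(divergence_tier(0x1111_2222_3333_4444, 0x1111_9999_3333_4444), 32);
    // limbs share their top 32 bits -> the leaves must descend to depth 48
    assert_eq!(divergence_tier(0x1111_2222_3333_4444, 0x1111_2222_9999_4444), 48);
}
```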

Al-Kindi-0 commented 1 year ago

@Al-Kindi-0 - this is great! Thank you! A few thoughts from my end:

First, I wonder if we can handle different depths a bit more dynamically. I think we might be able to get away with only the depth cases < 16, 16 ... 64, and > 64. For example, we could add a decorator (or even a couple of decorators) which would look up a leaf's info for the SMT and put it onto the advice tape. This info could include depth, prefix, and remainder. We can then verify that the data was provided correctly.

Second, if it makes things simpler, we can modify semantics of such instructions as mtree_set. Specifically, currently this instruction does not put the old value of the leaf onto the stack, and I think because of that, we sometimes need to do an mtree_get and then an mtree_set. But it is actually not that difficult to make it so that the old value is left on the stack after mtree_set is executed.

I would also probably change insert into set so that we could handle both updates and insertions. This could work pretty well for depth 16, and maybe for other depths too. So, for depth 16, the procedure could look like this:

# Set value of leaf located at index K to V for a tree with root R (the leaf is expected to be at depth 16).
# we also refer to individual elements of a word by their index (i.e., K is [k0, k1, k2, k3]).

# Input: [K, V, R, ...];
# Output:[V_old, R', ...]
proc.set16
  # split k3 into top 16 bits and low 48 bits: 8 cycles
  # [p, r, k2, k1, k0, V, R, ... ]
  u32split
  push.2^16
  u32div
  push.2^32
  mul
  movup.2
  add
  swap

  # compute the new leaf value H = hash([k0, k1, k2, r + 16 * 2^48], V): 19 cycles
  # [p, H, R, ... ]
  movup.8
  push.2^52
  add
  push.0.0.0.0
  rphash
  dropw
  swapw
  dropw
  movup.4

  # update the leaf at index p and depth 16 to H: 15 cycles
  # this assumes that mtree_set semantics have been modified
  # [V_old, R', ... ]
  push.16
  mtree_set

  # make sure that V_old was either an internal node or a hash of a leaf
  # [V_old, R', ... ]
  adv_push.1
  if.true
    # V_old is an internal node: 19 cycles
    push.0xabcde # this is the value of an internal node at depth 16
    movup.4
    assert_eq
    movup.3
    assert_eq
    movup.2
    assert_eq
    assert_eq
    push.0.0.0.0    
  else
    # V_old is a leaf: 45 cycles
    adv_keyval
    push.0.0.0.0
    adv_push.8

    # make sure the upper 16 bits of k3 are set to 2^52; this could also be done differently
    dup.4
    u32split
    push.2^16
    u32div
    drop
    push.2^52
    assert_eq
    drop

    # compute the hash of the read-in values and make sure it is equal to V_old
    rpperm
    dropw
    swapw
    dropw
    repeat.4
      dup.4
      assert_eq
    end
  end
end

I didn't actually debug the above procedure to make sure it works - but hopefully, I didn't miss anything major. If it does work, the number of cycles would be 62 in case of an insert and 88 in case of an update.

@bobbinth , how is the case of inserting a key that shares the first 16 bits (and only the first 16 bits) with a key that already exists in the SMT handled in the above? I am thinking that it should fail, and the correct procedure that should be called is something like proc.set32. Hence, we probably need to check that the next 16 bits are different as part of proc.set16. Or maybe I am missing something?

bobbinth commented 1 year ago

I am thinking that it should fail and the correct procedure that should be called is something like proc.set32. Hence, we probably need to check that the next 16 bits are different as part of proc.set16. Or maybe I am missing something?

Yes, you are correct. This case would be handled by proc.set32, but we do need to have this extra check here to ensure that only the first 16 bits of the path are different.

Al-Kindi-0 commented 1 year ago

I am thinking that it should fail and the correct procedure that should be called is something like proc.set32. Hence, we probably need to check that the next 16 bits are different as part of proc.set16. Or maybe I am missing something?

Yes, you are correct. This case would be handled by proc.set32, but we do need to have this extra check here to ensure that only the first 16 bits of the path are different.

In other words, the second if-branch of proc.set16 is an update?

Al-Kindi-0 commented 1 year ago

Here is the procedure proc.set16 with some modifications in order for it to work with the current implementation of mtree_set. Some typos were also fixed but otherwise it is the same as the one proposed by @bobbinth.

# Set value of leaf located at index K to V for a tree with root R (the operation is expected to be at depth 16).
# we also refer to individual elements of a word by their index (i.e., K is [k0, k1, k2, k3]).

# Input: [K, V, R, ...];
# Output:[V_old, R', ...]
proc.set16
  # split k3 into top 16 bits and low 48 bits: 11 cycles
  # [r, k2, k1, k0, V, p, R, ... ]
  u32split          # [k3_h, b * 2^16 + a, k2, k1, k0, V, R]
  push.2^16         
  u32checked_divmod             # [c, p, b * 2^16 + a, k2, ...] where k3 = p * 2^48 + c * 2^32 + b * 2^16 + a
  mul.2^32         
  movup.2           
  add
  swap              # [p, c * 2^32 + b * 2^16 + a, k2, ...] where k3 = p * 2^48 + c * 2^32 + b * 2^16 + a
  movdn.8

  # Save [r + 2^52, k2, k1, k0] to memory for use in the case of a leaf node update (6 cycles)
  push.4503599627370496
  add
  loc_storew.0

  # compute the new leaf value H = hash([k0, k1, k2, r + 2^48 * 16], V): 29 cycles
  # [H, p, R, ... ]
  push.0.0.0.0
  rphash
  dropw
  swapw
  dropw

  # Save H : 10 cycles
  # [p, R, p, ... ]
  loc_storew.1
  dropw
  dup
  movdn.5

  # Get node at position: 11 cycles
  # [H_old, R, p, ... ]
  push.16
  mtree_get
  swapw

  # Make sure that H_old was either an internal node or a hash of a leaf
  # [H_old, R, p, ... ]
  adv_push.1
  if.true
    # H_old is an internal node: 40 cycles
    push.0xabcde # this is the value of an internal node at depth 16
    movup.4
    assert_eq
    movup.3
    assert_eq
    movup.2
    assert_eq
    assert_eq           #[R, p, ...]
    push.0.0.0.0
    loc_loadw.1         #[H, R, p, ...]
    movup.8
    push.16
    mtree_set           #[R', H, ...]
    swapw
    dropw
    push.0.0.0.0    
  else
    # H_old is a leaf we need to load the Key from memory and hash with old value to compare with H_old: 84 cycles

    # Load V_old non-deterministically
    adv_keyval
    adv_push.4

    # Store V_old
    loc_storew.3

    # Load  [k0, k1, k2, r + 2^48 * 16] from memory and prepare for hashing
    push.0.0.0.0
    loc_loadw.0
    push.0.0.0.0

    # compute the hash of the read-in values and make sure it is equal to H_old
    # [R, p, ...]
    rpperm
    dropw
    swapw
    dropw
    repeat.4
      dup.4
      assert_eq
    end

    # Load V from memory and insert it into tree
    # [R', H, ...]
    push.0.0.0.0
    loc_loadw.1
    swapw
    movup.8
    push.16
    mtree_set

    # Prepare stack for final configuration
    # [V_old, R']
    swapw
    loc_loadw.3
  end
end

The total cycle count in the above is 107 for an insertion and 151 for an update. These numbers should improve significantly once we change mtree_set.

Al-Kindi-0 commented 1 year ago

We now have an idea of how simple insertions/updates (i.e., in the case of no prefix collision) work. It would be great to get a feeling for the more involved cases, i.e., when there are leaf nodes that share a prefix with the node we are currently trying to insert. In what follows, I give a high-level overview of how we can handle these more difficult cases in the tiered SMT as described above.

Let R0 be the current tree commitment, let [k3, k2, k1, k0] = K with k3 = r3 * 2^48 + r2 * 2^32 + r1 * 2^16 + r0, and let V be the value to insert. If the node at index r3 is empty, insertion reduces to the set16 routine. Otherwise, we have two situations:

  1. If it is a leaf node L = hash([K', V']), then:

    1. Either, we are in an update situation and we have to check that the key matches and update the value.
    2. Or, the current leaf node L has to descend to either level 32, 48 or 64 depending on the length of the common prefix. For example if k1 == k1' and k2 == k2' then we will insert both (K, V) and (K', V') into an empty Merkle tree of depth 16 to get a root of type R48 which is then inserted at depth 48.
  2. Else, it is an internal root node at depth 16, say R16. In this case we look at the node with index r2 and root R16 and repeat the same analysis as in point 1 above.

This seems feasible but is complicated a bit by the fact that we have 3 types of nodes, and we need to keep track of the Merkle tree roots at potentially all of the levels 16, 32, 48 in order to update the root R0 of the whole tree. It seems appropriate now to compare insertion in these complex cases against the fully compact SMT. In the case of the CSMT, the situation is similar but simpler since there are only two types of nodes: leaf nodes and internal nodes. More specifically, let (K, V) be the key-value pair we are updating. Following the key K, we will end up in either:

  1. An empty node, in which case the insertion is simple.
  2. A leaf node L, and we either:
    1. Update the current value associated with L with the new value V.
    2. Move L as many levels down as the length of the common prefix of the key associated with L and K. Then insert (K, V) as a sibling to L. The following figure illustrates this point: Untitled-2022-09-14-2115

A Middle Ground?

It seems that we might combine the two approaches in order to achieve the following two desiderata:

  1. Easy key arithmetic and thus lower stack manipulation complexity through compaction at only specific levels (Tiered approach).
  2. Simpler algorithms, at least for insertions/updates, but I believe for others too, due to the existence of only two types of nodes (CSMT).

In the CSMT approach, we are always working with the same tree, but we view it through different lenses by changing its depth when we need to descend in point 2.2 above. This means that we can use the mtree_get and mtree_set instructions with the same root R0 to do 2.2 after modifying the advice provider to change the depth. This is what the following line here does. I think this is fine from a soundness perspective, but it would be good to have other opinions on this.

Illustrative Example

The following drawing shows the case of a non-trivial insertion when the full keys are of length 4 bits and compaction is happening only at levels 2 and 4: Untitled-2022-09-14-2115(1)

Note that null should be replaced with the hash of the appropriate empty sub-tree.

Al-Kindi-0 commented 1 year ago

Now that https://github.com/0xPolygonMiden/miden-vm/pull/652 is implemented, we can finalize the design by looking at how insertions, updates and removals can be done in the latest version of the tiered-SMT. In principle, updates can be subsumed under insertions but separating the two is better since the logic for updates is simpler than that of insertions and thus the cost in VM-cycles should also be lower.

Insertions:

Simple insertion:

  1. Non-deterministically (ND) choose the appropriate depth d where the insertion will take place. This in turn implies choosing an insertion procedure among insert16, insert32, ....
  2. Check that the choice made in 1 is correct by:
    1. Getting the internal node, on the path to where insertion is happening, at depth d - 16 (in the case of insert16 this is redundant) and checking that it is not the hash of an empty sub-tree at depth d - 16.
    2. mtree_set-ing the node whose index is the prefix of the key with length corresponding to the chosen procedure. For example, if we are inserting at level 32, then we should extract the length-32 prefix of the key and use it to insert at depth 32 and index the computed prefix.
    3. Comparing the old node returned by mtree_set in the previous point with its expected value i.e. hash of empty sub-tree.
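The prefix extraction mentioned in step 2.2 can be sketched in Rust (an illustrative helper, not part of any existing API):

```rust
/// The `d` most significant bits of key limb `k3` give the node index at tier depth `d`.
fn index_at_depth(k3: u64, d: u32) -> u64 {
    assert!(matches!(d, 16 | 32 | 48 | 64), "tiers live at depths 16, 32, 48, 64");
    if d == 64 { k3 } else { k3 >> (64 - d) }
}

fn main() {
    let k3 = 0xABCD_1234_5678_9ABCu64;
    assert_eq!(index_at_depth(k3, 16), 0xABCD);
    assert_eq!(index_at_depth(k3, 32), 0xABCD_1234);
    assert_eq!(index_at_depth(k3, 48), 0xABCD_1234_5678);
}
```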

Complex insertion:

  1. ND load the depth and index of the leaf node [K', V'] with conflicting prefix.
  2. Check that K' is valid and that it hashes, together with V', to H', which we get from an mtree_set with the appropriate empty sub-tree root.
  3. ND load the length of the common prefix, then use it to extract the common prefix and the new indices at which to insert the pre-existing node and the new node.
  4. Insert the pre-existing leaf node into its correct new position.
  5. Insert the new leaf node into its correct position.

Implementation:

Simple insertion:

This should be straightforward and was partially addressed previously in this issue.

Complex insertion:

# Input: [R, K, V, ...]
# Output: [E, R_new, ...] where E is the root of an empty sub-tree
proc.complex_insertion

    # Load and store the depth and index of leaf node with conflicting prefix
    # [0, 0, d, i, R, K, V]
    push.0.0.0.0
    adv_loadw
    mem_storew.1

    # Load the root of empty sub-tree at depth d
    # [E, 0, 0, d, i, R, K, V, ...]
    dup.2
    exec.get_empty_hash

    # Replace conflicting leaf node with E
    # [H', R', K, V, ...]
    movdnw.2        # [0, 0, d, i, R, E, K, V, ...]
    drop drop       # [d, i, R, E, K, V, ...]
    mtree_set 

    # Check that H' is indeed a leaf node

    ## Load the claimed key K' and value V', and store them
    ## [V', K', H', R', K, V, ...]
    push.0.0.0.0
    adv_loadw
    mem_storew.2   
    push.0.0.0.0
    adv_loadw
    mem_storew.3

    ## Hash and compare
    ## [R', K, V, ...]
    rphash
    repeat.4
        movup.4
        assert_eq
    end

    # Non-deterministically insert [K', V'] and [K, V] depending on the length of common prefix
    adv_push.1
    if.true             # Length of common prefix is greater than 32
        adv_push.1
        if.true         # Length of common prefix is greater than 48
            adv_push.1
            if.true     # Length of common prefix is greater than 64
                # Here we use an ordered list to store all [K, V] pairs that
                # share a common prefix of length 64
                exec.TODO

            else    # Length of common prefix is in [48, 64)

                # Compare prefixes of k3 and k'3
                # ...
            end

        else        # Length of common prefix is in [32, 48)

            # Compare prefixes of k3 and k'3
            # ...
        end

    else            # Length of common prefix is in [16, 32) 

        ## Extract length 32 bit prefix of k3
        dup.4
        u32split
        swap

        ## Extract length 32 bit prefix of k'3
        push.0.0.0
        mem_loadw.2
        movdn.3
        drop drop drop
        u32split
        swap drop
        push.0
        mem_storew.8
        drop drop

        ## Xor the two length 32 prefixes
        u32checked_xor

        ## Check that first 16 bits are zero while the last 16 bits are not
        ## [R', K, V, ...]
        push.65536
        u32checked_divmod       #[xor % 2^16, xor / 2^16, ...]
        push.0
        neq
        assert
        assertz

        # Insert [K, V] at correct index at depth 32

        ## Compute hash h([K, V])
        ## [R', H, ...]
        swapw.2
        rphash
        swapw

        ## Load the correct index to insert [K, V]
        ## [i, R', H, ...]
        push.0.0.0.0
        mem_loadw.8
        drop drop drop

        ## Insert [K, V]
        ## [E, R'', ...] where E should be hash of an empty sub-tree
        push.32
        adv.set_smt_depth
        mtree_set

        # Insert [K', V'] at correct position

        ## Load K' and V' from memory 
        ## [V', K', R'', ...]
        mem_loadw.2
        push.0.0.0.0
        mem_loadw.3

        ## Compute h([K', V']) and swap (NB: This has been computed earlier but I am recomputing in case
        ## we want to hash the leaf nodes differently depending on the depth of where they are inserted.)
        ## [R'', H', ..]
        rphash
        swapw

        ## Fetch the index of insertion
        ## [i', R'', H', ..]
        push.0.0.0.0
        mem_loadw.8
        drop drop swap drop

        ## Insert H' at correct position
        ## [E, R''', ..]  
        push.32
        mtree_set
    end
end

Main insertion procedure:

proc.insert
    adv_push.1
    if.true
        exec.simple_insertion
    else
        exec.complex_insertion
    end
end

proc.simple_insertion
    adv_push.1
    if.true
        adv_push.1
        if.true
            adv_push.1
            if.true
                adv_push.1
                if.true
                    exec.insertbeyond64
                else
                    exec.insert64
                end
            else
                exec.insert48
            end
        else
            exec.insert32
        end
    else
        exec.insert16
    end
end

If the nesting of if statements is undesirable, we can substitute it, at the cost of extra key manipulation operations, with the following:

  1. For simple insertions, we can ND provide the depth at which we are inserting. Using this, we can compute the index that we use to query the tree.
  2. For complex insertions, we can ND provide the length of the common prefix, and based on this we can compute the depth and the insertion indices.
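This arithmetic alternative can be sketched in Rust (a hypothetical helper; in the VM the ND-provided prefix length would additionally have to be verified against the keys, e.g. via the xor/divmod check above):

```rust
/// Given the (ND-provided) common prefix length `cpl` of the new key limb `k`
/// and the conflicting limb `k_old`, derive the insertion depth and both
/// insertion indices, replacing the nested if.true branches with arithmetic.
fn insertion_slots(k: u64, k_old: u64, cpl: u32) -> (u32, u64, u64) {
    assert!((16..64).contains(&cpl));
    let depth = ((cpl / 16) + 1) * 16; // 32, 48 or 64
    let i_new = k >> (64 - depth);
    let i_old = k_old >> (64 - depth);
    // the two indices agree above the divergence tier but differ overall
    assert_eq!(i_new >> 16, i_old >> 16);
    assert_ne!(i_new, i_old);
    (depth, i_new, i_old)
}

fn main() {
    let k = 0x1111_2222_3333_4444u64;
    let k_old = 0x1111_2222_9999_0000u64; // shares exactly 32 bits with k
    let (depth, i_new, i_old) = insertion_slots(k, k_old, 32);
    assert_eq!(depth, 48);
    assert_eq!(i_new, 0x1111_2222_3333);
    assert_eq!(i_old, 0x1111_2222_9999);
}
```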

Updates:

Updates are simpler due to the fact that we can ND get the depth where we should insert, i.e. 16, 32, 48 or 64, and then just do an mtree_set with [K, V_new]. To conclude, we check that [K, V_old] hashes to the node value we get back from the mtree_set. V_old is the old value that got updated with V_new and is provided ND.

Deletions

Deletions are more involved since they can lead to some of the tiers collapsing upwards. I will provide more details on this in a separate post.

Al-Kindi-0 commented 1 year ago

As for deletion, in principle it should be similar to insertion in that there should be two types of deletions:

  1. A simple deletion which should be equivalent to replacing the key-value pair with the hash of an empty sub-tree.
  2. A complex deletion which happens when the node we want to delete has one and only one sibling, at the same depth, with which it shares a common prefix.

The first type of deletion should be straightforward, but the second is not so obvious. More specifically, if we want the root of the tree to depend solely on the content of the key-value map, then during a complex deletion we would need to check, at each of the levels 64, 48 and 32, whether there are siblings at these levels that share a common prefix of the corresponding length. The non-local nature of these checks makes deletions a very expensive operation both inside and, I believe, also outside the VM.

The following figures illustrate the above discussions with a simple example:

Simple deletion

simple-deletion

Complex deletion (partial)

complex-deletion-incomplete

Complex deletion (complete): Case 1

complex-deletion-complete-one-levelup

Complex deletion (complete): Case 2

complex-deletion-complete-two-levelup

bobbinth commented 1 year ago

@Al-Kindi-0 - thank you! this is awesome! A couple of comments:

  1. Regarding deletions: these would be applicable for account storage and vaults - so, hopefully, most of the time we'll stay in the top tier (2^16). Deletions from there should be pretty efficient, I think.
  2. We probably need to define a set of decorators which will push a sequence of relevant values onto the advice tape so that it can be read in by the executing code as needed.
Al-Kindi-0 commented 1 year ago
1. Regarding deletions: these would be applicable for account storage and vaults - so, hopefully, most of the time we'll stay in the top tier (2^16). Deletions from there should be pretty efficient, I think.

Yes, deletions at the top tier should be just an mtree_set followed by checking that the key corresponds to the node value we got back from mtree_set.

2. We probably need to define a set of decorators which will push a sequence of relevant values onto the advice tape so that it can be read in by the executing code as needed.

Yes indeed, sometimes we need just the value (as in deletion in the point above) and sometimes we need the key and value to check that a leaf value un-hashes to them. I will be able to describe the decorators we need in more detail once the implementation is stabilized.

hackaugusto commented 1 year ago

Question: I think for a full-node / archive-node the trees shouldn't be updated in-place, but should instead keep some sort of log of changes and snapshots. What about expanding this construction to a persistent data structure?

My initial impression is that this is not super important for the assembly implementation, but it is useful for the Rust one. And I feel it should be easy to do since we have the nodes' hashes as unique keys. But I'm wondering what your thoughts on it are.

Edit: It would be helpful to have a place that states the operations/properties we want from such a tree, together with assumptions and a performance analysis.

bobbinth commented 1 year ago

Question: I think for full-node / archive-node the trees shouldn't be updated in-place, but keep some sort of log of changes and snapshots. What about expanding this construction to a persistent data structure?

I think this is still an open question, and it probably depends on the specific use case. For example, when generating block proofs (or applying block-level changes), updates to account/nullifier databases could probably be done in-place (and we can also log them).

When executing transactions, we probably want to record a log of what would change, but not actually change the underlying data (as results of the transaction should be applied only once a block with this transaction is built).

hackaugusto commented 1 year ago

I think this is still an open question, and it probably depends on the specific use case. For example, when generating block proofs (or applying block-level changes), updates to account/nullifier databases could probably be done in-place (and we can also log them).

This may be a bit premature of me, but let me expand on this. I'm thinking about use cases that require interacting with the existing chain / previous blocks. I believe these use cases are important enough that restoring old blocks should be reasonably efficient. First, two example use cases:

  1. Analytics: Figuring out how many notes have been produced/consumed, what assets have been transferred, etc. is useful for ourselves and for users to know the health of the chain (of course, modulo private transactions).
  2. Debugging: When a network transaction fails, we should be able to restore all the state required to re-execute the transaction locally, and also to provide inspection similar to etherscan ([random example])

There are other uses, but the point is that reconstructing old blocks should be fast. And reconstructing the Merkle trees is part of that process. All of that is to say that logs are a perfectly valid way of doing this, depending on how they are implemented. IMO these logs should be the diff of two consecutive trees, so that it is easy to move forward and backwards, and the tree shouldn't be updated in place, so that a few dozen or even hundreds of blocks can be kept in memory as a cache.

Edit: If this is out-of-scope for now we can also move the discussion to another issue.

Al-Kindi-0 commented 1 year ago

Here is a proposal for doing deletions in Miden assembly using non-determinism (ND). We can use ND to make deletions practically the same as insertions inside the VM. The idea is to think of the tree after the deletion of [K, V] as the tree just before inserting [K, V]. In other words, the prover will non-deterministically provide the root of the tree with [K, V] deleted, and the deletion procedure will then call the insertion procedure on this root and the key-value pair [K, V]. This results in a new root, which is compared with the root that was given at the outset to the deletion procedure. One big positive of this approach is that the root of the tiered-SMT will always be a function of only the set of key-value pairs that the key-value map stores. On the negative side, implementing deletions this way is a bit more difficult than substituting a leaf node with the root of an empty sub-tree at the appropriate level. I will say more on this towards the end.
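The deletion-as-insertion check can be modeled in Rust over a plain map (the `root` function below is a toy stand-in commitment over the key-value set, not the actual tiered-SMT root; it only models the property that the root is a function of the key-value pairs alone):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for the tree root: a hash over the sorted key-value pairs.
fn root(map: &BTreeMap<u64, u64>) -> u64 {
    let mut h = DefaultHasher::new();
    for kv in map {
        kv.hash(&mut h);
    }
    h.finish()
}

// Deletion check: the prover supplies (non-deterministically) the map with K
// removed; re-inserting [K, V] must reproduce the original root.
fn verify_deletion(
    original: &BTreeMap<u64, u64>,
    k: u64,
    claimed: &BTreeMap<u64, u64>,
) -> Option<u64> {
    let v = *original.get(&k)?;
    if claimed.contains_key(&k) {
        return None; // K must be absent from the claimed post-deletion tree
    }
    let mut reinserted = claimed.clone();
    reinserted.insert(k, v);
    (root(&reinserted) == root(original)).then_some(v)
}

fn main() {
    let original = BTreeMap::from([(1u64, 10u64), (2, 20)]);
    let claimed = BTreeMap::from([(2u64, 20u64)]);
    assert_eq!(verify_deletion(&original, 1, &claimed), Some(10));
    let bad = BTreeMap::from([(2u64, 21u64)]); // tampered value -> roots differ
    assert_eq!(verify_deletion(&original, 1, &bad), None);
}
```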

Again we will have a simple deletion (which will call a simple insertion procedure) and a complex deletion (which will call a complex insertion procedure).

# R is the root of the tiered-smt containing [K, V]
# R_d is the root of the tiered-smt with [K, V] removed.
# Input: [K, R, ...]
# Output: [V, R_d, ...]
proc.simple_deletion

    # Get the value claimed to be associated with K and save it to memory
    # [K, V, R, ...]
    push.0.0.0.0
    adv_loadw
    mem_storew.0
    swapw

    # Get the root resulting from deleting [K, V] i.e R_d and save it to memory
    # [R_d, K, V, R, ...]
    push.0.0.0.0
    adv_loadw
    mem_storew.1

    # Insert [K, V] into the tree with root R_d
    # [E, R_new, R, ... ]
    exec.simple_insertion

    # Check that R_new is equal to R
    # [E, ...]
    swapw.2
    repeat.4
        movup.4
        assert_eq
    end

    # Put V and R_d on the stack
    # [V, R_d, ...]
    mem_loadw.1
    push.0.0.0.0
    mem_loadw.0
end

proc.complex_deletion should be the same as the procedure above, with exec.simple_insertion substituted by exec.complex_insertion. It is obvious that the heavy lifting is carried out outside of the VM by the advice provider. The logic for implementing this should be straightforward due to the fact that we can decide, on the way up to the root of some sub-tree, whether the sub-tree contains only one leaf by comparing the left and/or right hashes with the hashes of empty sub-trees at the appropriate levels. I will provide more details, hopefully soon, in #45 .

bobbinth commented 1 year ago

I like this approach! A couple of comments:

  1. As you said, the advice provider is the one which will need to handle the complexity here. I'm not yet sure how to structure it there without doing extra work - i.e., ideally, the advice provider would perform just the deletion and not deletion + insertion.
  2. We can probably combine this with a simple case when the leaf to be deleted lives on the first tier. So, basically, if the leaf is on the first tier (which I expect to be the case for most deletions we'll be doing), then we can use a straight-forward approach of just replacing it with a root of empty sub-tree. If it is on lower tiers, we can use the approach you described above.
bobbinth commented 1 year ago

For the interfaces we'll expose via the standard library, I'm thinking of the following:

We could put the SMT module under the root namespace (i.e., std::smt) or I wonder if we should have a higher-level namespace. For example, something like std::collections::smt. Or maybe we just put it under std::crypto::smt. I'm open to suggestions here.

For the interface, I think it would be ideal to stick to something similar to what we have with mtree_get and mtree_set. Specifically:

Lookups

For lookups we could have smt::get procedure which would result in the following stack transition:

[K, R, ...] -> [V, R, ...]

Where R is the root of the SMT, K is the key, and V is the value located at that key. If the SMT does not contain any value at K, then V would be set to zeros. Thus, checking if some key is present in the tree could be performed as smt::get followed by comparing the returned value to zeros.

Updates

For updates we could have smt::set procedure which would result in the following stack transition:

[ V', K, R, ...] -> [V, R', ...]

Where V' is the new value to be inserted at key K, and R' is the new value of the SMT root post-update. A couple notes about the semantics:

  1. V is the value stored at K prior to the update. If there was no value stored at K prior to the update, V should be set to zeros.
  2. If V' is set to zeros, the operation should result in a deletion. This is actually a question for discussion: it does simplify the interface, but also means that there will need to be an extra conditional check for every insertion operation.

One potential way to address the second point above is to have another procedure - e.g., smt::insert. This procedure would have the exact semantics as smt::set, but would fail if V' is zeros (this check can be done relatively cheaply in about 9 VM cycles).
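The proposed get/set/insert semantics can be modeled in Rust (the struct and method names are illustrative, mirroring the hypothetical smt::get / smt::set / smt::insert from the text, not an existing API):

```rust
use std::collections::HashMap;

type Word = [u64; 4];
const ZERO: Word = [0; 4];

// Model of the proposed stack semantics: absent keys read as ZERO, and
// setting ZERO deletes the entry.
struct Smt {
    entries: HashMap<Word, Word>,
}

impl Smt {
    fn new() -> Self {
        Smt { entries: HashMap::new() }
    }

    // [K, R, ...] -> [V, R, ...]
    fn get(&self, k: Word) -> Word {
        *self.entries.get(&k).unwrap_or(&ZERO)
    }

    // [V', K, R, ...] -> [V, R', ...]; returns the previous value V.
    fn set(&mut self, k: Word, v: Word) -> Word {
        if v == ZERO {
            self.entries.remove(&k).unwrap_or(ZERO) // zero value -> deletion
        } else {
            self.entries.insert(k, v).unwrap_or(ZERO)
        }
    }

    // Like set, but rejects the deletion case up front.
    fn insert(&mut self, k: Word, v: Word) -> Word {
        assert_ne!(v, ZERO, "insert requires a non-zero value");
        self.set(k, v)
    }
}

fn main() {
    let mut smt = Smt::new();
    let k = [1, 2, 3, 4];
    assert_eq!(smt.get(k), ZERO);               // absent key reads as zeros
    assert_eq!(smt.insert(k, [9, 9, 9, 9]), ZERO);
    assert_eq!(smt.set(k, ZERO), [9, 9, 9, 9]); // setting zeros deletes
    assert_eq!(smt.get(k), ZERO);
}
```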

Updates in copy

In some situations, we want to update a copy of an SMT rather than the original (i.e., when we build a transaction vault by first making a copy of an account vault, and then adding note assets into it). The structure of the advice provider does not allow us to have two identical Merkle trees in it (this is because each tree is uniquely defined by its root, and two identical trees would have the same root).

One way to support this functionality is to have another procedure which would be somewhat similar to mtree_cwm. This procedure could be smt::cset and it could work as follows:

[V', K, R, S, ...] -> [V, R', S, ...]

Where S is the root of the source tree. The update logic would work as follows:

Internally, we would use the mtree_cwm operation, though we might have to modify its semantics.

Another alternative would be to update the advice provider, though, this would probably be a much bigger change.

Summary

Putting all of the above together, the smt module may need to include the following procedures:

Al-Kindi-0 commented 1 year ago

I like this approach! A couple of comments:

1. As you said, the advice provider is the one which will need to handle the complexity here. I'm not yet sure how to structure it there without doing extra work - i.e., ideally, the advice provider would perform just the deletion and not deletion + insertion.

Indeed, the final total computational cost should be "just" that of a deletion.

2. We can probably combine this with a simple case when the leaf to be deleted lives on the first tier. So, basically, if the leaf is on the first tier (which I expect to be the case for most deletions we'll be doing), then we can use a straight-forward approach of just replacing it with a root of empty sub-tree. If it is on lower tiers, we can use the approach you described above.

Indeed, deletions at depth 16 should be as simple as setting the leaf node to the value of an empty sub-tree rooted at depth 16.
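The empty-subtree roots used by such deletions can be precomputed bottom-up. A small illustrative sketch (SHA-256 stands in for the RPO hash Miden actually uses, so the concrete digests here are not the real ones):

```python
import hashlib

def merge(left, right):
    # SHA-256 stands in for the RPO hash used by Miden.
    return hashlib.sha256(left + right).digest()

def empty_subtree_roots(total_depth):
    # roots[d] is the root of an empty subtree whose root sits at depth d
    # in a tree of depth total_depth; roots[total_depth] is an empty leaf.
    roots = [b"\x00" * 32] * (total_depth + 1)
    for depth in range(total_depth - 1, -1, -1):
        roots[depth] = merge(roots[depth + 1], roots[depth + 1])
    return roots
```

Deleting a depth-16 leaf in a 64-deep tree then amounts to writing `roots[16]` at that node's index.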

Al-Kindi-0 commented 1 year ago

One potential way to address the second point above is to have another procedure - e.g., smt::insert. This procedure would have the exact semantics as smt::set, but would fail if V' is zeros (this check can be done relatively cheaply in about 9 VM cycles).

This makes the most sense to me. Inserting or updating with V' equal to the zero word should be handled separately.

Al-Kindi-0 commented 1 year ago

For insertions at depth 64 and beyond, we have two situations again:

  1. Simple insertions.
  2. Complex insertions into a sorted list of key-value pairs.

The first type of insertion is similar to the ones at levels 32 and 48, while the second type is a bit more involved. In the second type, there is at least one pre-existing entry in the tree whose key shares a prefix of at least 64 bits with the new key. The leaf node at level 64 then represents a sequential hash of a list of key-value pairs which all share a common prefix of length at least 64. This list is sorted in increasing order by key. Thus, to insert a new key-value pair sharing the same prefix into the list, we need to find the correct position in the list for the new key-value pair. We then need to compute the sequential hash of the new list, which becomes the new leaf value at the same index of our tiered-smt.

Here is a proposal for how to do the above:

  1. ND get the number of list elements as a pair (n, m) where n is the number of list elements up to the insertion slot and m is the number of list elements from the slot until the last element of the list. n + m is the number of key-value pairs before the insertion, and either n or m can be zero, i.e., insertion at the head or tail of the list.
  2. ND load the key-value pairs making up the list into a contiguous memory region using adv_pipe and get the old sequential hash. Using (n, m) we can insert the new key-value pair into its supposedly correct place.
  3. Check that the keys are ordered in increasing order. This will also check that the new key-value pair was inserted in its correct slot.
  4. Compute the sequential hash of the new list of key-value pairs.
  5. Insert the sequential hash into the tiered-smt at depth 64 and index k3.
  6. Compare the returned leaf value from mtree_set with the hash value computed using adv_pipe.

Edit: I think that comparing all the keys might be overkill as we can compare just the new key-value pair with its neighbors.
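The steps above (with the neighbors-only check from the edit) can be modeled in Python as follows; the function name is hypothetical and SHA-256 stands in for Miden's sequential hash:

```python
import hashlib

def seq_hash(pairs):
    # Stand-in for the sequential hash of a list of key-value pairs.
    h = hashlib.sha256()
    for key, value in pairs:
        h.update(repr((key, value)).encode())
    return h.digest()

def insert_into_leaf_list(pairs, new_key, new_value):
    # pairs: key-sorted (key, value) list sharing a >= 64-bit key prefix.
    # Step 1: the (n, m) split; computed honestly here, but provided
    # non-deterministically (by the advice provider) and verified in the VM.
    n = sum(1 for key, _ in pairs if key < new_key)
    m = len(pairs) - n
    old_hash = seq_hash(pairs)                           # step 2
    new_pairs = pairs[:n] + [(new_key, new_value)] + pairs[n:]
    # Step 3 (per the edit above): comparing only the neighbors of the
    # inserted pair suffices when the old list was already sorted.
    if n > 0:
        assert new_pairs[n - 1][0] < new_key
    if m > 0:
        assert new_key < new_pairs[n + 1][0]
    return old_hash, seq_hash(new_pairs)                 # steps 4-6
```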

Al-Kindi-0 commented 1 year ago

Here is a proposal for an implementation of insertion beyond depth 64 in Miden assembly:

# Insert a key-value pair into a list of key-value pairs sharing the first key limb i.e. `k3` as a prefix.
# The list is sorted by `k2`, then by `k1` and lastly by `k0`.
# Input: [K, V, R, ...]
# Output: [H_old, R', ...] where H_old is the hash of the ordered list without [K, V]
proc.insertbeyond64

    # ND load the number of key-value pairs and save it
    # n is the number of steps before the position of insertion
    # m is the number of steps from the position of insertion until the end of the list
    # [n, m, m + n + 1, 0, K, V, R, ...]
    adv_loadw
    dup.1
    dup.1
    add
    add.1
    swap.3
    drop
    loc_storew.0

    # [n, m, m + n + 1, 0, locaddr.2, K, V, R, ...]

    # Prepare loading the key-value pairs into memory
    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, locaddr.2, K, V, R, ...]
    locaddr.2
    movdn.4
    push.0.0.0.0
    push.0.0.0.0
    swapw.2

    dup
    neq.0

    while.true
        adv_pipe

        # Decrement n and store the updated n 
        loc_loadw.0
        sub.1
        loc_storew.0
        dup
        neq.0           
    end

    # Stack so far
    #[R2, R1, R0, locaddr.2 + 2 * n, K, V, R, ...]

    # Save locaddr.2 + 2 * n so that we can manipulate words
    # [locaddr.2 + 2 * n, m, m + n + 1, 0, R0, K, V, R, ...]
    dropw                   #[R1, R0, locaddr.2 + 2 * n, K, V, R, ...]
    loc_loadw.0
    movup.8
    swap drop           #[locaddr.2 + 2 * n, m, m + n + 1, 0, R0, K, V, R, ...]

    # Save K in locaddr.2 + 2 * n
    #[locaddr.2 + 2 * n, m, m + n + 1, 0, R0, V, R, ...]
    movupw.2
    dup.4
    mem_storew
    dropw

    # Save V at locaddr.2 + 2 * n + 1
    # [V, locaddr.2 + 2 * n, m, m + n + 1, 0, R0, R, ...]
    movupw.2
    dup.4
    add.1
    mem_storew

    # Prepare loc_storew.0 and the stack for phase 2 of adv_pipe
    # [m, 0,  m + n + 1, 0, V, R0, locaddr.2 + 2 * n + 2, R, ...]
    swapw
    add.2
    movdn.12
    push.0
    swap
    loc_storew.0

    dup
    neq.0

    while.true
        adv_pipe

        # Decrement m and store the updated m 
        loc_loadw.0
        sub.1
        loc_storew.0
        dup
        neq.0           
    end

    # Stack so far
    #[R2, R1, R0, locaddr.2 + 2 * (n + m + 1), R, ...]

    # Save R0 so that we can compare it with return value of mtree_set
    # [R1, R2, locaddr.2 + 2 * (n + m + 1), R, ...]
    swapw.2
    loc_storew.1
    dropw

    ## Make sure the keys are ordered and that all k3-s are equal. 
    ## TODO: We compare only the k2-s but we need to account for the case when two k2-s are equal.

    # Prepare the stack by loading the last key in the list
    # [K, R2, locaddr.2 + 2 * (n + m), R, ...]
    movup.8
    sub.2
    dup
    movdn.9
    mem_loadw

    # Compare the keys
    push.1
    while.true

        # Load a new key from the list in order to compare it with already existing one on the stack
        # [K_new, K_old, locaddr.2 + 2 * (n + m - i), R, ...]
        swapw
        movup.8
        sub.2
        dup
        movdn.9
        mem_loadw

        # 1) Make sure k3-s are equal
        dup.4
        dup.1
        assert_eq

        # 2) Make sure [k2]_new is less than [k2]_old. We are using an increasing order.
        # TODO: compare k1, k0 if necessary.
        dup.5
        dup.2
        gt
        assert

        # Prepare for next iteration of the loop
        swapw
        dup.8
        locaddr.2
        neq
    end

    # Stack so far
    # [K_new, K_old, locaddr.2, R, ...]

    ## Compute the new leaf value as the sequential hash of the list of key-value pairs.

    # Prepare the stack for mem_stream
    # [n + m + 1 - 1, x, x, x, K_new, 0, 0, 0, 0, locaddr.2, R, ... ]
    push.0.0.0.0
    swapw.2
    loc_loadw.0
    swap.2
    sub.1
    loc_storew.0
    neq.0

    while.true

        # Perform the hashing
        push.0
        mem_stream

        # Prepare for next iteration
        loc_loadw.0
        sub.1
        loc_storew.0
        neq.0
    end

    # Stack so far
    # [T2, T1, R0_new, locaddr.2, R, ...]

    ## Insert R0_new

    # Get the index of insertion from memory
    dropw
    loc_loadw.2
    movup.8
    drop

    # Put R0_new behind R
    # [K, R, R0_new, ...]
    swapw
    movdnw.2

    # Insert R0_new
    # [H_old, R', ...]
    swap.3
    drop drop drop
    push.64
    mtree_set

    ## Compare H_old with the previously computed hash R0
    # [R', ...]
    push.0.0.0.0
    loc_loadw.1

    repeat.4
        movup.4
        assert_eq
    end

    ## Put the stack in the correct final configuration
    # [H_old, R', ...]
    push.0.0.0.0
    loc_loadw.1
end
vlopes11 commented 1 year ago

In order to minimize the cycles, I propose we use the advice provider as much as we can, but always verify its result.

There would be a new function std::collections::smt::update, and it will perform the following transition:

$[V', K, R, ...] \mapsto [V, R', ...]$

We can rename the advice injector from smtget to smt, and accept a numeric constant as argument. It would look like:

adv.smt.0 (i.e., smtget) would remain unchanged. The definition of update (i.e., adv.smt.1) is that it would return a number to pick one of the variants below, plus any arguments of that variant.

Hence, its implementation should be similar as follows:

export.update
    adv.smt.1 adv_push.1
    dup eq.0
    if.true
        # empty tree case
    else
        dup eq.1
        if.true
            # empty node case
        else
            (...)
        end
    end
end

Convenience functions

I'll declare some convenience functions in order to proceed with the definition of the cases.

Advice provider variants for update

This section defines the expected return from the advice provider, given a state root.

Empty tree

This case is relatively simple. We just check that the root is empty, and then we set the node at depth 16.

Advice provider returned values: none

Advice provider mutation:

Assertions:

Execution:

Empty node

We need to make sure that the provided empty node is the minimum possible tier for that leaf.

Advice provider returned values:

Advice provider mutation:

Assertions:

Execution:

Conflicting key (one tier fallback)

This is a case where a key maps to a node that is another leaf, and both keys won't collide on the next tier.

Advice provider returned values:

Advice provider mutation:

Assertions:

Execution:

Future definitions

If we decide to proceed that way, we will need similar definitions for the following variants:

Both insert and remove operations would be a straightforward sequence of calls to smt::update.

bobbinth commented 1 year ago

There would be a new function std::collections::smt::update

I'd probably name this std::collections::smt::insert according to the convention I described in https://github.com/0xPolygonMiden/crypto/issues/22#issuecomment-1420211852. (the difference between this and std::collections::smt::set is that set would also need to handle deletions - so, we can do it later).

We can rename the advice injector from smtget to smt, and accept a numeric constant as argument.

I think we can keep smtget as is and just introduce another decorator. Maybe something like smtinsert.

As for the general approach, I'm thinking it makes sense to break it up into 4 cases as we did with smt::get - one for each tier. This way, we'd be able to handle tiers 16 and 32 most efficiently - and this is where the vast majority of inserts will happen in practice.

The skeleton of the procedure could look like this:

export.insert
  # verify that the value to be inserted is not [0; 4]
  # (at least one element of V must be non-zero)
  dupw eq.0 swap eq.0 and swap eq.0 and swap eq.0 and assertz

  adv.smtinsert adv_push.2
  if.true
    if.true
        exec.insert_16
        # this case should be pretty simple: we just need to make sure that the node we are replacing
        # is a root of an empty subtree.
    else
        exec.insert_32
        # this case is more complicated as there are two sub-cases to handle:
        # 1. the corresponding node at depth 16 is an internal node.
        # 2. the corresponding node at depth 16 is a leaf node, in which case we should also insert
        #    that node into its new place at depth 32.
        # 
        # we also need to make sure that the corresponding node at depth 16 wasn't a root of an
        # empty subtree (i.e., we couldn't have inserted this new leaf at depth 16).
    end
  else
    if.true
        exec.insert_48
        # this case is even more complicated; we need to handle the following sub-cases:
        # 1. the corresponding node at depth 32 is an internal node.
        # 2. the corresponding node at depth 16 is a leaf node and the common key prefix is at least 32 bits.
        # 3. the corresponding node at depth 32 is a leaf node.
        #
        # we also need to make sure that the corresponding nodes at depths 16 and 32 were not roots
        # of empty subtrees (i.e., we couldn't have inserted this new leaf at depth 16 or 32).
    else
        exec.insert_64
        # this case is the most complex one, but we don't have to handle it for now.
    end
  end
end
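The tier dispatched to by the two advice bits is, equivalently, the smallest tier at which the new key's prefix is unique. A hypothetical helper (not part of the stdlib; keys represented as bit strings of the 64-bit index limb) makes this explicit:

```python
TIERS = (16, 32, 48, 64)

def target_tier(new_key_bits, existing_keys_bits):
    # Returns the smallest tier at which the new key's prefix collides
    # with no existing key; at tier 64, colliding entries fall back to
    # the sorted key-value list described earlier in the thread.
    for tier in TIERS:
        prefix = new_key_bits[:tier]
        if all(k[:tier] != prefix for k in existing_keys_bits):
            return tier
    return 64
```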
bobbinth commented 1 year ago

Here is my stab at how insert_16 could look. I didn't test it - so, it probably has some bugs (I changed a bit the stack arrangement needed for mtree_set - I think we'll need to update how that operation works).

# Input: [V, K, R, ...];
# Output:[V_old, R', ...]
proc.insert_16
  # split the k3 into index and remainder (8 cycles)
  swapw
  exec.split_16
  # => [rem, idx, k2, k1, k0, V, R, ...]

  # move idx out of the way and save the updated key to memory (5 cycles)
  swap
  movdn.8
  loc_storew.0
  # => [rem, k2, k1, k0, V, idx, R, ...]

  # prepare hasher state (5 cycles)
  push.0.16.0.0
  swapw.2
  # => [V, rem, k2, k1, k0, 0, 0, 16, 0, idx, R, ...]

  # hash the value + rem_key to get new node value (10 cycles)
  hperm
  dropw swapw dropw
  # => [N, idx, R, ...]

  # arrange the stack and update the Merkle tree
  movup.4
  push.16
  mtree_set
  # => [N_old, R_new, ...]

  # if N_old is an empty subtree go into the `true` branch; otherwise go into `else` branch
  adv_push.1
  if.true
    # push root of an empty subtree of depth 16 onto the stack (4 cycles)
    push.<root of empty tree at depth 16>
    # => [E, N_old, R_new, ...]

    # make sure the node is in fact a root of an empty subtree and return zeros (15 cycles)
    assert_eqw
    push.0.0.0.0
  else
    # prepare hasher state (13 cycles)
    push.0.16.0.0
    push.0.0.0.0
    loc_loadw.0
    adv_push.4
    # => [V_old, rem, k2, k1, k0, 0, 0, 16, 0, N_old, R_new, ...]    

    # save the old value to memory (3 cycles)
    loc_storew.0

    # compute the hash (10 cycles)
    hperm
    dropw swapw dropw
    # => [N_old_computed, N_old, R_new, ...]

    # make sure the old node value from mtree_set is the same as computed one (11 cycles)
    assert_eqw
    # => [R_new, ...]

    # load and return old value
    push.0.0.0.0
    loc_loadw.0
    # => [V_old, R_new, ...]
  end
end
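The flow of the procedure above can be paraphrased in Python (a sketch only: SHA-256 stands in for hperm, `EMPTY_16` is an assumed placeholder for the real empty-subtree root, and only the two sub-cases the procedure handles are covered):

```python
import hashlib

EMPTY_16 = b"\x00" * 32  # assumed stand-in for the empty depth-16 subtree root

def leaf_hash(rem_key, value):
    # Stand-in for hperm over the value, remaining key, and depth-16 domain.
    return hashlib.sha256(repr(("leaf16", rem_key, value)).encode()).digest()

def insert_16(tree, key, value, adv_old_value=None):
    k3, k2, k1, k0 = key
    # split k3 into a 16-bit index and a 48-bit remainder
    idx, rem = k3 >> 48, k3 & ((1 << 48) - 1)
    old_node = tree.get(idx, EMPTY_16)
    tree[idx] = leaf_hash((rem, k2, k1, k0), value)  # models mtree_set
    if old_node == EMPTY_16:
        # the replaced node was the root of an empty subtree: no old value
        return (0, 0, 0, 0)
    # otherwise the old value comes from the advice provider and is
    # verified by re-hashing it against the node returned by mtree_set
    assert leaf_hash((rem, k2, k1, k0), adv_old_value) == old_node
    return adv_old_value
```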
bobbinth commented 1 year ago

Closed by #152 and others.