HarryR / ethsnarks

A toolkit for viable zk-SNARKS on Ethereum, Web, Mobile and Desktop
GNU Lesser General Public License v3.0

On-chain incremental Merkle tree append with reduced gas costs #16

Open HarryR opened 6 years ago

HarryR commented 6 years ago

The incremental merkle tree implementation included in MerkleTree.sol requires a large amount of gas to perform an update, because all of the nodes of the merkle tree are stored on-chain.

I have two proposals to reduce the cost of appending an item to the incremental merkle tree; both reduce the storage cost significantly by storing only the root and none of the intermediate levels or leaves.

Clients monitoring the contract would have to re-create the merkle tree from events emitted as each leaf is added. e.g. upon each append the contract emits a LeafAppend(offset, data) event; you then retrieve every LeafAppend event and re-construct the merkle tree locally.
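
As a sketch of what that client-side reconstruction could look like (H, the constant placeholder padding and the recovered leaves are illustrative assumptions, not this repo's actual scheme):

import hashlib

def H(left, right):
    # Stand-in pairing hash; the real tree uses LongsightF
    return hashlib.sha256(left + right).digest()

def rebuild_tree(leaves, depth, placeholder=b'\x00' * 32):
    # Rebuild every level of a fixed-depth tree from the leaves
    # recovered from LeafAppend(offset, data) events. A constant
    # placeholder pads the missing leaves here for simplicity;
    # the proposals below use unique synthetic items instead.
    level = list(leaves) + [placeholder] * ((1 << depth) - len(leaves))
    levels = [level]
    while len(level) > 1:
        level = [H(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels  # levels[-1][0] is the root

root = rebuild_tree([b'\x11' * 32, b'\x22' * 32], depth=3)[-1][0]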

It would be possible for light-clients to use a verifiably honest third party to store the merkle tree for them; the third party would need to monitor the contract and be able to provide, for any item, proofs that can be verified using the on-chain contract.

1) Pre-compute the whole merkle tree, using 'unique elements' as placeholders for everything, and allow updates in-sequence by providing the placeholder element's proof and the new leaf hash.

2) Require a proof of the last element in the tree

Note on 'unique items'

To use a fixed-depth tree as an incremental merkle tree it's necessary to have 'synthetic and unique items': placeholders which can be computed quickly, so that proofs can be created using items which don't exist yet. These balance out the tree.

e.g. if you have a tree with one leaf (L), the second leaf is one of these synthetic items, a unique placeholder (X):

   r
 /   \
L     X

When you add another leaf the synthetic item disappears, and the real item is used in its place:

   r
 /   \
L     o

These synthetic items allow you to make a merkle tree with a fixed depth (required for the prover circuit). With a fixed depth of 3 and one item the tree becomes the following, requiring two synthetic items to balance it out:

     r
   /   \
  n     X
 /  \
L    X
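
One way to derive such placeholders deterministically, as a sketch (the exact derivation below is an assumption, not the scheme used by this repo):

import hashlib

def synthetic_node(level, index):
    # A unique, quickly-computable placeholder for the
    # not-yet-existing node at (level, index); any deterministic
    # derivation works, provided placeholders cannot collide with
    # real leaves or nodes.
    return hashlib.sha256(b'synthetic:%d:%d' % (level, index)).digest()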

Precomputation

Precomputing the whole tree requires storing (2*N)-1 items, e.g. 2<<31 leaves would require (2<<32)-1 total items in storage. However, this only needs to be computed once to find the root node, as all other items are predictable.

In this tree every leaf is synthetic, every node is real (as it's derived from values, albeit synthetic ones), and the root must be real, e.g.

         r
     /       \
    n         n
  /   \     /   \
 X     X   X     X

The root and every node can be created using a deterministic number of items in-memory at any point in time; all unnecessary items (those which will never be used again) can be discarded, as they can be re-computed.

The problem comes when verifying the root, which requires recomputation of everything in the path and its dependencies. To avoid that computation you would have to store ((2*N)-1) * HashSize bytes on disk; with 2<<27 items (268m) and a hash size of 32 bytes this requires 8GiB of storage.

This also requires the tree to be updated whenever items are added, otherwise subsequent proofs would be based on incorrect data. However, it can be updated incrementally: changing only one item and then recomputing all the nodes in the upward path to the root.

The amount of storage vs the cost of re-computing the tree can be balanced by caching/storing the top levels of the tree and requiring re-computation of the levels below. e.g. where C is a cached node, the number of items which need recomputing below it is 2^N, where N is the number of uncached levels; in the diagram below N is one, as only one level needs recomputing.

e.g. if the height is 3 and the number of cached levels is 2, then the number recomputed is 2^(3-2):

         r
     /       \
    n         C
  /   \
 X     X 

e.g. if the height is 29 and 15 levels are cached, the total size of the cache (for 32-byte items) is 1 MiB (32768 items) and the number of items requiring recomputation is 16384. With SHA2-256 this looks like a reasonable trade-off; however, with LongsightF the field operations may make this many times more expensive.
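
The trade-off can be tabulated directly; a minimal sketch, assuming 32-byte hashes:

def cache_tradeoff(height, cached_levels, hash_size=32):
    # Cache the row of 2^cached_levels nodes; proving any leaf then
    # only requires recomputing the 2^(height - cached_levels) items
    # of the single subtree hanging below the cached row.
    cache_bytes = (1 << cached_levels) * hash_size
    recompute_items = 1 << (height - cached_levels)
    return cache_bytes, recompute_items

# height 29 with 15 cached levels -> (1048576, 16384),
# i.e. a 1 MiB cache and 16384 recomputed items, as above
print(cache_tradeoff(29, 15))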

The on-chain component knows the number of 'real' leaves, and requires a merkle proof of the next item (which doesn't exist yet) along with the new leaf value to re-compute the root; it doesn't need to store anything other than the root and the sequence (i.e. the count of real leaves).
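
A minimal sketch of that on-chain logic, written in Python for readability (the class and method names are illustrative, not the contract's API):

import hashlib

def H(left, right):
    return hashlib.sha256(left + right).digest()

class RootOnlyTree:
    # Stores only the root and the count of real leaves; the caller
    # supplies a merkle proof of the synthetic placeholder that
    # occupies the next free slot.
    def __init__(self, depth, root):
        self.depth = depth
        self.root = root
        self.count = 0

    def _root_from(self, leaf, path):
        node, index = leaf, self.count
        for sibling in path:
            node = H(sibling, node) if index & 1 else H(node, sibling)
            index >>= 1
        return node

    def append(self, placeholder, new_leaf, path):
        # 1. the supplied proof must point at the next free slot
        assert self._root_from(placeholder, path) == self.root
        # 2. recompute the root with the real leaf in that slot
        self.root = self._root_from(new_leaf, path)
        self.count += 1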

Append via Proof of End

This follows the incremental merkle tree model more closely and requires no precomputation; however, the on-chain contract must be aware of which nodes are synthetic and which aren't, so that it can enforce the synthetic nodes' values and re-compute the ones from the previous proof which have changed as a result.

e.g. if there is a tree with 1 item and you want to add another, you provide a proof of the first item, from which the proof of the 2nd item, and therefore the root, can be derived.

The advantage of this, versus the pre-computed tree, is that knowing the previous valid merkle path, the sequence number, and how the synthetic nodes are created, you can add an item to the tree without knowing all of the values. This makes it very efficient for 'light clients'.

See diagrams below for reference.

TODO: find simple algorithm to determine the common pairs from two merkle paths to verify that one was added immediately after the other. They must share at least 1 common node.

        r
     /     \
    n1      Y
  /   \   
 L1    X 

Proof of L1 is [0,0] [X,Y]

        r
     /     \
    n1      Y
  /   \  
 L1   L2

Proof of L2 is [1,0] [L1,Y]

        r
     /     \
    n1      n2
  /   \    /  \
 L1   L2  L3   Z

Proof of L3 is [0,1], [Z,n1]

        r
     /     \
    n1      n2
  /   \    /  \
 L1   L2  L3  L4

Proof of L4 is [1,1], [L3,n1]
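
These proofs follow the usual address-bits convention, where bit i says whether the node is the right child at level i. A sketch of the corresponding root computation:

def root_from_proof(leaf, address_bits, path, H):
    # bit == 1: the current node is the right child, so the
    # sibling from the path goes on the left
    node = leaf
    for bit, sibling in zip(address_bits, path):
        node = H(sibling, node) if bit else H(node, sibling)
    return node

# e.g. the proof of L4 above: root_from_proof(L4, [1, 1], [L3, n1], H)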

Reducing the size of proofs by being 'synthetic aware'

Any synthetic node included in the path can be omitted if there is a deterministic algorithm to identify which nodes in the path are synthetic using only the sequence (total number of items) of the merkle tree.

e.g. with the following tree, providing a proof of L requires X and Y, where the address bits are [0, 0] and the path is [X, Y]. However, because the leaf X and node Y are synthetic, the path can be empty and all that is required are the address bits; the on-chain contract can then re-create the synthetic parts. It would need to verify the synthetic items in the path anyway, otherwise tree proofs could potentially be faked.

       r
     /  \
    n    Y
  /   \
 L     X 

Because the synthetic items need to be verified anyway, the only question is whether the gas saved by excluding them from the msg.data of the function parameters outweighs the cost of keeping track of which items are synthetic and which are real (e.g. if item 1 is synthetic and item 2 is real, the supplied path would be [X, Z] with Y skipped).
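
A sketch of that expansion, assuming a synthetic_node(level, index) derivation like the one earlier (hypothetical, not the contract's code):

def expand_path(index, n_real, depth, supplied_path, synthetic_node):
    # Re-insert the synthetic siblings omitted from msg.data. The
    # sibling at (level, s) covers leaves [s * 2^level, (s+1) * 2^level);
    # it is synthetic iff that span starts at or beyond the last real
    # leaf, which is decidable from the sequence number alone.
    supplied = iter(supplied_path)
    path = []
    for level in range(depth):
        s = (index >> level) ^ 1
        if (s << level) >= n_real:
            path.append(synthetic_node(level, s))
        else:
            path.append(next(supplied))
    return path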

Problems with both approaches

The problem with both of these approaches is that they only allow one insert into the merkle tree per block, as there will be a conflict if two people try to append at the same time.

Unfortunately the way to fix this is to store the merkle tree on-chain.

It may be possible to use a side-chain which aggregates updates and submits them to the main-chain in one go; however, IMO this adds an unnecessary component which can be interfered with.

HarryR commented 6 years ago

https://ethresear.ch/t/double-batched-merkle-log-accumulator/571 includes some interesting ideas for a batched incremental merkle tree

drewstone commented 6 years ago

If we had a keccak256 gadget, then computing intermediate hashes along the merkle tree would be cheaper AFAIK, as long as the source isn't completely outdated.

Source: https://ethereum.stackexchange.com/questions/3184/what-is-the-cheapest-hash-function-available-in-solidity

HarryR commented 6 years ago

> If we had a keccak256 gadget, then computing intermediate hashes along the merkle tree would be cheaper AFAIK, as long as the source isn't completely outdated.

The on-chain cost of insert is dominated by the storage costs of updating 29 levels of a tree, each at 20k gas.

And LongsightF is being used for the zkSNARK circuit because 58 SHA256 compression functions in the snark circuit for the path take ~1.4m gates, whereas LongsightF uses a tiny fraction of that - making it possible to run on Mobile, via WebAssembly etc. Keccak256 is about the same cost as SHA256 for the snark circuit.

But.. doing LongsightF in Solidity is a lot more expensive than either Keccak256 or SHA256, and on top of that there's the storage costs.

So... we need to keep LongsightF for the SNARK and reduce the number of on-chain storage operations to the absolute bare minimum, hence looking into either approaches where only the root and paths are required to append, or precomputation, or doing the update inside the SNARK circuit and providing a new root.

HarryR commented 6 years ago

Another option for the on-chain accumulator, where normal merkle proofs can be used, is to slowly build the new part of the tree to be appended while deleting old values - for every value deleted there's a 3/4ths refund.

However, deleting items from on-chain storage shifts the problem of storing enough information about the tree nodes to create a proof onto the user. Logs cost 8 gas per byte + 375 gas + (375 * n_topics). So, if a log entry is emitted for every node deleted, there will still be a net gas saving compared to not deleting the nodes, and the nodes will still be directly accessible by address via a log lookup.
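
Using the pricing quoted above, a quick sanity check in Python:

def log_gas(n_bytes, n_topics):
    # LOG cost: 375 base + 375 per topic + 8 per byte of data
    return 375 + 375 * n_topics + 8 * n_bytes

# e.g. one topic plus 64 bytes of data:
print(log_gas(64, 1))  # 1262 gas, a fraction of a 20k SSTORE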

Example of how deletion works, where each event records the address and value of a node. Emitting one costs 1264 gas; let's call it 1.5k...

Append L1:

        r
     /     \
    n1      Y
  /   \   
 L1    X 

Cost: 61.5k

Append L2:

        r
     /     \
    n1      Y
  /   \  
 L1   L2

Cost: 26.5k

Append L3:

        r
     /     \
    n1      n2
  /   \    /  \
 ?     ?  L3   Z

Cost: 61.5k

Append L4:

        r
     /     \
    n1      n2
  /   \    /  \
 ?     ?  L3  L4

Cost: -18.5k

Result after appending 4 items:

        r
     /     \
    ?       ?
  /   \    /  \
 ?     ?  ?    ?

The rule is: whenever a node becomes fully balanced all of its children can be deleted.
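
That rule behaves like carry propagation in a binary counter; a sketch of which nodes become balanced (and hence whose children become deletable) after each append:

def balanced_after_append(n_leaves):
    # Yield the (level, index) of every node that becomes fully
    # balanced after the n_leaves-th append (1-based); all of its
    # children can then be deleted for the refund.
    level, index = 0, n_leaves - 1
    while index & 1:
        index >>= 1
        level += 1
        yield level, index

# After the 4th append, n2 = (1, 1) and the root (2, 0) become balanced:
print(list(balanced_after_append(4)))  # [(1, 1), (2, 0)]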


The problem is that with a larger number of items the cost isn't O(1); the height still requires many updates, which is the main gas burner, and it's this specifically that we want to minimise as much as possible.

One problem with not storing items on-chain, specifically relevant to the reduced number of LongsightF rounds used in the merkle tree, is the possibility of finding collisions, and of verifying that the leaf is constructed correctly, while still allowing multiple updates to happen in whichever order the blockchain decides the transactions go through.

One way to avoid updates is to make them low-frequency, e.g. append items to a buffer which builds a perfectly balanced sub-tree, then when it's full, update the full path up to the root - storing only the necessary items and deleting unnecessary ones.

e.g. if we say the subtree is going to be 16 items, that's 4 layers of nodes above the leaves, which saves a significant amount of gas when appending. However, the intermediate layers still need to be built:

import hashlib

def H(left, right):
    # Stand-in pairing hash; the real implementation would use LongsightF
    if isinstance(left, int):
        left = left.to_bytes(32, 'big')
    if isinstance(right, int):
        right = right.to_bytes(32, 'big')
    return hashlib.sha256(left + right).digest()

leaves = []  # buffer of pending leaves
l1 = []      # level 1, pairs of leaves
l2 = []      # level 2
l3 = []      # level 3
l4 = None    # root of the completed 16-leaf subtree

for i in range(0, 16):
    leaves.append(i)
    if len(leaves) % 2 == 0:
        l1.append(H(*leaves[-2:]))
        if len(l1) == 2:
            l2.append(H(*l1[-2:]))
            l1 = []
            if len(l2) == 2:
                l3.append(H(*l2[-2:]))
                l2 = []
                if len(l3) == 2:
                    l4 = H(*l3)
                    l3 = []
                    # emit Subtree(l4.address, leaves)  -- on-chain event
                    leaves = []  # reset the buffer
                    # tree.append(l4)  -- update full path and tree root

Realistically only the temporary item at each level needs to be stored, and when that level becomes balanced it can be deleted. However, that still requires 500k gas to perform the merkle root update, in addition to the cost of computing the hashes, and this decreases by 20k every time the number of items in the subtree doubles (e.g. with 32 items you save 20k gas, with 64 items 40k, etc.)


drewstone commented 6 years ago

Why does updating hashes cost 20k gas? Shouldn't it be much cheaper since you can do an insertion and update within the same transaction?

The SHA256 gas cost is given by:

gas = 50 + 50 * (data_length / 32)

Since a hash update is SHA256 over two 32-byte words, the gas cost is 50 + 100 = 150.

As a potentially dumb idea, it could make sense to use a left-leaning skewed merkle tree. Although this would increase the size of merkle proofs (the biggest downside), every insertion would yield a single deletion, potentially amortising the gas costs over users, given each new insertion has to update one more hash (at 150 gas) than the previous. As a result, the tree is restricted to certain heights, since 150 * depth + other_costs < gas_limit must be satisfied.

        r
     /     \
    n1       L1
  /   \ 
 X     L2
/ \
.  .

If this can all be bundled in the same transaction then an insertion would be:

- 20k gas for storage
- 150 * depth gas for hash updates
- -15k gas for the deletion refund
- 1.5k gas for the event

= 6.5k + 150 * depth of the tree.

EDIT: I'm tempted to think that your costs are for the LongsightF construction. Let me know your thoughts nonetheless.

HarryR commented 6 years ago

What you're missing is that each storage update costs 20k gas, and each deletion refunds 15k gas.

However, refunds are capped at 50% of the real gas cost. So if you store 100 things and delete 100 things, you'll only be charged for storing 50 things.
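
In other words, a sketch of the (pre-London) accounting:

def charged_gas(gas_used, refund):
    # Refunds are applied at the end of the transaction and are
    # capped at half of the gas actually consumed
    return gas_used - min(refund, gas_used // 2)

# Store 100 slots, then delete all 100:
used = 100 * 20000    # 2,000,000 gas consumed
refund = 100 * 15000  # 1,500,000 accrued, capped to 1,000,000
print(charged_gas(used, refund))  # 1,000,000 -> priced like 50 stores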

The other problem is that the tree depth is limited by the zkSNARK circuit, which is of a fixed size, and proving each level of the merkle tree path costs a couple of seconds (or more), so with 150 levels that could take 5-7 minutes to prove.

HarryR commented 6 years ago

I have implemented a rough draft of an on-chain sparse incremental merkle mountain range...

https://github.com/HarryR/ethsnarks/blob/ca40b9200ac1cbf2b8d8e8d5606f304e72141afa/appendix/immr.py

This tracks the on-chain cost of storage & deletion as per the EVM rules.

Example output; at the end it shows the total number of items at each level, and the keys being stored in the database.

Append(lvl=0, seq=996, item=997)
gas: peak=40200 refund=0 total=40200

Append(lvl=0, seq=997, item=998)
Append(lvl=1, seq=498, item=17222759902109724554154151244684653439133570983820790767918053288734235216983)
gas: peak=60600 refund=15000 total=45600

Append(lvl=0, seq=998, item=999)
gas: peak=40200 refund=0 total=40200

Append(lvl=0, seq=999, item=1000)
Append(lvl=1, seq=499, item=16151676244743771913220904007540730850315611735271278735706897261663009225222)
Append(lvl=2, seq=249, item=360555614744579230337566340333026929868316868627391280780411657642798941175)
Append(lvl=3, seq=124, item=17091390504405981657579101865512078031112117797633598751647321595954421208885)
gas: peak=101400 refund=45000 total=56400

-----------

level 0000: 1000
level 0001: 500
level 0002: 250
level 0003: 125
level 0004: 62
level 0005: 31
level 0006: 15
level 0007: 7
level 0008: 3
level 0009: 1

level 0003 item 0124: 17091390504405981657579101865512078031112117797633598751647321595954421208885
level 0005 item 0030: 11432221231255018214084460762913861530967799609110822039801369274264043459453
level 0006 item 0014: 21709797073617348950618721483223221701276312367409941640768052740088622778821
level 0007 item 0006: 11426590355662709180378281331416813339771702869015491326709053175928753139730
level 0008 item 0002: 14255486654154165991711447228273623210898737301412569004468677634153235753660
level 0009 item 0000: 13048253346413405701399443551142082279890037746369451596589013900728189841360

As you can see it's very space efficient: it only stores unbalanced nodes and deletes everything else. For 1000 appended items it has only 16 active storage values; however, a 'heavy' insert that updates ~7 or so levels at once costs about 91k gas (after a 105k refund, peaking at ~183k gas).
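
For intuition, the cascade pattern in the trace above behaves like a binary counter: an append at sequence number seq propagates up one level per trailing 1-bit. A sketch inferred from the output (not taken from immr.py):

def cascade_levels(seq):
    # Number of higher levels an append at this sequence number
    # cascades into: the count of trailing 1-bits, exactly like
    # carry propagation when incrementing a binary counter
    levels = 0
    while seq & 1:
        seq >>= 1
        levels += 1
    return levels

# Matches the trace: seq 996 -> 0, 997 -> 1, 998 -> 0, 999 -> 3
print([cascade_levels(s) for s in (996, 997, 998, 999)])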

It also emits the Append event, which allows clients to look up the value of a node.

Next I need to write some code which calculates the root for the MMR, and code to calculate paths for nodes, accounting for the MMR.

Then look into making the zkSNARK circuit handle arbitrary length paths, e.g. 'path where only first N are used to calculate root', as the paths will be variable length depending on the form of the merkle mountain range and number of items in the tree.

barryWhiteHat commented 6 years ago

> Then look into making the zkSNARK circuit handle arbitrary length paths, e.g. 'path where only first N are used to calculate root', as the paths will be variable length depending on the form of the merkle mountain range and number of items in the tree.

I would be careful about doing variable length zkSNARK proofs, because if this is possible an attacker can deposit 1 coin and add hash X to the merkle tree, where hash X is the hash of hash Y and hash Z, and hashes Y and Z are valid entries that the attacker can withdraw for. They can only do this by setting the depth in the zkSNARK to x + 1.

HarryR commented 6 years ago

> because if this is possible an attacker can deposit 1 coin and add hash X to the merkle tree.

Hmm, ya I see the problem.

You could mitigate against this by, say if the tree is only 5 levels deep, iteratively hashing the root with unique values until it reaches the necessary depth.

HarryR commented 6 years ago

I think I may have already addressed this by using a separate hashing algorithm for the leaves of the tree versus the nodes of the tree; this creates separate domains.

e.g. the leaves are hashed with LongsightF322p5, while the tree nodes are hashed with a reduced number of rounds.

However, with SHA256 for both the leaves and the tree nodes this kind of attack would be possible, as long as the preimage you prove for the leaf is a plain H(left, right) in the same way as the tree nodes.

You could mitigate against this further:

e.g. say it's something like:

leaf = H(x, y, "leaf")
level1 = H(leaf, path[0], "level1")
level2 = H(level1, path[1], "level2")
level3 = H(path[2], level2, "level3")
...
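
The same idea in runnable form, as a sketch (the tag strings and SHA256 are stand-ins for the LongsightF variants):

import hashlib

def H(*parts):
    # Domain-separated hash: the trailing tag puts leaf hashes and
    # node hashes at each level into disjoint domains
    h = hashlib.sha256()
    for p in parts:
        h.update(p if isinstance(p, bytes) else str(p).encode())
    return h.digest()

leaf = H(b'x', b'y', 'leaf')
level1 = H(leaf, b'path0', 'level1')
level2 = H(level1, b'path1', 'level2')
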
barryWhiteHat commented 6 years ago

Yes, having a different hash function should work.

drewstone commented 6 years ago

@HarryR looking around, it seems storage updates are only 5000 gas when you change a non-zero value to another non-zero value.

You could update this to something like the following:

    def __setitem__(self, idx, value):
        # Clearing a non-zero slot: 5000 gas plus a 15000 refund
        if idx in self._stor and value == 0 and self._stor[idx] != 0:
            self._gas += 5000
            self._gas_refund += 15000
        # Overwriting a non-zero slot with a non-zero value: 5000 gas
        elif idx in self._stor and self._stor[idx] != 0:
            self._gas += 5000
        # Writing to a fresh (zero) slot: 20000 gas
        else:
            self._gas += 20000

        self._stor[idx] = value
HarryR commented 6 years ago

Ahh, good spot.

I looked at the go-ethereum source code and modified the function a little. It now matches this:

https://github.com/ethereum/go-ethereum/blob/2433349c808fad601419d1f06275bec5b6a93ec8/core/vm/gas_table.go#L118

barryWhiteHat commented 6 years ago

> Why does updating hashes cost 20k gas?

The reason for this is that sha256 is added as a precompile while keccak256 is an opcode. So when you call sha256 it costs the same as calling a contract plus the actual operation. There is an EIP to reduce this cost; I can dig it up if you are interested.