ethereum / go-verkle

A go implementation of Verkle trees
The Unlicense
209 stars 63 forks source link

proposal: encode EoAs in extension node #418

Closed gballet closed 2 weeks ago

gballet commented 9 months ago

For the transition, it takes a lot of time to convert leaves from its MPT format to a verkle tree format. Leaves can be slots, or accounts, and some tricks can be used to reduce the amount of commitments that need to be computed, thus speeding up the transition.

Case of EoAs

There are two types of accounts, EoAs and contracts. In the case of a contract, no change is proposed. But in the case of an EoA, note that:

The first part of this proposal is that for extension-and-suffix trees whose values conform to the following rules:

then the commitment to the extension-and-node-tree is computed as:

C = compute_commitment_root([3, # Special extension marker
                             int.from_bytes(stem, "little"),
                             0, # C1
                             0, # C2
                             int.from_bytes(balance, "little"),
                             int.from_bytes(nonce, "little")] +
                             [0] * 250)

This will reduce the number of EC evaluations by 8 since the C1 level that contains 5 elements split into two, is no longer evaluated, and two single fields elements are added to the extension-level polynomial.

If a new key is inserted that breaks this rule, then the default computation rules apply.

Case of groups with a single slot

Because solidity contracts make a extensive use of hash tables, groups with a single slot are also quite common. As such, if a group of values conforms to the following rules:

then the commitment to the extension-and-suffix node is computed as:

idx = [index for index, val in values if val != None][0]
C = compute_commitment_root([4, # Special extension marker
                             int.from_bytes(stem, "little"),
                             0, # C1
                             0, # C2
                             0, # Balance
                             0, # Nonce
                             int.from_bytes(value[idx], "little")]+
                             [0] * 250)

This would replace 3 evaluations with a single one.

If a new key is inserted at a second location, then the default computation rules apply.

dankrad commented 9 months ago

This will reduce the number of EC evaluations by 8 since the C1 level that contains 5 elements split into two, is no longer evaluated, and two single fields elements are added to the extension-level polynomial.

If a new key is inserted that breaks this rule, then the default computation rules apply.

This would require a key collision, right? Given this, the behaviour you specify might be overly complex, we could instead just crash.

For the cases where a single slot is occupied by a hash table entry, I understand the reasoning but one thing to check is the update rule in the proof might become quite complicated?

gballet commented 9 months ago

true, for a single-slot leaf to see another leaf inserted right next to it, it would require a 31-byte collision which is highly unlikely.

As for the complexity of the update proof, from what I envision it would simply be values 6 & 7 becoming 0, and then C1/C2 becoming non 0. As Ignacio said yesterday, we could just introduce a new extension status type but I don't even believe this would be necessary since it's pretty much like adding a new value to e.g. C2 if only C1 was non-zero.

dankrad commented 9 months ago

true, for a single-slot leaf to see another leaf inserted right next to it, it would require a 31-byte collision which is highly unlikely.

No it does not, actually:

def get_tree_key_for_storage_slot(address: Address32, storage_key: int):
    if storage_key < (CODE_OFFSET - HEADER_STORAGE_OFFSET):
        pos = HEADER_STORAGE_OFFSET + storage_key
    else:
        pos = MAIN_STORAGE_OFFSET + storage_key
    return get_tree_key(
        address,
        pos // VERKLE_NODE_WIDTH,
        pos % VERKLE_NODE_WIDTH
    ) 

So as long as the address and the first 31 bytes of the storage key are the same, it will go to the same place. (It is unlikely for a pure hashmap but contracts can directly access storage keys).

jsign commented 9 months ago

As Ignacio said yesterday, we could just introduce a new extension status type but I don't even believe this would be necessary since it's pretty much like adding a new value to e.g. C2 if only C1 was non-zero.

I'm not sure I understand, so extending a bit the rationale of what I said.

The current extension status is what allows the verifier rebuilding the tree to know which kind of "format" this leaf node has. If you don't have new extension status versions, then I can't see how the rebuilt leaf node can decide between the [1, stem, C1, C2] [values_half1] [values_half2] format vs [3/4, values] format to "store" the current_values for keys in that stem?

To put it in an example, if I give you some state-diff: 0x0000...00 = 0 0x0000...01 = 10 0x0000...02 = 100 0x0000...03 = <empty keccak val> 0x0000...04 = 0

Only with this information you can't be 100% sure this corresponds to an EOA leaf node, despite it looks like one for nonce=10 and balance=100. But theoretically can also be some storage slots of a contract (you could argue "specially crafted" but it's quite easy to do it).

For the verifier to rebuild the tree correctly, in this proposal, it needs a way to decide which case it is. And as shown, only with the values it's impossible to know thus why I believe we need a new extension status code for each of these new markers. (BTW, it's not a huge deal since it's some new value in an existing bytes, so the witness size is the same)

LMK if I misinterpreted your point since I'm not sure I understood the C1/C2 example you mentioned.

jsign commented 9 months ago

Dumping here other points mentioned in the call:

I've been a bit sloppy/hand-wavy here, just listing for the record.

gballet commented 9 months ago

No it does not, actually:

it does, because:

So as long as the address and the first 31 bytes of the storage key are the same, it will go to the same place. (It is unlikely for a pure hashmap but contracts can directly access storage keys).

And that's the collision I'm talking about: the first 31 bytes of the storage key.

gballet commented 9 months ago

The current extension status is what allows the verifier rebuilding the tree to know which kind of "format" this leaf node has. If you don't have new extension status versions, then I can't see how the rebuilt leaf node can decide between the [1, stem, C1, C2] [values_half1] [values_half2] format vs [3/4, values] format to "store" the current_values for keys in that stem?

To put it in an example, if I give you some state-diff: 0x0000...00 = 0 0x0000...01 = 10 0x0000...02 = 100 0x0000...03 = <empty keccak val> 0x0000...04 = 0

Only with this information you can't be 100% sure this corresponds to an EOA leaf node, despite it looks like one for nonce=10 and balance=100. But theoretically can also be some storage slots of a contract (you could argue "specially crafted" but it's quite easy to do it).

For the verifier to rebuild the tree correctly, in this proposal, it needs a way to decide which case it is. And as shown, only with the values it's impossible to know thus why I believe we need a new extension status code for each of these new markers. (BTW, it's not a huge deal since it's some new value in an existing bytes, so the witness size is the same)

Right, so actually the problem isn't really a problem in practice: if they see that the code hash and code size are empty, clients will automatically fill a size of 0 and a code hash of keccak([]). This is already what the proposal implicitly expects, because it won't be able to read the hash from that extension node type 3. When reading these values from the tree, the client knows the context and whether something is an EoA or a slot number.

So we could have a rule that during the expansion of an EoA from type 3 to type 1, the code keccak is left empty.

Now regarding the potential hack: if someone indeed manages to write values that look like an account to a single storage slot (for type 4), then expand it into a type 1 extension node so that client ends up interpreting it as an EoA, that means they are able to find a 31 byte stem collision. Which means they would be able to do the same thing with a type 1 extension node.

jsign commented 9 months ago

Now regarding the potential hack: if someone indeed manages to write values that look like an account to a single storage slot (for type 4), then expand it into a type 1 extension node so that client ends up interpreting it as an EoA, that means they are able to find a 31 byte stem collision. Which means they would be able to do the same thing with a type 1 extension node.

I think there's a confusion here. I'm not talking about leaf nodes changing types after writes or similar -- I'm just talking about the tree reconstruction form the witness.

To cite my example again. If I give you a witness with the following statediff: 0x0000...00 = 0 0x0000...01 = 10 0x0000...02 = 100 0x0000...03 = [empty keccak val] 0x0000...04 = 0

As a verifier rebuilding the tree, you don't know if the stem of those keys is an EOA or storage slots.

There's no need to "find a collision" for a malicious prover or similar. Only by looking at a stem, it's impossible to know if this leaf node is an EOA or storage slots (i.e: there's no need to find a particular tree key shape or value). That's why I think the new extension status to signal the new markers are needed now.

The tree reconstruction algorithm from the witness, doesn't know anything about EOAs, contracts, or similar. Just knows about tree keys and values.

Am I missing something?

dankrad commented 9 months ago

The tree reconstruction algorithm from the witness, doesn't know anything about EOAs, contracts, or similar. Just knows about tree keys and values.

Thinking about this, we actually want the property that the tree/database doesn't have to know about this distinction. We wanted to construct it in a way that can be changed for any generic 32 byte key-value database.

So this is an argument against doing anything special for EOAs -- any special treatment should be purely based on the values written, rather than the type of account.

jsign commented 9 months ago

Yes, that was always my interpretation about the extension status codes -- mostly the abstraction at the boundary between the "tree client" (i.e: ethereum node) and the tree.

So to avoid talking about EOAs or storage slots at the tree level.

You could use new extension status code but to only signal the template of the leaf node, while still not involving EOAs or storage-slot concepts. But... it's starts to get a bit shady.

gballet commented 2 weeks ago

Closing this issue, there are enough good reasons not to do this.