ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal
3.59k stars 985 forks source link

merkle-proofs.md presentation #2179

Open hwwhww opened 3 years ago

hwwhww commented 3 years ago

Issue

History and status-quo

  1. [Before] Pyspec used to implement the SSZ class library in a simple .py file.
  2. [Currently] Pyspec uses Proto's SSZ library remerkleable for having the much more efficient hash_tree_root.
    • Also, use some class methods and syntax sugar of remekleable in the test suites.
  3. We tried to avoid using Python dark magic in the specs, so merkle-proofs.md implements the algorithm with simple tools and may be less efficient.
  4. 1901 proposes that making SSZ related specs to a standalone repo.

Follow-up of https://github.com/ethereum/eth2.0-specs/pull/2147#discussion_r554659746 comment:

@vbuterin: Do we want generalized index logic to be "human readable"? If so, calling into a library that does weird class magic may not be the best choice and perhaps we could instead try to make it executable by pointing to https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/merkle-proofs.md#ssz-object-to-index ?

Discussion

Since the light client patch (HF1) requires the generalized index scheme, we may want to include the Merkle proof algorithm as part of the specs now. Thus, I agreed that it makes sense to revisit it, make it executable (it should be able to be applied with remerkleable classes smoothly), and add tests to sync with client teams more effectively in the next few months.

There may have some trade-offs, e.g., we may make duplicate specs and delay #1901.

/cc @protolambda, you may have the most comprehensive view of this topic. 🙏

protolambda commented 3 years ago

Currently, remerkleable paths are already very "human readable" (imo): e.g. vote_path = Path(BeaconState) / 'eth1_data_votes' / 42 / 'block_hash' is as readable as it gets. And it's not magic, it's inspired by https://docs.python.org/3/library/pathlib.html, and is as close to a URI path as we can get, while also keeping it typed.

I'm convinced it's good to defer conversion of paths to generalized indices till the last moment possible. It's a lot easier to mess up combinations with integers than combining type-checked path components.

A function like def get_generalized_index(typ: SSZType, path: Sequence[Union[int, SSZVariableName]]) -> GeneralizedIndex: could instead be: def get_generalized_index(path: Path) -> GeneralizedIndex:, and simply iterate through segments as (typ, id) tuples, doing otherwise the same logic as it does now.


Then for merkle proofs, we have two options:

  1. Work with Bytes32
  2. Work with Node, which has a root, and a left and right node.

Currently we've been doing the 1st, which is minimal and works well with full trees (since you can do that i <- i*2, i*2+1 reduction thing in a flat array. And it's simple with single proofs too.

However, with multi-proofs it's not nearly as attractive, since then you get into Dict[GeneralizedIndex, Bytes32] territory: storing paths for every single node in the proof. If we are dealing with multi-proofs, I prefer the structure of having left and right nodes.

One other thing to consider here are client implementations: will they build hashmaps with big-ints as keys to build a multi-merkle proof? In some languages this seems painful to even implement (since big ints are not valid primitives for keys), and probably is quite slow compared to the existing "flat array" approach for full proofs.

And in other contexts that also want to optimize for proofs, e.g. a smart contract, they also will not use call-data with hashmaps in them. Proofs have structure, it doesn't make sense to wrap a binary tree into a hashtree.

Now I'm not trying to optimize as much here, but I do think we should present it in a way that translates well to something that doesn't have to be completely re-approached for efficiency reasons. We'll end up with clients doing something completely different than the spec, making them difficult to audit. This happened before, e.g. with HF1 we are only now getting around to enshrining something alike to the participation tracking optimization, and that feature actually seems really good.

So I propose we keep basic abstractions around (a "path", a "path iterator", a "node"), letting clients can optimize it all they like in the implementations of these, but design towards them using a very similar abstraction.


Let's talk to end-users who will have to produce and consume these proofs. Then I'll change remerkleable to fit whatever direction works best.

mpetrunic commented 3 years ago

A function like def get_generalized_index(typ: SSZType, path: Sequence[Union[int, SSZVariableName]]) -> GeneralizedIndex: could instead be: def get_generalized_index(path: Path) -> GeneralizedIndex:, and simply iterate through segments as (typ, id) tuples, doing otherwise the same logic as it does now.

In js using path would basically mean using and parsing string so in my initial implementation it's a lot easier and efficient to use api like get_generalized_index(typ: SSZType, path: Sequence[Union[int, SSZVariableName]]).

protolambda commented 3 years ago

@mpetrunic Sure, there are lots of ways to represent Path:

I think that if you convert the path to generalized index too early, you get bad (IMHO) things like Dict[generalized_index, Value]. I prefer making Path an interface, and allowing clients to do their own thing over the same spec abstraction. Having more context than an integer is nice to avoid the Dict etc. later on, and "free" if you defer converting a path to a generalized index.

samlaf commented 1 year ago

Is merkle-proofs.md supposed to be runnable code? I've been trying to play with it to learn about generalized indices, hash_tree_root, and proofs, but most of the functions in there don't seem to be part of the eth2spec library (only a few functions in eth2spec.utils.merkle_minimal and some specialized function like compute_merkle_proof_for_state are accessible).

And trying to run the code as standalone completely fails. For example, what is SSZType? Doesn't appear anywhere else in eth2spec. I tried replacing it with remerkleable's View, and BasicValue -> BasicView, but then I stumbled upon BaseList which was also nowhere to be found. SSZVariableName was actually surprisingly defined, which makes me think that the others were just forgotten?

I basically need to prove a validator's withdrawal_credentials against the beacon state root. What would you recommend for this? Could be in any language. I even browsed through the prysm codebase, zrnt, etc. and couldn't find anything "out of the box".

hwwhww commented 1 year ago

Hi @samlaf,

Is merkle-proofs.md supposed to be runnable code?

No, it's not. We use remerkleable's APIs in the executable pyspec.

I basically need to prove a validator's withdrawal_credentials against the beacon state root. What would you recommend for this?

You can use remerakble's API directly, or, use pyspec (eth2spec) light client specs's compute_merkle_proof_for_state(state, index) to generate the proof and use is_valid_merkle_branch(leaf: Bytes32, branch: Sequence[Bytes32], depth: uint64, index: uint64, root: Root) to verify it.

protolambda commented 1 year ago

Stuck at a starbucks for 4 hours (don't ask) so thought I'd help here. Here's how you use the eth2spec and build a proof for any part of a beacon-state & verify the proof using remerkleable :)

python3 -m venv venv
. venv/bin/activate

pip install "git+https://github.com/ethereum/consensus-specs@dev"
import eth2spec.phase0.mainnet as spec   # change this to eth2spec.capella.mainnet if you are loading a Capella state!
from remerkleable.core import Path
from remerkleable.tree import merkle_hash
import os

# We're going to load the genesis state (phase0.BeaconState type)
file = "state.ssz"
state_size = os.stat(file).st_size
print(f"loading BeaconState of {state_size} bytes!")

with open("state.ssz", "rb") as f:
    state = spec.BeaconState.deserialize(f, state_size)

print("loaded state!")

print("validator:")
print(state.validators[123])

# We can navigate to any part of the state, and create a generalized index for it,
# to do merkle navigation with on the raw binary tree that backs the state.
merkle_path = Path(spec.BeaconState) / 'validators' / 123 / 'withdrawal_credentials'
target_gindex = merkle_path.gindex()
print("target gindex (in binary):", bin(target_gindex))

merkle_tree = state.get_backing()

print("root:", merkle_tree.merkle_root().hex())

node = merkle_tree
# 1 less because bitlength returns 1 more than we need to shift, and 1 less because we don't care about the root
check_bit = 1 << (target_gindex.bit_length()-2)
print("check bit:", bin(check_bit))
witness = []
while check_bit > 0:
    if check_bit & target_gindex != 0:  # follow bit path of target gindex, and get sibling nodes as witness
        witness.append(node.get_left().merkle_root())
        node = node.get_right()
    else:
        witness.append(node.get_right().merkle_root())
        node = node.get_left()
    check_bit >>= 1

print("value:", node.merkle_root().hex())
print("witness:")
for i, sib in enumerate(reversed(witness)):
    print(f"{i:3}: {sib.hex()}")

# now let's see if we can verify the proof
x = node.merkle_root()
for i, sib in enumerate(reversed(witness)):
    if (1 << i) & target_gindex != 0:
        x = merkle_hash(sib, x)
    else:
        x = merkle_hash(x, sib)
print("reconstructed state root from proof and value, to verify against real state:", x.hex())

Output (when loading the mainnet genesis state as state.ssz):

loading BeaconState of 5404504 bytes!
loaded state!
validator:
Validator(Container)
    pubkey: BLSPubkey = 0xa9df2cfd79a8b569e7abc286047ade81dbc2e5b89bfd8c00b0913ba3c539b80ff469e77465c6d1815b29e151ab8efd38
    withdrawal_credentials: SpecialByteVectorView = 0x0051416641fd04d2322efa2c23351158f752255ec6354ece0978b2246283d5cb
    effective_balance: Gwei = 32000000000
    slashed: boolean = 0
    activation_eligibility_epoch: Epoch = 0
    activation_epoch: Epoch = 0
    exit_epoch: Epoch = 18446744073709551615
    withdrawable_epoch: Epoch = 18446744073709551615
target gindex (in binary): 0b10101100000000000000000000000000000000001111011001
root: 7e76880eb67bbdc86250aa578958e9d0675e64e714337855204fb5abaaf82c2b
check bit: 0b1000000000000000000000000000000000000000000000000
value: 0051416641fd04d2322efa2c23351158f752255ec6354ece0978b2246283d5cb
witness:
  0: cf69f7df20491fd5ed90381550b0db1209b2b0ba13cdf8e0adf2ff4d2e344089
  1: 19327cb9763c96e00332bde93bdbb1032c4b796dda73e515c8c5f7ede9a419be
  2: bcd42b1f092780448fb0131cd25a24c9d25e4b3b610774ae9aa8d3e437e811fe
  3: f5565f5517b58155cd05d08bed3209f8dba7106059355a770e97f14f46584333
  4: c0c0d17f54cabc30af8ea3259434b21e1186526b4f6eddad24f070299684e07a
  5: 86212bd527bc083f485312a391cb05f030e8dcec1644de496e0d77a57f2dc623
  6: ff272b8aa036383fea68becb7f1272f015de8b6db50326ec8f553db4f3051192
  7: 202d3dfe42be124dd44901342aca0cd4f4c8f82e96e4d208c01f5acf75768faf
  8: 056a473b9ca341b58d69c978ab6928029e2be6dacb04a02fa7780b66dcda2539
  9: d49d356c85a1265a5241e46d9cae7e34293806257107088edc460bcf1691352d
 10: 3cb0c5972e7b766d5eb77b9821d67d98b2a5caf3b7fe7b948db7abc53bcbaa1d
 11: 559f5b3ce5b3476e70ed9ac21a1852a7e6530e798e864b627c3c2c2e3a04ff26
 12: 6bb474e1f0a0596da8e167b2f338327f192b173b291113dd5899909540a4ad83
 13: cb4ff5c9878ec98e8a20deee7bab5d7560a6b6ee912098a76769856a8c8610a8
 14: 9dee21a095961822e6bef7398b23f42eec8fb809cfafeb8f299ccd24ac6f1163
 15: 666e12657b0982c6765e82810fe1ad99f958ce00931e79f6d1a9ff4f9b2fcc2b
 16: c05b3d9fe4e0eb1048e553981d198478cf736d735fa861e566cd60052bc9d48a
 17: fe5a17d87a7a13729f8ed3c89288dd413d4c5f4c10871d32c8a1055ad9942374
 18: d49a7502ffcfb0340b1d7885688500ca308161a7f96b62df9d083b71fcc8f2bb
 19: 8fe6b1689256c0d385f42f5bbe2027a22c1996e110ba97c171d3e5948de92beb
 20: 8d0d63c39ebade8509e0ae3c9c3876fb5fa112be18f905ecacfecb92057603ab
 21: 95eec8b2e541cad4e91de38385f2e046619f54496c2382cb6cacd5b98c26f5a4
 22: f893e908917775b62bff23294dbbe3a1cd8e6cc1c35b4801887b646a6f81f17f
 23: cddba7b592e3133393c16194fac7431abf2f5485ed711db282183c819e08ebaa
 24: 8a8d7fe3af8caa085a7639a832001457dfb9128a8061142ad0335629ff23ff9c
 25: feb3c337d7a51a6fbf00b9e34c52e1c9195c969bd4e7a0bfd51d5c5bed9c1167
 26: e71f0aa83cc32edfbefa9f4d3e0174ca85182eec9f3a09f6a6c0df6377a510d7
 27: 31206fa80a50bb6abe29085058f16212212a60eec8f049fecb92d8c8e0a84bc0
 28: 21352bfecbeddde993839f614c3dac0a3ee37543f9b412b16199dc158e23b544
 29: 619e312724bb6d7c3153ed9de791d764a366b389af13c58bf8a8d90481a46765
 30: 7cdd2986268250628d0c10e385c58c6191e6fbe05191bcc04f133f2cea72c1c4
 31: 848930bd7ba8cac54661072113fb278869e07bb8587f91392933374d017bcbe1
 32: 8869ff2c22b28cc10510d9853292803328be4fb0e80495e8bb8d271f5b889636
 33: b5fe28e79f1b850f8658246ce9b6a1e7b49fc06db7143e8fe0b4f2b0c5523a5c
 34: 985e929f70af28d0bdd1a90a808f977f597c7c778c489e98d3bd8910d31ac0f7
 35: c6f67e02e6e4e1bdefb994c6098953f34636ba2b6ca20a4721d2b26a886722ff
 36: 1c9a7e5ff1cf48b4ad1582d3f4e4a1004f3b20d8c5a2b71387a4254ad933ebc5
 37: 2f075ae229646b6f6aed19a5e372cf295081401eb893ff599b3f9acc0c0d3e7d
 38: 328921deb59612076801e8cd61592107b5c67c79b846595cc6320c395b46362c
 39: bfb909fdb236ad2411b4e4883810a074b840464689986c3f8a8091827e17c327
 40: 55d8fb3687ba3ba49f342c77f5a1f89bec83d811446e1a467139213d640b6a74
 41: f7210d4f8e7e1039790e7bf4efa207555a10a6db1dd4b95da313aaa88b88fe76
 42: ad21b516cbc645ffe34ab5de1c8aef8cd4e7f8d2b51e8e1456adc7563cda206f
 43: 4752000000000000000000000000000000000000000000000000000000000000
 44: 5152000000000000000000000000000000000000000000000000000000000000
 45: 1966996e4b51621e27791d188fe8fba1c5f1749c3305266552bad95b62c0a54a
 46: 162fe1b56a232dabeb5095d95c562db36f5e0775f0d816f3b1f5aefbb99ee378
 47: ca3b6f66b9395967ace5870cb7c7a3dbea6b2d3678acbba82804154c98cd319c
 48: 83aa709f61935832d58c344c31b321c3fc8d347cc2e5d800fb18a18285654146
reconstructed state root from proof and value, to verify against real state: 7e76880eb67bbdc86250aa578958e9d0675e64e714337855204fb5abaaf82c2b
samlaf commented 1 year ago

A big thank you to both of you @protolambda and @hwwhww !