0xaatif commented 3 months ago

Motivation

93

Previous work

206
198

Background

Terminology

Type 1 🥇 is full compatibility ^1 (there should be some more words here, but I don't know what to write)
Type 2 🥈 is almost-compatibility ^1
A witness is a fact about the state of the ethereum world that a prover wants to create a proof for
MPT and SMT are two datastructures used by Type 1 🥇 and Type 2 🥈 respectively.

Story

zero-bin makes RPC calls to an Ethereum node. It requests special traces for a block, and assembles them into some trace_decoder structures.
zero-bin asks trace_decoder to assemble some evm_arithmetization structures.
zero-bin makes several calls to proof_gen and evm_arithmetization to actually produce a result

Current pieces of the puzzle

Pipeline	Type 1 🥇	Type 2 🥈
Ethereum node	`0xPolygonZero/erigon @ feat/zero`	`0xPolygonHermez/cdk-erigon @ zkevm`
`trace_decoder`	`0xPolygonZero/zk_evm/trace_decoder @ develop`	(unimplemented)
`zero-bin`	`0xPolygonZero/zero-bin @ develop`	??
`evm_arithmetization`	`0xPolygonZero/zk_evm/evm_arithmetization @ develop`	`0xPolygonZero/zk_evm/evm_arithmetization @ feat/type2`*
`proof_gen`	`0xPolygonZero/zk_evm/proof_gen @ develop`	`0xPolygonZero/zk_evm/proof_gen @ feat/type2`*

* marks a non-default branch

How `trace_decoder` works

Work

This issue tracks filling in the trace_decoder for Type 2 🥈.

Plan

309 also adds a frontend for the Type 2 🥈 wire format.
[ ] refactor the Type 1 🥇 backend and add Type 2 🥈 support.

0xaatif commented 3 months ago

The backend does a bunch of tree reshaping, editing account balances etc, before spitting out some evm_arithmetization::GenerationInputs

There are two key challenges to face as we add Type 2 🥈 support

GenerationInputs::tries uses the (MPT/Type 1 🥇) format: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/evm_arithmetization/src/generation/mod.rs#L82-L102 I'm considering that out-of-scope for this document - it'll require evm_arithmetization changes - perhaps the solution in trace_decoder ends up in evm_arithmetization
The data structure used for the tree reshaping. Let's discuss that more below.

The key data structure is this: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L203-L208

All those HashedPartialTries (MPTs) are going to have to be replaced if we want this to work. Here's a sketch of what the Type 2 🥈-compatible backend could look like.

Here are the API's for the two data structures AIUI: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/hermez_cdk_erigon.rs#L45-L88

If you squint, they're very similar.

[!IMPORTANT] the rest of this document hinges on them being compatible - I need to know now if they're not in some way I'm missing

I think there are two viable approaches for refactoring:

Wrap them both, so that the backend doesn't know which its interacting with
```
enum Wrap {
    Mpt(HashedPartialTrie)
    Smt(Smt)
}
```
Find some semantic representation, flushing out a hash using the required format when needed.

I think 2 is going to be more maintainable in the long term. The current code already has a bunch of hidden invariants that make it difficult to read. Sometimes it unwraps, sometimes it's fallible. Do the Nibbles I'm looking at represent an Address? Or a Hash(Address) etc, and the waters will likely only get muddier with 1.

With 2, we can make illegal states unrepresentable, and I hope the make the backend easier to follow. Let's explore how the members of PartialTrieState are used in the current backend. I've prioritised exploring the writes.

`state`

Writes:

We write some AccountRlp RLP a couple of times - this is a great fit. https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L424 https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L597
This function is a bit more challenging, and is called a couple of times - it implies that plonky2 has deeper knowledge of the tries. Presumably in the SemanticTrie -> MPT conversion we can ensure we uphold the invariants required. https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L460-L474

Since the mpt "values" are only ever AccountRlp, I think a semantically equivalent version of this member looks like this: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/hermez_cdk_erigon.rs#L21-L42

`storage`

I think this is the most challenging I think it ultimately is a mapping from address (or hash(address)?) to AccountRlp: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/lib.rs#L382-L391, but the write logic is a bit convoluted.

`txn`

Is never written

`receipt`

Is only written in one location: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L296-L298

Nashtare commented 2 months ago

Apologies for not looking at this sooner. A few comments that I hope will clarify some items

There are two key challenges to face as we add Type 2 🥈 support

GenerationInputs::tries uses the (MPT/Type 1 🥇) format: I'm considering that out-of-scope for this document - it'll require evm_arithmetization changes - perhaps the solution in trace_decoder ends up in evm_arithmetization

I'm not sure I understand how this can be considered out of scope, as the trace_decoder must pass "understandable" inputs to the proving backend (in evm_arithmetization) which in the case of the SMT/ Type 2 🥈 format would require a change in this GenerationInputs::tries type. Note that the feat/type2 branch currently supports SMT/ Type 2 🥈 specific format on the proving backend side. The goal of https://github.com/0xPolygonZero/zk_evm/issues/20 is to remove the need of a distinct feat/type2 branch, and allow upper layers (trace_decoder / zero-bin) to rely on conditional feature flag to target Type 1 / Type 2 backend prover. This conditional feature flagging was discussed internally and considered best for future-proofing and development / maintenance of proving backend with the two distinct statements.

The data structure used for the tree reshaping. Let's discuss that more below.

The key data structure is this: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L203-L208

All those HashedPartialTries (MPTs) are going to have to be replaced if we want this to work.

Minor clarification, but the last statement is not accurate. Transaction and Receipt tries do not change, and are still HashedPartialTrie. The state would be replaced by a state_smt field containing serialized SMT (i.e. Vec<U256>), see this code. When forming the final TrieInputs from this PartialTrieState, note that the storage field is removed.

That being said, I agree with you that "2. Find some semantic representation, flushing out a hash using the required format when needed." seems the way to go.

To clarify on your comment "it implies that plonky2 has deeper knowledge of the tries." on delete_node_and_report_remaining_key_if_branch_collapsed: we need the prover to have access to all accounts / storage slots it may try accessing. Some edge cases, like an SSTORE inducing an OOG error would yield unprovable txns if we were to parse "trivially" the clients' payloads, as these usually trim off these info (operation not executed fully -> no need for account data in witness => missing account from trie on the prover side). Same thing happens when deleting a branch child, which results in a collapse, which impacts native tracers, see https://github.com/0xPolygonZero/zk_evm/issues/237 for more info.

txn is never written

Yes it is, just above the trie_state.receipt update, from the provided TxnMetaState in update_txn_and_receipt_tries().

BGluth commented 2 months ago

Do the Nibbles I'm looking at represent an Address? Or a Hash(Address) etc, and the waters will likely only get muddier with 1.

Yeah, I found having redundant type aliases for U256/H256 just to specify if a type is an unhashed/hashed version of something (eg. Address/HashedAddress) helped a good amount with this.

Find some semantic representation, flushing out a hash using the required format when needed.

Hey could you elaborate a bit more on what this might look like? Are you thinking about using a trait to abstract the common operations away (eg. insert, hash, etc.) between the two trie types? This is what my original attempt at adding smt support was attempting to do, as the delta application logic is nearly identical between mpt & smt tries (with the only difference being the trie operations changed and a few steps for setting up mpt tries became irrelevant).

0xPolygonZero / zk_evm

tracking: SMT in trace decoder #275

Motivation

93

Previous work

206

198

Background

Terminology

Story

Current pieces of the puzzle

How `trace_decoder` works

Work

Plan

309 also adds a frontend for the Type 2 🥈 wire format.

`state`

`storage`

`txn`

`receipt`

0xPolygonZero / zk_evm

tracking: SMT in trace decoder #275

Motivation

93

Previous work

206

198

Background

Terminology

Story

Current pieces of the puzzle

How trace_decoder works

Work

Plan

309 also adds a frontend for the Type 2 🥈 wire format.

state

storage

txn

receipt

How `trace_decoder` works

`state`

`storage`

`txn`

`receipt`