Open 0xaatif opened 3 months ago
The backend does a bunch of tree reshaping, editing account balances etc, before spitting out some evm_arithmetization::GenerationInputs
There are two key challenges to face as we add Type 2 π₯ support
GenerationInputs::tries
uses the (MPT/Type 1 π₯) format: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/evm_arithmetization/src/generation/mod.rs#L82-L102
I'm considering that out-of-scope for this document - it'll require evm_arithmetization
changes - perhaps the solution in trace_decoder
ends up in evm_arithmetization
The key data structure is this: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L203-L208
All those HashedPartialTrie
s (MPTs) are going to have to be replaced if we want this to work.
Here's a sketch of what the Type 2 π₯-compatible backend could look like.
Here are the API's for the two data structures AIUI: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/hermez_cdk_erigon.rs#L45-L88
If you squint, they're very similar.
[!IMPORTANT] the rest of this document hinges on them being compatible - I need to know now if they're not in some way I'm missing
I think there are two viable approaches for refactoring:
enum Wrap {
Mpt(HashedPartialTrie)
Smt(Smt)
}
I think 2
is going to be more maintainable in the long term.
The current code already has a bunch of hidden invariants that make it difficult to read. Sometimes it unwraps, sometimes it's fallible.
Do the Nibbles
I'm looking at represent an Address
? Or a Hash(Address)
etc,
and the waters will likely only get muddier with 1
.
With 2
, we can make illegal states unrepresentable, and I hope the make the backend easier to follow.
Let's explore how the members of PartialTrieState
are used in the current backend.
I've prioritised exploring the writes.
state
Writes:
AccountRlp
RLP a couple of times - this is a great fit. https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L424 https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L597plonky2
has deeper knowledge of the tries. Presumably in the SemanticTrie -> MPT
conversion we can ensure we uphold the invariants required. https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L460-L474Since the mpt "values" are only ever AccountRlp
, I think a semantically equivalent version of this member looks like this:
https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/hermez_cdk_erigon.rs#L21-L42
storage
I think this is the most challenging I think it ultimately is a mapping from address (or hash(address)?) to AccountRlp: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/lib.rs#L382-L391, but the write logic is a bit convoluted.
txn
Is never written
receipt
Is only written in one location: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L296-L298
Apologies for not looking at this sooner. A few comments that I hope will clarify some items
There are two key challenges to face as we add Type 2 π₯ support
GenerationInputs::tries uses the (MPT/Type 1 π₯) format: I'm considering that out-of-scope for this document - it'll require evm_arithmetization changes - perhaps the solution in trace_decoder ends up in evm_arithmetization
I'm not sure I understand how this can be considered out of scope, as the trace_decoder
must pass "understandable" inputs to the proving backend (in evm_arithmetization
) which in the case of the SMT/ Type 2 π₯ format would require a change in this GenerationInputs::tries
type. Note that the feat/type2
branch currently supports SMT/ Type 2 π₯ specific format on the proving backend side. The goal of https://github.com/0xPolygonZero/zk_evm/issues/20 is to remove the need of a distinct feat/type2
branch, and allow upper layers (trace_decoder
/ zero-bin
) to rely on conditional feature flag to target Type 1 / Type 2 backend prover. This conditional feature flagging was discussed internally and considered best for future-proofing and development / maintenance of proving backend with the two distinct statements.
- The data structure used for the tree reshaping. Let's discuss that more below.
The key data structure is this: https://github.com/0xPolygonZero/zk_evm/blob/53154426592b9fe2f2141f2cc884d396c3bab8ff/trace_decoder/src/decoding.rs#L203-L208
All those
HashedPartialTrie
s (MPTs) are going to have to be replaced if we want this to work.
Minor clarification, but the last statement is not accurate. Transaction and Receipt tries do not change, and are still HashedPartialTrie
. The state
would be replaced by a state_smt
field containing serialized SMT (i.e. Vec<U256>
), see this code. When forming the final TrieInputs
from this PartialTrieState
, note that the storage
field is removed.
That being said, I agree with you that "2. Find some semantic representation, flushing out a hash using the required format when needed." seems the way to go.
To clarify on your comment "it implies that plonky2 has deeper knowledge of the tries." on delete_node_and_report_remaining_key_if_branch_collapsed
: we need the prover to have access to all accounts / storage slots it may try accessing. Some edge cases, like an SSTORE
inducing an OOG error would yield unprovable txns if we were to parse "trivially" the clients' payloads, as these usually trim off these info (operation not executed fully -> no need for account data in witness => missing account from trie on the prover side). Same thing happens when deleting a branch child, which results in a collapse, which impacts native tracers, see https://github.com/0xPolygonZero/zk_evm/issues/237 for more info.
txn
is never written
Yes it is, just above the trie_state.receipt
update, from the provided TxnMetaState
in update_txn_and_receipt_tries()
.
Do the Nibbles I'm looking at represent an Address? Or a Hash(Address) etc, and the waters will likely only get muddier with 1.
Yeah, I found having redundant type aliases for U256
/H256
just to specify if a type is an unhashed
/hashed
version of something (eg. Address
/HashedAddress
) helped a good amount with this.
Find some semantic representation, flushing out a hash using the required format when needed.
Hey could you elaborate a bit more on what this might look like? Are you thinking about using a trait to abstract the common operations away (eg. insert, hash, etc.) between the two trie types? This is what my original attempt at adding smt support was attempting to do, as the delta application logic is nearly identical between mpt
& smt
tries (with the only difference being the trie operations changed and a few steps for setting up mpt tries became irrelevant).
Motivation
93
Previous work
206
198
Background
Terminology
MPT
andSMT
are two datastructures used by Type 1 π₯ and Type 2 π₯ respectively.Story
zero-bin
makes RPC calls to an Ethereum node. It requests special traces for a block, and assembles them into sometrace_decoder
structures.zero-bin
askstrace_decoder
to assemble someevm_arithmetization
structures.zero-bin
makes several calls toproof_gen
andevm_arithmetization
to actually produce a resultCurrent pieces of the puzzle
0xPolygonZero/erigon @ feat/zero
0xPolygonHermez/cdk-erigon @ zkevm
trace_decoder
0xPolygonZero/zk_evm/trace_decoder @ develop
zero-bin
0xPolygonZero/zero-bin @ develop
evm_arithmetization
0xPolygonZero/zk_evm/evm_arithmetization @ develop
0xPolygonZero/zk_evm/evm_arithmetization @ feat/type2
*proof_gen
0xPolygonZero/zk_evm/proof_gen @ develop
0xPolygonZero/zk_evm/proof_gen @ feat/type2
** marks a non-default branch
How
trace_decoder
worksmpt_trie
evm_arithmetization
Work
This issue tracks filling in the
trace_decoder
for Type 2 π₯.Plan
309 also adds a frontend for the Type 2 π₯ wire format.