Open vbuterin opened 5 years ago
How does this relate to previous issues on SSZ partials? It is getting a bit difficult to follow the various issues around SSZ merklization indexing light client stuff.
If I understand correctly that the ultimate purpose of all this is to specify a format for multiproofs, then all this documentation is quite cumbersome relative to the docs on turbo-geth multiproofs. Perhaps it doesn't help that a trie structure, a multi-proof format, and a serialization encoding are all framed in terms of "SSZ" (imagine describing eth1 merkle-patricia-trie proofs in terms of "RLP merkle partials"), coming together to form a far from simple construct, part of which is ostensibly frozen with the phase0 spec and another part which is, apparently, not.
It's almost equivalent, the only difference here is that the serialization is simpler, it's a list of hashes rather than being a container. It's also backwards compatible so eg. existing Merkle branches can fit into the format already.
I generally think of this scheme as "partials"; the connection to SSZ serialization and tree hashing is definitely not 100% right, though there are some connections, eg. the way that tree hashing is designed to insist on a binary tree structure without doing weird things to special-case in bytes, is in part there to make partials easier.
You're right a lot of those old issues need to be cleaned up; some have been accepted, others deprecated.
Issue Status: 1. Open 2. Started 3. Submitted 4. Done
This issue now has a funding of 400.0 DAI (400.0 USD @ $1.0/DAI) attached to it.
We do not even need to include leaves, because much of the time some or all of the information from leaves will be known.
I'm not quite convinced on this. For example, if we think of EEs as pure state transition functions (for illustrative purposes, I'm breaking up the "data blob" parameter into witness
and transactions
):
fn state_transition(
beacon_state_root: Bytes32,
pre_state_root: Bytes32,
witness: Vec<Bytes32>,
transactions: Vec<Transaction>
) -> Bytes32;
And we imagine that the state root of the EE is the root of this tree:
// FixedVector[u256; 4]
//
// +- 1 -+
// / \
// 2 3
// / \ / \
// 4 5 6 7
We'll assume there is silly transaction that simply increases the sender's balance. The EE may receive a transaction package that looks something like this:
[
{
"amount": 0,
"sender": 2,
// "witness": ["ABC", "DEF", "GHI"]
},
{
"amount": 0,
"sender": 3,
// "witness": ["JKL", "MNO", "PQR"]
},
Where the witnesses would have been combined into the multiproof: ["ABC", "JKL", "MNO", "DEF"]
(e.g. chunks [5, 6, 7, 4]
).
At this point, when the first transaction is executed and needs the path to the account at address 1
, I'm not sure how it is going to locate the correct chunks from the witness as it has no visibility into what leaves other transactions will be accessing.
I can think of two possible solutions off the top of my head:
There is an access list which declares the paths that will touched by the transaction package; in which case the EE will be able to calculate the accessed leaf indexes. Maybe this specifies a type of partial that is a subset of the structure described above?
The leaf node indexes are explicitly included in the partial structure.
💰 A crowdfund contribution worth 0.10000 DAI (0.1 USD @ $1.0/DAI) has been attached to this funded issue from @.💰
Want to chip in also? Add your own contribution here.
@uivlis Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!
Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days
Yeah, I'm still working, trying to figure out so much new jargon.
@uivlis Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!
Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days
As discussed during EthBerlin (with Quilt, Vitalik, GBallet) and shared in the call, I created this repository to describe an idea to use helper-data to speed up tree lookups and verification, you could call the first approach "ssz tree offsets": https://github.com/protolambda/eth-merkle-trees [experimental and untested] There is a wide effort to make multi-proof merkle trees better, also in Eth1 (@AlexeyAkhunov, @karalabe, @gballet).
Storing the bottom node data in a packed form is simplified when you can forget about what exactly is a witness, and what a leaf node. For batching it also makes a lot of sense to me to forget about the difference, as items in the batch may have one witness that is a leaf in another. Merging and verifying is just a lot more complicated (to do efficiently at least) if application differences like this influence lower level encoding.
Also, lookups for reading and in-place edits are important if we like to back a partial with a multi-proof, and have the partial behave as an interface. (less memory changes and copies in ewasm if the proof does not need to be deserialized at all).
Then, an instruction-set like approach (Alexey) is interesting for complicated structures like MPT, but I don't see how well it really speeds up lookups of leaves, and lookups should be cheap (do not construct the whole tree if possible). So I believe an approach with some kind of structural helper data, like the tree-offsets, would be better:
O(depth)
operations for lookup. (and possible to shorten during compile time for static parts)For Merkle Patricia Tree optimizations, representing MPTs as binary trees, and then changing the verification to use a bigger scratchpad-stack to hash hexary nodes, could be a solution to backport these optimizations (or at least run them efficiently in ewasm). And the encoding may also just be a start for a better encoding to back MPTs in Geth, if we add support for MPT extension nodes.
The thing where I still have doubts about is where to encode these offsets: it's simple and small, just doable in a single depth-first walk that encodes 2/32
extra data for most proofs. But at the same time there are places where you can know it during compile time and not have to read it from input.
After Eth2 interop week (= ends about 2 weeks from now) I like to pick this up with whoever is working on this. I am leaving it as-is now to focus on phase0 things.
Issue Status: 1. Open 2. Cancelled
The funding of 400.0 DAI (plus a crowdfund of 0.1 DAI worth 0.1 USD) (400.0 USD @ $1.0/DAI) attached to this issue has been cancelled by the bounty submitter
Problem statement
In an SSZ partial, we have:
leaves
you want to make a partial of (often, this is just a single one)sisters
), that can be generated as a functionget_sister_indices(leaves: Sequence[int]) -> Sequence[int]
hashes
for each index inleaves
andsisters
How do we serialize the whole structure? Specifically, how do we serialize which indices the partial is referring to and in what order?
Proposed solution
First, we do not include
leaves
andsisters
in the partial. We definitely don't need to includesisters
because it can simply be calculated fromleaves
. But more importantly, we do not even need to includeleaves
, because much of the time some or all of the information fromleaves
will be known. For example, if we want to make a partial representing a Merkle path to a particular block in history from a history accumulator, it would be more reasonable to represent the block height (as that might already be included alongside the proof anyway), and then calculate the generalized index for the leaf from there. If we want to make a partial representing an entire object in a tree, we can easily calculate the set of generalized indices representing all leaves in that object. Also, this fits with the existing convention for Merkle proofs where the index is separate from the proof.Second, we store
hashes
in the following order: first, theleaves
in bit-alphabetical left-to-right order, then thesisters
in descending numerical order.Here's the definition of bit-alphabetical left-to-right order:
Example:
The idea is that it is a recursive "top, then left subtree, then right subtree" order (though
leaves
should never contain both "top" and "left" (or "top" and "right")). To see more clearly, here's the tree structure:The goal of bitwise sorting is so that if a partial includes an entire SSZ object that has multiple levels, the different levels will be provided in order. The goal of the "leaves then sisters in descending order" structure is so that a single-item SSZ partial is always identical to a Merkle proof: first the object, then the leaves in the Merkle tree from bottom-to-top order.
For example, if we want a partial of just element 5 in the above tree, the order would be [5, 4, 3], just as in a regular Merkle proof. If we want a partial of elements 5 and 6, it would be [5, 6, 7, 4]. For elements 4 and 5, it would be [4, 5, 3].