Open jannikluhn opened 5 years ago
Thanks for the write-up! I think it's the right way to rescue us from the immutable copy
for huge data.
/cc @pipermerriam, basically we will have trouble when serialize and tree hash the BeaconState
with 4M validators. The benefit of tree hashing has been discussed here and here, and we have to find a way to build the tree structure inside SSZ.
I'm aware of this problem space and I think @jannikluhn is correct that we likely need a formal API to be able to take advantage of this. I don't have opinions about specific implementation but will be happy to review one once it's ready or to weigh in on concepts for the API.
Problem
The tree hashing algorithm describes a hash function for SSZ data structures. For lists and containers, it is defined recursively: First, hash each element and, second, hash the results together. Thus, updating only a single element of a list or a field in a container does not require a full recalculation of the top level tree hash. We should take advantage of this to save unnecessary computation that can take quite long for big values (in particular, the beacon chain state). At the same time, we'd like our data structures to stay immutable.
Vitalik came up with a (mutable) example implementation here: https://github.com/ethereum/research/blob/2d3ed6e42087d5b14cdf107c897e8d3e5db3ee7a/ssz_hashable_list/hashable_list.py
Proposed Implementation
Container
ssz.Serializable
should memoize its tree hash.ssz.copy
should copy the tree hash as well.List
We need a new class (maybe named, as Vitalik proposed it,
HashableList
) that similar tossz.Serializable
memoizes its tree hash, but in addition to that the full hash tree. It should be an immutable sequence (i.e. define__getitem__
,__len__
,__reversed__
,__contains__
, and__iter__
), and have acopy
method with the following interface:The new list will copy the elements of the old one, but replace/insert/remove/prepend/append some elements depending on the given parameters. The old hash tree should be copied as well, updating the necessary branches only. For safety, we should probably either only allow a single argument or enforce a certain order of the arguments so that its clear to what the indices refer to.
Usage example