cerc-io / ipld-eth-state-snapshot

Read-only mirror of https://git.vdb.to/cerc-io/ipld-eth-state-snapshot (Tool for inserting the entire state and storage tries into PG-IPFS)
https://git.vdb.to/cerc-io/ipld-eth-state-snapshot
GNU Affero General Public License v3.0

Account selective snapshot #46

Closed — i-norden closed this 2 years ago

i-norden commented 2 years ago

Part of https://github.com/vulcanize/ipld-eth-state-snapshot/issues/55

Adds the ability to configure the snapshot generator with a list of addresses; it will then limit its traversal to their corresponding trie paths.
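To make the mechanics concrete: in go-ethereum, an account's state trie key is keccak256(address), and iterator node paths are the nibble-by-nibble expansion of that key. A minimal sketch of the expansion (a stand-in key is used here instead of a real keccak hash, and `keybytesToHex` is a hypothetical helper mirroring go-ethereum's internal encoding):

```go
package main

import "fmt"

// keybytesToHex expands each byte of a trie key into two nibbles and appends
// a terminator, mirroring go-ethereum's internal hex-path encoding. Iterator
// node paths are prefixes of this expansion, so a prefix comparison tells us
// whether a node lies on the way to a sought account.
func keybytesToHex(key []byte) []byte {
	nibbles := make([]byte, len(key)*2+1)
	for i, b := range key {
		nibbles[i*2] = b / 16
		nibbles[i*2+1] = b % 16
	}
	nibbles[len(nibbles)-1] = 16 // terminator nibble
	return nibbles
}

func main() {
	// In the real tool the key would be keccak256(address); fixed bytes here.
	key := []byte{0xab, 0xcd}
	fmt.Println(keybytesToHex(key)) // → [10 11 12 13 16]
}
```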

The primary motivation was as a testing tool, and as a mechanism to relatively quickly determine the size of a contract's storage (node count and size of the output files or Postgres DB output).

We could also adapt this as a basic way for "proof watchers" to lazily load the state they need to generate proofs, from our Postgres blockstore or from IPFS if we use ipfs-ethdb. It works with the concurrent read-only node iterator, so it could potentially be scaled out horizontally across different node sources (the IPFS blockstore abstraction via ipfs-ethdb can do this). Watchers could also use a similar approach to perform their own statediffing on only their specific contracts, as an even more efficient way to export, in real time, all the storage nodes they need to prove a given set of contracts.

Could also potentially be useful in cases where the state of a new contract needs to be added to an existing system asap.

@ashwinphatak the approach taken here should be repeatable for https://github.com/vulcanize/go-ethereum/issues/29#issuecomment-1149086538 (approach 2). It doesn't actually require any modification to the underlying iterator; I think it is simpler than the other approach, almost certainly simpler in the case of concurrent traversal as in eth-statediff-service, and is going to be much more performant.

AFDudley commented 2 years ago

@ashwinphatak @i-norden Can we try to get this tested and into all the parts of the stack that need it before we start processing ethereum blocks at Maxihost?

i-norden commented 2 years ago

This can be optimized by pruning the list of seekedPaths for each subtrie we traverse: as we progress down a subtrie, we can eliminate from this set the paths that have already diverged at a higher level of the trie, so we no longer need to check them at lower levels of that specific subtrie. This will be beneficial when the set of sought paths is large, but the additional overhead could outweigh the benefit for small sets (e.g. if we are seeking only one address, we only ever enter the subtrie along that account's path).
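The pruning described above can be sketched as a simple prefix filter (a hypothetical helper, not the PR's actual code): when descending to a node, keep only the sought paths that still share a prefix with the current node path.

```go
package main

import (
	"bytes"
	"fmt"
)

// pruneSeekedPaths keeps only the sought paths that still share a prefix with
// the current node path; paths that diverged higher in the trie need not be
// checked again anywhere within this subtrie.
func pruneSeekedPaths(seeked [][]byte, nodePath []byte) [][]byte {
	var remaining [][]byte
	for _, p := range seeked {
		if bytes.HasPrefix(p, nodePath) {
			remaining = append(remaining, p)
		}
	}
	return remaining
}

func main() {
	seeked := [][]byte{
		{1, 2, 3, 4},
		{1, 2, 7, 8},
		{9, 0, 0, 0},
	}
	// After descending to node path [1 2], the path under [9 ...] has diverged.
	fmt.Println(pruneSeekedPaths(seeked, []byte{1, 2})) // → [[1 2 3 4] [1 2 7 8]]
}
```

The trade-off mentioned above shows up here directly: each level of descent costs one pass over the remaining set, which only pays off when that set starts large.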

@ashwinphatak is this what you were referring to?

i-norden commented 2 years ago

Oh @ashwinphatak I see your point now, wow sorry, I will add a fix/adjustment. Since it.Next(bool) is always called with true, we are not actually skipping any subtries.
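For context: go-ethereum's trie.NodeIterator takes a descend flag — Next(true) descends into the current node's children, while Next(false) skips its entire subtrie. A hedged sketch of the gating predicate one might pass as that flag (the helper name is hypothetical): descend only when the current node's path can still lead to, or sits below, one of the sought paths.

```go
package main

import (
	"bytes"
	"fmt"
)

// onSeekedPath reports whether a node's subtrie can contain a sought leaf:
// either the node path is a prefix of a sought path (we are on the way down
// to it), or a sought path is a prefix of the node path (we are below it,
// e.g. inside a sought account's storage).
func onSeekedPath(seeked [][]byte, nodePath []byte) bool {
	for _, p := range seeked {
		if bytes.HasPrefix(p, nodePath) || bytes.HasPrefix(nodePath, p) {
			return true
		}
	}
	return false
}

func main() {
	seeked := [][]byte{{1, 2, 3}}
	// The result would be passed as the descend argument to the next
	// it.Next call, so unrelated subtries are skipped rather than walked.
	fmt.Println(onSeekedPath(seeked, []byte{1}))          // → true (prefix of a sought path)
	fmt.Println(onSeekedPath(seeked, []byte{1, 2, 3, 4})) // → true (below a sought path)
	fmt.Println(onSeekedPath(seeked, []byte{4}))          // → false (diverged)
}
```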

prathamesh0 commented 2 years ago

@i-norden Created a PR with a few fixes:

prathamesh0 commented 2 years ago

There is one limitation even after the above fixes: when the number of trie workers is greater than 1, the trie traversal is divided between concurrent iterators only at the first level, since we create new (non-parallel) iterators for further subtrie traversal. So even when the number of concurrent workers exceeds 16, the subtrie traversals are picked up by at most 16 of them, while the rest go idle after crossing their bounds on the very first it.Next() call.
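A small sketch of why the worker count caps at 16 (illustrative only — the helper and partitioning scheme here are assumptions, not the service's actual code): a hex trie's root has 16 children, so splitting the path space evenly over the first nibble leaves any worker beyond the 16th with an empty range.

```go
package main

import "fmt"

// firstLevelBounds evenly splits the 16 possible first nibbles among n
// workers. When n > 16, some workers necessarily receive an empty range,
// illustrating why partitioning at the first level alone cannot keep more
// than 16 workers busy.
func firstLevelBounds(n int) [][2]int {
	bounds := make([][2]int, 0, n)
	for i := 0; i < n; i++ {
		lo := i * 16 / n
		hi := (i + 1) * 16 / n
		bounds = append(bounds, [2]int{lo, hi})
	}
	return bounds
}

func main() {
	idle := 0
	for _, b := range firstLevelBounds(32) {
		if b[0] == b[1] { // empty nibble range: nothing to traverse
			idle++
		}
	}
	fmt.Printf("idle workers: %d of 32\n", idle) // → idle workers: 16 of 32
}
```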

i-norden commented 2 years ago

Sorry for leaving this in a half-complete state, and thanks for the fixes @prathamesh0