cerc-io / ipld-eth-state-snapshot

Read-only mirror of https://git.vdb.to/cerc-io/ipld-eth-state-snapshot (Tool for inserting the entire state and storage tries into PG-IPFS)
https://git.vdb.to/cerc-io/ipld-eth-state-snapshot
GNU Affero General Public License v3.0

Account selective snapshot #46

Closed — i-norden closed this 2 years ago

i-norden commented 2 years ago

Part of https://github.com/vulcanize/ipld-eth-state-snapshot/issues/55

Adds the ability to configure the snapshot generator with a list of addresses; it will then limit its traversal to their corresponding trie paths.
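To make the mechanics concrete: in go-ethereum, an account's state trie key is keccak256(address), and iterator node paths are the nibble-by-nibble expansion of that key. A minimal sketch of the expansion (a stand-in key is used here instead of a real keccak hash, and `keybytesToHex` is a hypothetical helper mirroring go-ethereum's internal encoding):

```go
package main

import "fmt"

// keybytesToHex expands each byte of a trie key into two nibbles and appends
// a terminator, mirroring go-ethereum's internal hex-path encoding. Iterator
// node paths are prefixes of this expansion, so a prefix comparison tells us
// whether a node lies on the way to a sought account.
func keybytesToHex(key []byte) []byte {
	nibbles := make([]byte, len(key)*2+1)
	for i, b := range key {
		nibbles[i*2] = b / 16
		nibbles[i*2+1] = b % 16
	}
	nibbles[len(nibbles)-1] = 16 // terminator nibble
	return nibbles
}

func main() {
	// In the real tool the key would be keccak256(address); fixed bytes here.
	key := []byte{0xab, 0xcd}
	fmt.Println(keybytesToHex(key)) // → [10 11 12 13 16]
}
```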

The primary motivation was as a testing tool, and as a mechanism to relatively quickly determine the size of a contract's storage (node count and size of the output files or Postgres DB output).

We could also adapt this as a basic way for "proof watchers" to lazily load the state they need to generate proofs, from our Postgres blockstore or from IPFS if we use ipfs-ethdb. It works with the concurrent read-only node iterator, so it could potentially be scaled out horizontally across different node sources (the IPFS blockstore abstraction via ipfs-ethdb can do this). Watchers could also use a similar approach to perform their own statediffing on only their specific contracts, as an even more efficient way to export, in real time, all the storage nodes they need to prove a given set of contracts.

Could also potentially be useful in cases where the state of a new contract needs to be added to an existing system asap.

@ashwinphatak the approach taken here should be repeatable for https://github.com/vulcanize/go-ethereum/issues/29#issuecomment-1149086538 (approach 2). It doesn't actually require any modification to the underlying iterator; I think it is simpler than the other approach, almost certainly simpler in the case of concurrent traversal as in eth-statediff-service, and is going to be much more performant.

AFDudley commented 2 years ago

@ashwinphatak @i-norden Can we try to get this tested and into all the parts of the stack that need it before we start processing ethereum blocks at Maxihost?

i-norden commented 2 years ago

This can be optimized by pruning the list of seekedPaths for each subtrie we traverse: as we progress down a subtrie, we can eliminate from this set the paths that have already diverged at a higher level of the trie, so we no longer need to check them at lower levels of that specific subtrie. This will be beneficial when the set of sought paths is large, but the additional overhead could outweigh the benefit for small sets (e.g. if we are seeking only one address, we only ever enter the subtrie along that account's path).
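The pruning described above can be sketched as a simple prefix filter (a hypothetical helper, not the PR's actual code): when descending to a node, keep only the sought paths that still share a prefix with the current node path.

```go
package main

import (
	"bytes"
	"fmt"
)

// pruneSeekedPaths keeps only the sought paths that still share a prefix with
// the current node path; paths that diverged higher in the trie need not be
// checked again anywhere within this subtrie.
func pruneSeekedPaths(seeked [][]byte, nodePath []byte) [][]byte {
	var remaining [][]byte
	for _, p := range seeked {
		if bytes.HasPrefix(p, nodePath) {
			remaining = append(remaining, p)
		}
	}
	return remaining
}

func main() {
	seeked := [][]byte{
		{1, 2, 3, 4},
		{1, 2, 7, 8},
		{9, 0, 0, 0},
	}
	// After descending to node path [1 2], the path under [9 ...] has diverged.
	fmt.Println(pruneSeekedPaths(seeked, []byte{1, 2})) // → [[1 2 3 4] [1 2 7 8]]
}
```

The trade-off mentioned above shows up here directly: each level of descent costs one pass over the remaining set, which only pays off when that set starts large.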

@ashwinphatak is this what you were referring to?

i-norden commented 2 years ago

Oh @ashwinphatak I see your point now, wow sorry, I will add a fix/adjustment. Since it.Next(bool) is always called with true, we are not actually skipping any subtries.
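For context: go-ethereum's trie.NodeIterator takes a descend flag — Next(true) descends into the current node's children, while Next(false) skips its entire subtrie. A hedged sketch of the gating predicate one might pass as that flag (the helper name is hypothetical): descend only when the current node's path can still lead to, or sits below, one of the sought paths.

```go
package main

import (
	"bytes"
	"fmt"
)

// onSeekedPath reports whether a node's subtrie can contain a sought leaf:
// either the node path is a prefix of a sought path (we are on the way down
// to it), or a sought path is a prefix of the node path (we are below it,
// e.g. inside a sought account's storage).
func onSeekedPath(seeked [][]byte, nodePath []byte) bool {
	for _, p := range seeked {
		if bytes.HasPrefix(p, nodePath) || bytes.HasPrefix(nodePath, p) {
			return true
		}
	}
	return false
}

func main() {
	seeked := [][]byte{{1, 2, 3}}
	// The result would be passed as the descend argument to the next
	// it.Next call, so unrelated subtries are skipped rather than walked.
	fmt.Println(onSeekedPath(seeked, []byte{1}))          // → true (prefix of a sought path)
	fmt.Println(onSeekedPath(seeked, []byte{1, 2, 3, 4})) // → true (below a sought path)
	fmt.Println(onSeekedPath(seeked, []byte{4}))          // → false (diverged)
}
```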

prathamesh0 commented 2 years ago

@i-norden Created a PR with a few fixes:

prathamesh0 commented 2 years ago

There is one limitation even after the above fixes: when the number of trie workers is greater than 1, the trie traversal is divided between concurrent iterators only at the first level, since we create new (non-parallel) iterators for further subtrie traversal. So even when the number of concurrent workers exceeds 16, the subtrie traversals are picked up by at most 16 of them, while the rest go idle after crossing their bounds on the very first it.Next() call.
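A small sketch of why the worker count caps at 16 (illustrative only — the helper and partitioning scheme here are assumptions, not the service's actual code): a hex trie's root has 16 children, so splitting the path space evenly over the first nibble leaves any worker beyond the 16th with an empty range.

```go
package main

import "fmt"

// firstLevelBounds evenly splits the 16 possible first nibbles among n
// workers. When n > 16, some workers necessarily receive an empty range,
// illustrating why partitioning at the first level alone cannot keep more
// than 16 workers busy.
func firstLevelBounds(n int) [][2]int {
	bounds := make([][2]int, 0, n)
	for i := 0; i < n; i++ {
		lo := i * 16 / n
		hi := (i + 1) * 16 / n
		bounds = append(bounds, [2]int{lo, hi})
	}
	return bounds
}

func main() {
	idle := 0
	for _, b := range firstLevelBounds(32) {
		if b[0] == b[1] { // empty nibble range: nothing to traverse
			idle++
		}
	}
	fmt.Printf("idle workers: %d of 32\n", idle) // → idle workers: 16 of 32
}
```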

i-norden commented 2 years ago

Sorry for leaving this in a half-complete state, and thanks for the fixes @prathamesh0