Make sense of `get`/`set`/`has`-`stateRoot`

acolytec3 commented 2 months ago

With the work being done in #3543, we have refactored the StateManagerInterface to make certain functions optional (getProof) in State Managers that do not support them so we have a single unified StateManager interface for all implementations that maintains the core features necessary for interacting with Ethereum state (read/write accounts/storage). The current interface also has getStateRoot/setStateRoot/hasStateRoot which really only apply to the DefaultStatemanager and some hopefully future StatefulVerkleStateManager implementations since you only need the state root when validating whole state transitions (i.e. blocks or maybe transactions). So, you only need access to a valid state root if you are interacting with the VM class doing the full work of maintaining an Ethereum chain with a stateful client sort of interface.

There was a time when we dreamed of the RPCStateManager being able to run blocks and validate state roots but there are inherent issues with the MPT that make this impossible if you don't already know the full state locally.

So, the question is, do we need to have these stateRoot getter/setter functions in the State Manager interface and explore how a statemanager which doesn't maintain the full Ethereum state on disk should expose a state Root so the vm package can use some of the alternative implementation? The RPCStateManager can already run blocks so it would seem not. Will need to research more to decide.

jochem-brouwer commented 2 months ago

My idea here would be to remove all those methods from StateManager and instead put them into Trie. (They are essentially already there). However, the problem is that these methods in StateManager have cache-cleanup operations so we should somehow manage that. So if you want to getStateRoot for instance, then you should ensure you should first flush() the state manager, otherwise you get an incorrect / outdated state root.

holgerd77 commented 2 months ago

Yeah, that's really a tricky one. I first thought, yeah, going to Trie directly might be a good option (since these things are trie specific - at least if named "as is"). But if one searches for setStateRoot() and where/how this is used one stumbles upon a lot of places where this is fully encapsulated by the stateManager and the trie is nowhere "there" or easy to access, and if we now start to do things like stateManager.trie?.[...] this rather binds things even more together.

Maybe somewhat from another perspective, what are these things when seen from the StateManager lense?

setStateRoot(): So setting the state root is actually just (re-)initializing the state manager respectively its state by going to a new state. I wonder if we just want to fully rename this thing to init() or initState() (I think this thing can also be directly called on the very first state manager initialization, and this replacement method might want to take an optional argument commitment or so like init(commitment?) and so the basic task description of this method is to initialize the state with whatever is needed (flushing (eventually empty) caches, setting to a new commitment (in the case of Trie: the state root)). A more simple/stateless state manager then just does not take/need the commitment, but otherwise also does everything which is needed for (state) initialization. This method then remains mandatory for the interface.
getStateRoot(): We might want to rename to getStateCommitment() or similar to be broader set up also for verkle but - likely ? - we want to make the method optional - since we just do not always have such a thing - and act on using/consuming code places in an appropriate way.
hasStateRoot(): We can very much forget about this method and - likely ? - remove, since this method is very rarely used and can be easily substituted.

ethereumjs / ethereumjs-monorepo

Make sense of `get`/`set`/`has`-`stateRoot` #3546