Open feld opened 8 months ago
Worth noting for the record that halting is opt-in. It's actually very useful for a full node: it stops the node from following a non-upgraded chain after it misses an update, and then having to resync completely to get back onto the correct chain because it no longer has world states old enough to handle the very large reorg.
Also, the recommended protocol version is updated before the required protocol version, so op-geth first logs an error about being out of date but does not halt.
--rollup.halt value ($GETH_ROLLUP_HALT)
Opt-in option to halt on incompatible protocol version requirements of the given
level (major/minor/patch/none), as signaled through the Engine API by the rollup
node
I believe you can achieve a policy of only ever logging by using the options from op-node:
--rollup.halt value ($OP_NODE_ROLLUP_HALT)
Opt-in option to halt on incompatible protocol version requirements of the given
level (major/minor/patch/none), as signaled onchain in L1
--rollup.load-protocol-versions (default: false) ($OP_NODE_ROLLUP_LOAD_PROTOCOL_VERSIONS)
Load protocol versions from the superchain L1 ProtocolVersions contract (if
available), and report in logs and metrics
You can instruct it to load protocol versions without opting in to rollup halt; op-node will then also use the JSON-RPC call to check that its op-geth is up to date.
How is this opt-in? I did not set the GETH_ROLLUP_HALT env or --rollup.halt on either the Optimism or Base nodes that I operate, but I did encounter this behavior.
If you look at https://github.com/ethereum-optimism/op-geth/blob/425e757c51a1148cf6e3451157f1b666b2242b81/eth/backend.go#L616-L625 if s.config.RollupHaltOnIncompatibleProtocolVersion is not set, it does not halt. That config var is set in https://github.com/ethereum-optimism/op-geth/blob/352fbe634837af931b6d5128901b31cd2550bffe/cmd/utils/flags.go#L1848 to the value of the rollup.halt flag, which has no default value: https://github.com/ethereum-optimism/op-geth/blob/352fbe634837af931b6d5128901b31cd2550bffe/cmd/utils/flags.go#L888-L892
So it's off by default. If op-geth halted then something must have set that flag.
What version of op-geth were you running?
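For readers who don't want to follow the links, the opt-in check described above can be sketched roughly as follows. This is a simplified illustration, not the op-geth source: `Level`, `parseHaltLevel` and `shouldHalt` are hypothetical names made up for this sketch. It assumes the halt setting is a string (major/minor/patch, empty when unset) and that the node already knows how far behind the required protocol version it is.

```go
package main

import "fmt"

// Level ranks how large a version gap is. Hypothetical type for this sketch.
type Level int

const (
	None Level = iota
	Patch
	Minor
	Major
)

// parseHaltLevel maps a --rollup.halt style value to a Level.
// Any other value, including the empty default, means "never halt".
func parseHaltLevel(v string) Level {
	switch v {
	case "major":
		return Major
	case "minor":
		return Minor
	case "patch":
		return Patch
	default:
		return None
	}
}

// shouldHalt reports whether a node configured with haltFlag should stop
// when it is behind the required protocol version by the gap `outdated`.
func shouldHalt(haltFlag string, outdated Level) bool {
	need := parseHaltLevel(haltFlag)
	if need == None {
		return false // not opted in: only log, never halt
	}
	// Halt only if the gap is at least as large as the opted-in level.
	return outdated >= need
}

func main() {
	fmt.Println(shouldHalt("", Major))      // flag unset, major gap
	fmt.Println(shouldHalt("major", Minor)) // opted in at major, minor gap
	fmt.Println(shouldHalt("minor", Major)) // opted in at minor, major gap
}
```

With the flag unset the function never returns true, which is the "off by default" behaviour described above. A node only halts when the operator has opted in at a level that the detected gap meets or exceeds.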
Hey, could we maybe not automatically halt when a major version change is encountered, and just log an error instead?
This is the opposite of designing your distributed system to be antifragile.
https://github.com/ethereum-optimism/op-geth/blob/336d284b606ec4792a605932201b97f04981db9d/eth/backend.go#L612-L640