dusk-network / dusk-blockchain

Reference implementation of the DUSK Network node, written in Golang
MIT License

Invalid state root after restart #1530

Open herr-seppia opened 1 year ago

herr-seppia commented 1 year ago

Describe the bug: On the devnet cluster, 3 nodes are stuck in the "invalid state root" phase.

To Reproduce: Launch a cluster and restart a single node.

herr-seppia commented 1 year ago

Currently on devnet we have this error:

{
  "level": "error",
  "msg": "invalid state detected",
  "node": "01333bdfd9d008e1eefc7080e596faf7de2555904fe26ba478ca5f861acf6e07",
  "process": "chain",
  "rusk": "1707b604702b2c47d543302f71320f2ffb384c641d8b382cf18ba3516ef6e49b",
  "time": "2023-06-08T12:38:42Z"
}

Dusk-blockchain was restarted at block 1159, and the state_root for that block is indeed 1707b604702b2c47d543302f71320f2ffb384c641d8b382cf18ba3516ef6e49b (the "rusk" value in the log).

Curiously, block 1100 has the state_root 01333bdfd9d008e1eefc7080e596faf7de2555904fe26ba478ca5f861acf6e07 (the "node" value in the log), and the current configuration toml has

[state]
persistEvery = 100

So the problem is in how dusk-blockchain recovers the tip of the network, because it seems to rely on the "Last Persisted Block": https://github.com/dusk-network/dusk-blockchain/blob/5b32220f401beeedb757c80de6b2dfa0ba322278/pkg/core/chain/chain.go#L68-L69

The culprit is the change we made on rusk, where the state is now actually persisted at every "accept/finalize" and the persist() method does nothing.

Even though the culprit is on rusk, we should adapt dusk-blockchain so that it does not rely on this mechanism.
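To make the mismatch concrete, here is a minimal Go sketch of the arithmetic. It assumes the "Last Persisted Block" is the highest multiple of persistEvery at or below the restart height, which matches the heights in the log above. `lastPersistedHeight` is a hypothetical helper for illustration, not a function from dusk-blockchain.

```go
package main

import "fmt"

// lastPersistedHeight is a hypothetical helper (not part of dusk-blockchain).
// Assumption: state is considered persisted at heights that are multiples of
// persistEvery, so the last persisted block is tip rounded down to a multiple.
func lastPersistedHeight(tip, persistEvery uint64) uint64 {
	if persistEvery == 0 {
		return tip // guard against division by zero; treat as "every block"
	}
	return tip - tip%persistEvery
}

func main() {
	// State roots taken from the issue: block height -> state_root.
	roots := map[uint64]string{
		1100: "01333bdfd9d008e1eefc7080e596faf7de2555904fe26ba478ca5f861acf6e07",
		1159: "1707b604702b2c47d543302f71320f2ffb384c641d8b382cf18ba3516ef6e49b",
	}
	const tip uint64 = 1159 // height at which the node was restarted

	for _, every := range []uint64{100, 1} {
		h := lastPersistedHeight(tip, every)
		nodeRoot := roots[h]   // what the node expects after restart
		ruskRoot := roots[tip] // what rusk holds, since it persists every block
		fmt.Printf("persistEvery=%d: node expects block %d, roots match: %t\n",
			every, h, nodeRoot == ruskRoot)
	}
}
```

With persistEvery = 100 the node looks at block 1100 while rusk is already at block 1159, so the roots differ. With persistEvery = 1 both sides refer to the same block.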

herr-seppia commented 1 year ago

Changing persistEvery to 1 as a workaround.