NethermindEth / nethermind

A robust execution client for Ethereum node operators.
https://nethermind.io/nethermind-client
GNU General Public License v3.0
1.26k stars 433 forks source link

Missing trie node #6253

Closed manuelsc closed 9 months ago

manuelsc commented 11 months ago

Description I ran a prune a couple days ago and now my nethermind node suddenly stopped working with this error:

Nov 07 11:10:07 staking nethermind[1012383]: 07 Nov 11:10:07 | Processing part of a long blocks branch 0/72. Block: 18518681 (0x981b8f...6e462b) 
Nov 07 11:10:07 staking nethermind[1012383]: 07 Nov 11:10:07 | Missing trie node 0xc8dea0fb8581a18ec9e664e0099b44d70ff471da3f2c7564e076c224139625b0, trying to recover from network 
Nov 07 11:10:07 staking nethermind[1012383]: 07 Nov 11:10:07 | Failed to recover missing trie node 
Nov 07 11:10:07 staking nethermind[1012383]: 07 Nov 11:10:07 | Missing trie node Account: ab14d68802a763f7db875346d03fbf86f137de55814b191c069e721f47474733, Storage: 00b0b7826c, trying to recover from network 
Nov 07 11:10:07 staking nethermind[1012383]: 07 Nov 11:10:07 | Failed to recover missing trie node 
Nov 07 11:10:07 staking nethermind[1012383]: 07 Nov 11:10:07 | Processing loop threw an exception. Block: 18518752 (0x1d97f0...397673), Exception: Nethermind.Trie.MissingTrieNodeException: Node 0xc8dea0fb8581a18ec9e664e0099b44d70ff471da3f2c7564e076c224139625b0 is missing from the DB 
Nov 07 11:10:07 staking nethermind[1012383]:  ---> Nethermind.Trie.TrieNodeException: Node 0xc8dea0fb8581a18ec9e664e0099b44d70ff471da3f2c7564e076c224139625b0 is missing from the DB 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.Pruning.TrieStore.LoadRlp(Keccak keccak, IKeyValueStore keyValueStore, ReadFlags readFlags) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Synchronization.Trie.HealingTrieStore.LoadRlp(Keccak keccak, ReadFlags readFlags) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.TrieNode.ResolveNode(ITrieNodeResolver tree, ReadFlags readFlags, ICappedArrayPool bufferPool) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.ResolveNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    --- End of inner exception stack trace --- 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.ResolveNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.ConnectNodes(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseLeaf(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseBranch(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseBranch(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseBranch(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseBranch(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseBranch(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseBranch(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseBranch(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.TraverseNode(TrieNode node, TraverseContext& traverseContext) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.Run(Span`1 updatePath, Int32 nibblesCount, CappedArray`1 updateValue, Boolean isUpdate, Boolean ignoreMissingDelete, Keccak startRootHash) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Trie.PatriciaTree.Set(ReadOnlySpan`1 rawKey, CappedArray`1 value) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Synchronization.Trie.HealingStorageTree.Set(ReadOnlySpan`1 rawKey, Byte[] value) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.State.StorageTree.Set(UInt256& index, Byte[] value) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.State.PersistentStorageProvider.CommitCore(IStorageTracer tracer) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.State.WorldState.Commit(IReleaseSpec releaseSpec, IWorldStateTracer tracer, Boolean isGenesis) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Evm.TransactionProcessing.TransactionProcessor.Execute(Transaction tx, BlockHeader header, ITxTracer tracer, ExecutionOptions opts) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.TransactionProcessorAdapterExtensions.ProcessTransaction(ITransactionProcessorAdapter transactionProcessor, Block block, Transaction currentTx, BlockReceiptsTracer receiptsTracer, ProcessingOptions processingOptions, IWorldState stateProvider) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.BlockProcessor.BlockValidationTransactionsExecutor.ProcessTransactions(Block block, ProcessingOptions processingOptions, BlockReceiptsTracer receiptsTracer, IReleaseSpec spec) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.BlockProcessor.ProcessBlock(Block block, IBlockTracer blockTracer, ProcessingOptions options) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.BlockProcessor.ProcessOne(Block suggestedBlock, ProcessingOptions options, IBlockTracer blockTracer) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.BlockProcessor.Process(Keccak newBranchStateRoot, List`1 suggestedBlocks, ProcessingOptions options, IBlockTracer blockTracer) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.BlockchainProcessor.ProcessBranch(ProcessingBranch& processingBranch, ProcessingOptions options, IBlockTracer tracer) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.BlockchainProcessor.Process(Block suggestedBlock, ProcessingOptions options, IBlockTracer tracer) 
Nov 07 11:10:07 staking nethermind[1012383]:    at Nethermind.Consensus.Processing.BlockchainProcessor.RunProcessingLoop() 

Steps to Reproduce Happened some days after pruning but not sure if this is really 100% related. I ran nethermind with these flags

ExecStart=/home/nethermind/nethermind \
  --baseDbPath /home/nethermind/.nethermind \
  --Metrics.Enabled true \
  --Metrics.PushGatewayUrl http://localhost:9091/metrics \
  --Metrics.IntervalSeconds 20 \
  --Sync.SnapSync true \
  --JsonRpc.JwtSecretFile /secrets/jwtsecret \
  --Network.MaxActivePeers 12 \
  --JsonRpc.Enabled true \
  --JsonRpc.EnabledModules=Admin,Eth,Subscribe,Trace,TxPool,Web3,Personal,Proof,Net,Parity,Health,Rpc \
  --Pruning.Mode=Hybrid \
  --Pruning.FullPruningTrigger=VolumeFreeSpace \
  --Pruning.FullPruningThresholdMb=400000 \
  --Pruning.FullPruningMemoryBudgetMb=4096 \
  --Pruning.CacheMb=4096 \
  --Pruning.FullPruningMaxDegreeOfParallelism=4 

Version is: Nethermind/v1.21.1+9b435ba0/linux-x64/dotnet7.0.13

I have a backup of the nethermind db folder if this helps to debug this issue. Let me know if I should provide it and how (500+ GB)

I resynced now to resolve this.

Actual behavior Nethermind became unusable and spamming the above mentioned error. It failed to recover on its own

Expected behavior This should not happen or the node should automatically recover from it.

Desktop (please complete the following information): Please provide the following information regarding your setup:

MarekM25 commented 10 months ago

Hi @manuelsc,

I apologize for the delay. We are actively working on resolving this edge-case issue. Our observations suggest that it is often related to:

MarekM25 commented 9 months ago

@manuelsc closing, but please reopen/create new issue in case your problem is not solved