MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

Node crashed with no persistent root identifier found (should have been written already) #14367

Open deepthiskumar opened 1 year ago

deepthiskumar commented 1 year ago

Description

The node crashes with the following exception (as seen in error reporting):

monitor.ml.Error, Failure, no persistent root identifier found (should have been written already)
Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
Called from Transition_frontier.load_from_persistence_and_start.(fun) in file "src/lib/transition_frontier/transition_frontier.ml", line 108, characters 8-112
Called from Transition_frontier.load_with_max_length.(fun).continue in file "src/lib/transition_frontier/transition_frontier.ml", line 225, characters 10-297
Called from Transition_frontier.load_with_max_length.(fun) in file "src/lib/transition_frontier/transition_frontier.ml", line 341, characters 8-111
Called from Base__Result.try_with in file "src/result.ml", line 195, characters 9-15
Caught by monitor coda

Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
Called from O1trace.exec_thread in file "src/lib/o1trace/o1trace.ml", line 77, characters 6-27
Called from Transition_router.load_frontier.(fun) in file "src/lib/transition_router/transition_router.ml", line 274, characters 4-160
Called from Transition_router.initialize.(fun) in file "src/lib/transition_router/transition_router.ml", line 355, characters 6-160
Called from Async_kernel__Deferred0.bind.(fun) in file "src/deferred0.ml", line 54, characters 64-69
Called from Async_kernel__Job_queue.run_job in file "src/job_queue.ml" (inlined), line 128, characters 2-5
Called from Async_kernel__Job_queue.run_jobs in file "src/job_queue.ml", line 169, characters 6-47
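
For triage, a trace like the one above can be split into frames to find the first frame that lives in the project's own sources. The following is a minimal sketch in Python and is not part of Mina. The helper names `parse_frames` and `first_frame_under` are hypothetical, and treating the `src/lib/` prefix as "project code" is an assumption based only on the paths visible in this trace.

```python
import re

# Matches frames of the form seen in the trace, for example:
#   Called from Mod.func in file "path.ml", line 108, characters 8-112
# The optional "(inlined)" marker between the file name and the line is allowed.
FRAME_RE = re.compile(
    r'(?P<kind>Raised at|Called from) (?P<function>\S+) '
    r'in file "(?P<file>[^"]+)"(?: \(inlined\))?, line (?P<line>\d+)'
)


def parse_frames(trace):
    """Return the frames found in `trace`, in order, as a list of dicts."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        frames.append(
            {
                "kind": match.group("kind"),
                "function": match.group("function"),
                "file": match.group("file"),
                "line": int(match.group("line")),
            }
        )
    return frames


def first_frame_under(frames, prefix="src/lib/"):
    """Return the first frame whose file path starts with `prefix`, or None.

    The default prefix is an assumption taken from the paths in the trace.
    """
    for frame in frames:
        if frame["file"].startswith(prefix):
            return frame
    return None


# A shortened excerpt of the first trace in this report.
sample = (
    'Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33, '
    'Called from Transition_frontier.load_from_persistence_and_start.(fun) in file '
    '"src/lib/transition_frontier/transition_frontier.ml", line 108, characters 8-112, '
    'Called from Base__Result.try_with in file "src/result.ml", line 195, characters 9-15'
)

if __name__ == "__main__":
    frame = first_frame_under(parse_frames(sample))
    print(frame["function"], frame["file"], frame["line"])
```

Run against the excerpt, the sketch picks out the `Transition_frontier.load_from_persistence_and_start` frame, which is where the `failwith` is called from in this report.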

Steps to Reproduce

Unclear at the moment. The node restarted and then crashed.

Expected Result

The node should be able to load the persisted frontier and root.

Actual Result

The node crashed. We are seeing quite a few occurrences of this error. The impact is medium: the node can be restarted, but it then needs to bootstrap.

How frequently do you see this issue?

Frequently

What is the impact of this issue on your ability to run a node?

Medium

Status

From the error report:

data.sync_status: Listening
data.timestamp: Oct 17, 2023 @ 15:57:54.000
data.uptime_of_node: 2.793m

Additional information

No response

nepser84 commented 1 year ago

Similar problem here. For around 20 hours, my second BP node has kept crashing. Server specs: AMD Ryzen 7 3700X; 2x 1 TB M.2 NVMe SSD; 64 GB DDR4 RAM. My report: coda_crash_report_2023-10-27_14-00-24.514966.tar.gz