MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0
1.97k stars 522 forks source link

Berkeley node crashing after regular [FATAL] *** Mina daemon has been offline for too long *** and *** Shutting Down *** notice #15717

Open naamahdaemon opened 4 weeks ago

naamahdaemon commented 4 weeks ago

Preliminary Checks

Description

30 minutes after the mina daemon started, I receive a (expected) shutting down notice followed by an exception and a daemon crash (not expected)

LOGS

Jun 04 23:22:30 systemd[1250]: Started Mina Daemon Service.
Jun 04 23:22:30 mina[258703]: 2024-06-04 21:22:30 UTC [Info] Mina daemon is booting up; built with commit "93e02797f72abe2cbf8dfca6f2328e9c8cd76546" on branch "berkeley"
Jun 04 23:22:30 mina[258703]: 2024-06-04 21:22:30 UTC [Info] Booting may take several seconds, please wait
Jun 04 23:22:30 mina[258703]: 2024-06-04 21:22:30 UTC [Info] Reading configuration files $config_files
Jun 04 23:22:30 mina[258703]:         config_files: [
Jun 04 23:22:30 mina[258703]:   "/var/lib/coda/config_93e02797.json",
Jun 04 23:22:30 mina[258703]:   "/home/mina/.mina-config/daemon.json"
Jun 04 23:22:30 mina[258703]: ]
Jun 04 23:22:30 mina[258703]: 2024-06-04 21:22:30 UTC [Warn] Could not read configuration from "/home/mina/.mina-config/daemon.json"
Jun 04 23:22:30 mina[258703]: 2024-06-04 21:22:30 UTC [Info] Initializing with runtime configuration. Ledger name: null
Jun 04 23:22:30 mina[258703]: 2024-06-04 21:22:30 UTC [Info] Using the constraint constants from the configuration file
Jun 04 23:22:31 mina[258703]: 2024-06-04 21:22:31 UTC [Info] Loaded genesis ledger from $ledger_file
Jun 04 23:22:31 mina[258703]:         ledger_file: "/var/lib/coda/genesis_ledger_5605a5a45c8ef6aac8fa88d5228640c00c16247a93cbeb55e2c95fd4782e88d2.tar.gz"
Jun 04 23:22:42 mina[258703]: Using block producer keypair private-key password from environment variable MINA_PRIVKEY_PASS
Jun 04 23:23:09 mina[258703]: 2024-06-04 21:23:09 UTC [Info] Daemon will use chain id a7351abc7ddf2ea92d1b38cc8e636c271c1dfd2c081c637f62ebc2af34eb7cc1
Jun 04 23:23:09 mina[258703]: 2024-06-04 21:23:09 UTC [Info] Daemon running protocol version 3.0.0
Jun 04 23:23:09 mina[258703]: Using uptime submitter keypair private-key password from environment variable UPTIME_PRIVKEY_PASS
Jun 04 23:23:10 mina[258703]: 2024-06-04 21:23:10 UTC [Info] Starting a new prover process
Jun 04 23:23:37 mina[258703]: 2024-06-04 21:23:37 UTC [Info] Daemon started process of kind "Prover" with pid 258812
Jun 04 23:23:37 mina[258703]: 2024-06-04 21:23:37 UTC [Info] Starting a new verifier process
Jun 04 23:24:01 mina[258703]: 2024-06-04 21:24:01 UTC [Info] Daemon started process of kind "Verifier" with pid 258834
Jun 04 23:24:01 mina[258703]: 2024-06-04 21:24:01 UTC [Info] Starting a new vrf-evaluator process
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Daemon started process of kind "Vrf_evaluator" with pid 258855
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Starting a new uptime service SNARK worker process
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Daemon started process of kind "Uptime_snark_worker" with pid 258860
Jun 04 23:24:02 mina[258703]: Environment variable MINA_TIME_OFFSET not found, using default of 0
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Custom child process "libp2p_helper" started with pid 258866
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Error] libp2p_helper: failed to set log level debug for routing/record: error: No such logger
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] libp2p peer ID this session is $peer_id
Jun 04 23:24:02 mina[258703]:         peer_id: "***************************************"
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Network not instantiated when node status requested
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Starting transition router
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Warn] Node started before the chain start time: waiting 9357062 milliseconds before starting participation
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Warn] GCLOUD_KEYFILE environment variable not set. Must be set to use upload_blocks_to_gcloud
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Mina daemon is connecting
Jun 04 23:24:02 mina[258703]: 2024-06-04 21:24:02 UTC [Info] Created GraphQL server with limited queries at: http://localhost:3095/graphql
Jun 04 23:24:03 mina[258703]: 2024-06-04 21:24:03 UTC [Info] Initializing plugins
Jun 04 23:24:03 mina[258703]: 2024-06-04 21:24:03 UTC [Warn] Node started before genesis: waiting 9356976 milliseconds before starting block producer
Jun 04 23:24:03 mina[258703]: 2024-06-04 21:24:03 UTC [Info] Starting uptime service using URL $url
Jun 04 23:24:03 mina[258703]:         url: "https://uptime-backend.minaprotocol.com/v1/submit"
Jun 04 23:24:03 mina[258703]: 2024-06-04 21:24:03 UTC [Info] Stopping daemon after 10260 mins and when there are no blocks to be produced
Jun 04 23:24:03 mina[258703]: 2024-06-04 21:24:03 UTC [Info] Attempted to turn on snark worker, but snark worker key is set to none
Jun 04 23:24:03 mina[258703]: 2024-06-04 21:24:03 UTC [Info] Daemon ready. Clients can now connect
Jun 04 23:24:03 mina[258703]: 2024-06-04 21:24:03 UTC [Info] Created GraphQL server at: http://localhost:3085/graphql
Jun 04 23:24:05 mina[258703]: 2024-06-04 21:24:05 UTC [Info] Mina daemon is listening
Jun 04 23:27:00 mina[258703]: 2024-06-04 21:27:00 UTC [Info] Uptime service will attempt to send a block and SNARK work
Jun 04 23:27:00 mina[258703]: 2024-06-04 21:27:00 UTC [Info] Transition frontier not available to send a block to uptime service
Jun 04 23:29:02 mina[258703]: 2024-06-04 21:29:02 UTC [Info] Offline for too long; restarting libp2p_helper
Jun 04 23:29:02 mina[258703]: 2024-06-04 21:29:02 UTC [Info] libp2p_helper process killed successfully: "died after receiving sigterm (signal number 15)"
Jun 04 23:29:03 mina[258703]: 2024-06-04 21:29:03 UTC [Info] Custom child process "libp2p_helper" started with pid 258957
Jun 04 23:29:03 mina[258703]: 2024-06-04 21:29:03 UTC [Info] libp2p peer ID this session is $peer_id
Jun 04 23:29:03 mina[258703]:         peer_id: "***************************************"
Jun 04 23:29:03 mina[258703]: 2024-06-04 21:29:03 UTC [Error] libp2p_helper: failed to set log level debug for routing/record: error: No such logger
Jun 04 23:42:00 mina[258703]: 2024-06-04 21:42:00 UTC [Info] Uptime service will attempt to send a block and SNARK work
Jun 04 23:42:00 mina[258703]: 2024-06-04 21:42:00 UTC [Info] Transition frontier not available to send a block to uptime service
Jun 04 23:54:02 mina[258703]: [FATAL] *** Mina daemon has been offline for too long ***
Jun 04 23:54:02 mina[258703]: *** Shutting down ***
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Fatal] Unhandled top-level exception: $exn
Jun 04 23:54:02 mina[258703]: Generating crash report
Jun 04 23:54:02 mina[258703]:         exn: {
Jun 04 23:54:02 mina[258703]:   "commit_id": "93e02797f72abe2cbf8dfca6f2328e9c8cd76546",
Jun 04 23:54:02 mina[258703]:   "sexp": [ "Mina_lib.Offline_shutdown" ],
Jun 04 23:54:02 mina[258703]:   "backtrace": [
Jun 04 23:54:02 mina[258703]:     "Raised at Mina_lib.create_sync_status_observer.(fun) in file \"src/lib/mina_lib/mina_lib.ml\", line 500, characters 53-75",
Jun 04 23:54:02 mina[258703]:     "Called from Async_kernel__Time_source.Event.run_at.fire in file \"src/time_source.ml\", line 361, characters 23-26",
Jun 04 23:54:02 mina[258703]:     "Called from Async_kernel__Job_queue.run_job in file \"src/job_queue.ml\" (inlined), line 128, characters 2-5",
Jun 04 23:54:02 mina[258703]:     "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 169, characters 6-47"
Jun 04 23:54:02 mina[258703]:   ]
Jun 04 23:54:02 mina[258703]: }
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Fatal] Prover terminated due to signal, terminating daemon
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Info] Running async shutdown handler: "Close transition frontier, if exists"
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Info] Running async shutdown handler: "Remove daemon lockfile"
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Error] verifier terminated unexpectedly
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Info] Starting a new verifier process
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Info] verifier successfully stopped
Jun 04 23:54:02 systemd[1250]: mina.service: Main process exited, code=exited, status=1/FAILURE
Jun 04 23:54:02 systemd[1250]: mina.service: Failed with result 'exit-code'.
Jun 04 23:54:33 systemd[1250]: mina.service: Scheduled restart job, restart counter is at 2.
Jun 04 23:54:33 systemd[1250]: Stopped Mina Daemon Service.
Jun 04 23:54:33 systemd[1250]: Started Mina Daemon Service.

Steps to Reproduce

  1. Start mina
  2. Wait 30 minutes
  3. Check logs for crash report

Expected Result

Daemon restart but does not crash.

Actual Result

Daemon crashes with the following exception :

Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Fatal] Unhandled top-level exception: $exn
Jun 04 23:54:02 mina[258703]: Generating crash report
Jun 04 23:54:02 mina[258703]:         exn: {
Jun 04 23:54:02 mina[258703]:   "commit_id": "93e02797f72abe2cbf8dfca6f2328e9c8cd76546",
Jun 04 23:54:02 mina[258703]:   "sexp": [ "Mina_lib.Offline_shutdown" ],
Jun 04 23:54:02 mina[258703]:   "backtrace": [
Jun 04 23:54:02 mina[258703]:     "Raised at Mina_lib.create_sync_status_observer.(fun) in file \"src/lib/mina_lib/mina_lib.ml\", line 500, characters 53-75",
Jun 04 23:54:02 mina[258703]:     "Called from Async_kernel__Time_source.Event.run_at.fire in file \"src/time_source.ml\", line 361, characters 23-26",
Jun 04 23:54:02 mina[258703]:     "Called from Async_kernel__Job_queue.run_job in file \"src/job_queue.ml\" (inlined), line 128, characters 2-5",
Jun 04 23:54:02 mina[258703]:     "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 169, characters 6-47"
Jun 04 23:54:02 mina[258703]:   ]
Jun 04 23:54:02 mina[258703]: }
Jun 04 23:54:02 mina[258703]: 2024-06-04 21:54:02 UTC [Fatal] Prover terminated due to signal, terminating daemon

Daemon version

commit "93e02797f72abe2cbf8dfca6f2328e9c8cd76546" on branch "berkeley"

How frequently do you see this issue?

Frequently

What is the impact of this issue on your ability to run a node?

Blocker

Status

Mina daemon status
-----------------------------------

Max observed block height:              359605
Max observed unvalidated block height:  0
Local uptime:                           1m48s
Chain id:                               a7351abc7ddf2ea92d1b38cc8e636c271c1dfd2c081c637f62ebc2af34eb7cc1
Git SHA-1:                              93e02797f72abe2cbf8dfca6f2328e9c8cd76546
Configuration directory:                /home/mina/.mina-config
Peers:                                  139
User_commands sent:                     0
SNARK worker:                           None
SNARK work fee:                         100000000
Sync status:                            Bootstrap
Block producers running:                1 (B62...)
Coinbase receiver:                      B62...
Consensus time now:                     epoch=0, slot=0
Consensus mechanism:                    proof_of_stake
Consensus configuration:
        Delta:                     0
        k:                         290
        Slots per epoch:           7140
        Slot duration:             3m
        Epoch duration:            14d21h
        Chain start timestamp:     2024-06-05 00:00:00.000000Z
        Acceptable network delay:  3m

Addresses and ports:
        External IP:    ***.***.***.***
        Bind IP:        0.0.0.0
        Libp2p PeerID:  *************************************************
        Libp2p port:    8302
        Client port:    8301

Metrics:
        block_production_delay:             7 (0 0 0 0 0 0 0)
        transaction_pool_diff_received:     0
        transaction_pool_diff_broadcasted:  0
        transactions_added_to_pool:         0
        transaction_pool_size:              0
        snark_pool_diff_received:           0
        snark_pool_diff_broadcasted:        0
        pending_snark_work:                 0
        snark_pool_size:                    0

Additional information

Also encountered what looks like another error (don't know if related) :

juin 04 23:27:40 mina[3367212]: [FATAL] *** Mina daemon has been offline for too long ***
juin 04 23:27:40 mina[3367212]: *** Shutting down ***
juin 04 23:27:40 mina[3367212]: 2024-06-04 21:27:40 UTC [Fatal] Unhandled top-level exception: $exn
juin 04 23:27:40 mina[3367212]: Generating crash report
juin 04 23:27:40 mina[3367212]:         exn: {
juin 04 23:27:40 mina[3367212]:   "commit_id": "93e02797f72abe2cbf8dfca6f2328e9c8cd76546",
juin 04 23:27:40 mina[3367212]:   "sexp": [ "Mina_lib.Offline_shutdown" ],
juin 04 23:27:40 mina[3367212]:   "backtrace": [
juin 04 23:27:40 mina[3367212]:     "Raised at Mina_lib.create_sync_status_observer.(fun) in file \"src/lib/mina_lib/mina_lib.ml\", line 500, characters 53-75",
juin 04 23:27:40 mina[3367212]:     "Called from Async_kernel__Time_source.Event.run_at.fire in file \"src/time_source.ml\", line 361, characters 23-26",
juin 04 23:27:40 mina[3367212]:     "Called from Async_kernel__Job_queue.run_job in file \"src/job_queue.ml\" (inlined), line 128, characters 2-5",
juin 04 23:27:40 mina[3367212]:     "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 169, characters 6-47"
juin 04 23:27:40 mina[3367212]:   ]
juin 04 23:27:40 mina[3367212]: }
juin 04 23:27:40 mina[3367212]: 2024-06-04 21:27:40 UTC [Fatal] libp2p_helper process died unexpectedly: "died after receiving sigkill (signal number 9)"
juin 04 23:27:40 mina[3367212]: 2024-06-04 21:27:40 UTC [Error] Encountered $error while asking libp2p_helper for peers
juin 04 23:27:40 mina[3367212]:         error: {
juin 04 23:27:40 mina[3367212]:   "commit_id": "93e02797f72abe2cbf8dfca6f2328e9c8cd76546",
juin 04 23:27:40 mina[3367212]:   "string": "libp2p_helper process died before answering"
juin 04 23:27:40 mina[3367212]: }
juin 04 23:27:40 mina[3367212]: 2024-06-04 21:27:40 UTC [Error] Could not send error report: Node_error_service was not configured
juin 04 23:27:40 mina[3367212]: 2024-06-04 21:27:40 UTC [Fatal] Unhandled top-level exception: $exn
juin 04 23:27:40 mina[3367212]: Generating crash report
juin 04 23:27:40 mina[3367212]:         exn: {
juin 04 23:27:40 mina[3367212]:   "commit_id": "93e02797f72abe2cbf8dfca6f2328e9c8cd76546",
juin 04 23:27:40 mina[3367212]:   "sexp": [
juin 04 23:27:40 mina[3367212]:     "monitor.ml.Error",
juin 04 23:27:40 mina[3367212]:     [
juin 04 23:27:40 mina[3367212]:       "exn.ml.Reraised",
juin 04 23:27:40 mina[3367212]:       "Mina_net2 raised an exception",
juin 04 23:27:40 mina[3367212]:       [
juin 04 23:27:40 mina[3367212]:         "monitor.ml.Error",
juin 04 23:27:40 mina[3367212]:         [ "Mina_net2__Libp2p_helper.Libp2p_helper_died_unexpectedly" ],
juin 04 23:27:40 mina[3367212]:         [
juin 04 23:27:40 mina[3367212]:           "Raised at Mina_net2__Libp2p_helper.handle_libp2p_helper_termination.(fun) in file \"src/lib/mina_net2/libp2p_helper.ml\", line 146, characters 8-45",
juin 04 23:27:40 mina[3367212]:           "Called from Async_kernel__Deferred1.M.map.(fun) in file \"src/deferred1.ml\", line 17, characters 40-45",
juin 04 23:27:40 mina[3367212]:           "Called from Async_kernel__Job_queue.run_job in file \"src/job_queue.ml\" (inlined), line 128, characters 2-5",
juin 04 23:27:40 mina[3367212]:           "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 169, characters 6-47",
juin 04 23:27:40 mina[3367212]:           "Caught by monitor at file \"src/lib/gossip_net/libp2p.ml\", line 236, characters 31-31"
juin 04 23:27:40 mina[3367212]:         ]
juin 04 23:27:40 mina[3367212]:       ]
juin 04 23:27:40 mina[3367212]:     ],
juin 04 23:27:40 mina[3367212]:     [
juin 04 23:27:40 mina[3367212]:       "Raised at Base__Exn.reraise in file \"src/exn.ml\", line 59, characters 22-49",
juin 04 23:27:40 mina[3367212]:       "Called from Base__Result.try_with in file \"src/result.ml\", line 195, characters 9-15",
juin 04 23:27:40 mina[3367212]:       "Caught by monitor coda"
juin 04 23:27:40 mina[3367212]:     ]
juin 04 23:27:40 mina[3367212]:   ],
juin 04 23:27:40 mina[3367212]:   "backtrace": [
juin 04 23:27:40 mina[3367212]:     "Raised at Consensus__Epoch.of_time_exn in file \"src/lib/consensus/epoch.ml\", line 12, characters 4-106",
juin 04 23:27:40 mina[3367212]:     "Called from Consensus__Epoch.epoch_and_slot_of_time_exn in file \"src/lib/consensus/epoch.ml\", line 44, characters 14-39",
juin 04 23:27:40 mina[3367212]:     "Called from Consensus__Global_slot.Make_str.of_time_exn in file \"src/lib/consensus/global_slot.ml\", line 117, characters 6-56",
juin 04 23:27:40 mina[3367212]:     "Called from Mina_commands.get_status.(fun) in file \"src/lib/mina_commands/mina_commands.ml\", line 297, characters 6-119"
juin 04 23:27:40 mina[3367212]:   ]
juin 04 23:27:40 mina[3367212]: }
Syd-ai commented 4 weeks ago

Exactly the same as Naamah