FreeUKGen / Systemwide

repository for issues that affect all systems within the Free UK Gen portfolio
0 stars 0 forks source link

[production][search_queries#create] No primary server is available in cluster: #<Cluster topology=ReplicaSetNoPrimary[mongodb2.intern... #246

Closed Captainkirkdawson closed 2 years ago

Captainkirkdawson commented 2 years ago

mongodb1 went AWOL at 1pm on 14 Feb

Captainkirkdawson commented 2 years ago

awol.png

lemon-ukgen commented 2 years ago
Feb 14 14:14:48 mongodb1 kernel: pid 99497 (mongo), jid 0, uid 3023: exited on signal 6 (core dumped)
{"t":{"$date":"2022-02-14T14:57:37.431+00:00"},"s":"I",  "c":"CONTROL",  "id":23381,   "ctx":"SignalHandler","msg":"will terminate after current cmd ends"}
{"t":{"$date":"2022-02-14T14:57:37.431+00:00"},"s":"I",  "c":"REPL",     "id":4784900, "ctx":"SignalHandler","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":10000}}
{"t":{"$date":"2022-02-14T14:57:37.431+00:00"},"s":"I",  "c":"REPL",     "id":21343,   "ctx":"RstlKillOpThread","msg":"Starting to kill user operations"}
{"t":{"$date":"2022-02-14T14:57:37.431+00:00"},"s":"I",  "c":"REPL",     "id":21344,   "ctx":"RstlKillOpThread","msg":"Stopped killing user operations"}

...

{"t":{"$date":"2022-02-14T14:57:37.432+00:00"},"s":"I",  "c":"REPL",     "id":21347,   "ctx":"SignalHandler","msg":"Handing off election","attr":{"target":"mongodb2.internal.freeukgen.org.uk:27017"}}
{"t":{"$date":"2022-02-14T14:57:37.432+00:00"},"s":"I",  "c":"COMMAND",  "id":4784901, "ctx":"SignalHandler","msg":"Shutting down the MirrorMaestro"}
{"t":{"$date":"2022-02-14T14:57:37.432+00:00"},"s":"I",  "c":"REPL",     "id":40441,   "ctx":"SignalHandler","msg":"Stopping TopologyVersionObserver"}
{"t":{"$date":"2022-02-14T14:57:37.432+00:00"},"s":"I",  "c":"REPL",     "id":40447,   "ctx":"TopologyVersionObserver","msg":"Stopped TopologyVersionObserver"}
{"t":{"$date":"2022-02-14T14:57:37.432+00:00"},"s":"I",  "c":"SHARDING", "id":4784902, "ctx":"SignalHandler","msg":"Shutting down the WaitForMajorityService"}
{"t":{"$date":"2022-02-14T14:57:37.433+00:00"},"s":"I",  "c":"CONTROL",  "id":4784903, "ctx":"SignalHandler","msg":"Shutting down the LogicalSessionCache"}
{"t":{"$date":"2022-02-14T14:57:37.433+00:00"},"s":"I",  "c":"NETWORK",  "id":20562,   "ctx":"SignalHandler","msg":"Shutdown: going to close listening sockets"}
{"t":{"$date":"2022-02-14T14:57:37.433+00:00"},"s":"I",  "c":"NETWORK",  "id":23017,   "ctx":"listener","msg":"removing socket file","attr":{"path":"/tmp/mongodb-27017.sock"}}
{"t":{"$date":"2022-02-14T14:57:37.433+00:00"},"s":"I",  "c":"NETWORK",  "id":4784905, "ctx":"SignalHandler","msg":"Shutting down the global connection pool"}
{"t":{"$date":"2022-02-14T14:57:37.433+00:00"},"s":"I",  "c":"STORAGE",  "id":4784906, "ctx":"SignalHandler","msg":"Shutting down the FlowControlTicketholder"}
{"t":{"$date":"2022-02-14T14:57:37.433+00:00"},"s":"I",  "c":"-",        "id":20520,   "ctx":"SignalHandler","msg":"Stopping further Flow Control ticket acquisitions."}
{"t":{"$date":"2022-02-14T14:57:37.433+00:00"},"s":"I",  "c":"REPL",     "id":4784907, "ctx":"SignalHandler","msg":"Shutting down the replica set node executor"}
{"t":{"$date":"2022-02-14T14:57:37.434+00:00"},"s":"I",  "c":"ASIO",     "id":22582,   "ctx":"ReplNodeDbWorkerNetwork","msg":"Killing all outstanding egress activity."}
{"t":{"$date":"2022-02-14T14:57:37.435+00:00"},"s":"I",  "c":"STORAGE",  "id":4784908, "ctx":"SignalHandler","msg":"Shutting down the PeriodicThreadToAbortExpiredTransactions"}
{"t":{"$date":"2022-02-14T14:57:37.435+00:00"},"s":"I",  "c":"STORAGE",  "id":4784934, "ctx":"SignalHandler","msg":"Shutting down the PeriodicThreadToDecreaseSnapshotHistoryCachePressure"}
{"t":{"$date":"2022-02-14T14:57:37.436+00:00"},"s":"I",  "c":"REPL",     "id":4784909, "ctx":"SignalHandler","msg":"Shutting down the ReplicationCoordinator"}
{"t":{"$date":"2022-02-14T14:57:37.436+00:00"},"s":"I",  "c":"REPL",     "id":21328,   "ctx":"SignalHandler","msg":"Shutting down replication subsystems"}
{"t":{"$date":"2022-02-14T14:57:37.436+00:00"},"s":"I",  "c":"REPL",     "id":21302,   "ctx":"SignalHandler","msg":"Stopping replication reporter thread"}
{"t":{"$date":"2022-02-14T14:57:37.437+00:00"},"s":"I",  "c":"REPL",     "id":21303,   "ctx":"SignalHandler","msg":"Stopping replication fetcher thread"}
{"t":{"$date":"2022-02-14T14:57:37.437+00:00"},"s":"I",  "c":"REPL",     "id":21304,   "ctx":"SignalHandler","msg":"Stopping replication applier thread"}
{"t":{"$date":"2022-02-14T14:57:37.446+00:00"},"s":"I",  "c":"REPL",     "id":21107,   "ctx":"BackgroundSync","msg":"Stopping replication producer"}
lemon-ukgen commented 2 years ago

I don't think this was strictly "WOL", @Vino-S stopped and restarted at 14:57, which explains the second log above and the freed memory.

The crashed client at 14:14 isn't relevant to this restart.