Shards continuously crashloopbackoff - exit code 51 #6365

Closed · francescocambi closed this issue 3 years ago

francescocambi commented 3 years ago

Description

After a power loss at my house, my RKE cluster went down (it is my dev/test/study environment). When I turned it back on, the mongodb-sharded containers started going into CrashLoopBackOff continuously. The only troubleshooting information I can get is exit code 51, and I cannot figure out what the problem is. Nobody online seems to know about this particular exit code.

The system was installed using the Bitnami mongodb-sharded Helm chart.

The configsvr and the mongos containers are working fine.

Steps to reproduce the issue:

  1. Deploy mongodb-sharded helm chart
  2. Put some data (< 1 GB) into the database
  3. Simulate a power failure of the whole k8s cluster (I know, probably not such a common failure)
  4. Restore/restart k8s cluster

When the cluster comes back up, the shard containers continuously crash.

Describe the results you received:

The shard containers continuously go into CrashLoopBackOff with exit code 51.

Describe the results you expected:

Clearer information for troubleshooting.

Additional information you deem important (e.g. issue happens only occasionally):

Here are the logs before the pod crash.

Shard 0:


23:00:18.50 INFO ==> Setting node as primary
--
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.53
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.53 Welcome to the Bitnami mongodb-sharded container
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.53 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mongodb-sharded
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.53 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mongodb-sharded/issues
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.53
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.53 INFO ==> ** Starting MongoDB Sharded setup **
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.55 INFO ==> Validating settings in MONGODB_* env vars...
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.57 INFO ==> Initializing MongoDB Sharded...
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.58 INFO ==> Writing keyfile for replica set authentication...
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.59 INFO ==> Enabling authentication...
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.60 INFO ==> Deploying MongoDB Sharded with persisted data...
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.61 INFO ==> Trying to connect to MongoDB server mdb01-dev-mongodb-sharded...
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.62 INFO ==> Found MongoDB server listening at mdb01-dev-mongodb-sharded:27017 !
Fri, May 14 2021 1:00:18 am | mongodb 23:00:18.70 INFO ==> MongoDB server listening and working at mdb01-dev-mongodb-sharded:27017 !

Shard 1:


23:01:28.57 INFO ==> Setting node as primary
--
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.59
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.59 Welcome to the Bitnami mongodb-sharded container
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.59 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mongodb-sharded
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.60 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mongodb-sharded/issues
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.60
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.60 INFO ==> ** Starting MongoDB Sharded setup **
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.62 INFO ==> Validating settings in MONGODB_* env vars...
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.63 INFO ==> Initializing MongoDB Sharded...
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.68 INFO ==> Writing keyfile for replica set authentication...
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.70 INFO ==> Enabling authentication...
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.71 INFO ==> Deploying MongoDB Sharded with persisted data...
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.73 INFO ==> Trying to connect to MongoDB server mdb01-dev-mongodb-sharded...
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.73 INFO ==> Found MongoDB server listening at mdb01-dev-mongodb-sharded:27017 !
Fri, May 14 2021 1:01:28 am | mongodb 23:01:28.84 INFO ==> MongoDB server listening and working at mdb01-dev-mongodb-sharded:27017 !

Version

mongodb-sharded version is: docker.io/bitnami/mongodb-sharded:4.4.4-debian-10-r34

[administrator@rke01 ~]$ docker version
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:47:25 2019
  OS/Arch:          linux/amd64
  Experimental:     false
[administrator@rke01 ~]$ docker info
Containers: 223
 Running: 183
 Paused: 0
 Stopped: 40
Images: 90
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-1160.11.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.67GiB
Name: rke01.intra.cloudfc.net
ID: 5X5D:FYZC:MTXX:EOLP:MGJ2:KW6O:BO7U:DTBU:AG5A:LILF:V7PQ:TA7L
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Additional environment details (AWS, VirtualBox, Docker for MAC, physical, etc.):

Rancher Kubernetes Engine (RKE) - k8s 1.19.7, Docker version 18.09.2

My actual MongoDB configuration is:

- 1 replica
- 2 shards
- persistence enabled with the nfs-client storage class (dedicated external NFS server)
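
For reference, a deployment matching that layout would look roughly like the following. This is only a sketch: the value keys are my assumptions based on the chart's parameters, so check them against the values.yaml of the chart version in use.

# Rough sketch of a matching install (assumes the bitnami repo is already added;
# value keys are assumptions, verify them against the chart's values.yaml).
helm install mdb01-dev bitnami/mongodb-sharded \
  --namespace development \
  --set shards=2 \
  --set shardsvr.dataNode.replicas=1 \
  --set shardsvr.persistence.enabled=true \
  --set shardsvr.persistence.storageClass=nfs-client \
  --set metrics.enabled=true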

Let me know if you need anything else. I don't know whether the issue might be a mistake on my part. Thank you in advance for your work and your support.

miguelaeh commented 3 years ago

Hi @francescocambi , I am not sure the MongoDB Helm chart is prepared to recover from a full K8s cluster failure. In any case, it looks weird to me that your logs end at a point where nothing has failed yet. Checking the container logic, after the

 MongoDB server listening and working at mdb01-dev-mongodb-sharded:27017 !

line, there should be another info or error message. Is it possible that the readiness probe is killing the pods? Could you share the output of the describe command?
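
Something like the following should show it (placeholders for your release and namespace):

kubectl describe pod <shard-pod-name> -n <namespace>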

francescocambi commented 3 years ago

Hi @miguelaeh, sure, here it is (I attached only the first shard; the other is the same. If you need it, tell me and I will provide it).

Name:         mdb01-dev-mongodb-sharded-shard0-data-0
Namespace:    development
Priority:     0
Node:         rke01.intra.cloudfc.net/192.168.3.40
Start Time:   Fri, 14 May 2021 00:59:14 +0200
Labels:       app.kubernetes.io/component=shardsvr
              app.kubernetes.io/instance=mdb01-dev
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=mongodb-sharded
              controller-revision-hash=mdb01-dev-mongodb-sharded-shard0-data-6cfbb758b6
              helm.sh/chart=mongodb-sharded-3.4.4
              shard=0
              statefulset.kubernetes.io/pod-name=mdb01-dev-mongodb-sharded-shard0-data-0
Annotations:  cni.projectcalico.org/podIP: 10.42.173.247/32
              cni.projectcalico.org/podIPs: 10.42.173.247/32
              kubectl.kubernetes.io/restartedAt: 2021-05-12T00:32:59+02:00
              prometheus.io/port: 9216
              prometheus.io/scrape: true
Status:       Running
IP:           10.42.173.247
IPs:
  IP:           10.42.173.247
Controlled By:  StatefulSet/mdb01-dev-mongodb-sharded-shard0-data
Containers:
  mongodb:
    Container ID:  docker://bff832274da0acbe4591d6e9f333aaab33c90b13d1bc9b86ac1cd75f4d3f4152
    Image:         docker.io/bitnami/mongodb-sharded:4.4.4-debian-10-r34
    Image ID:      docker-pullable://bitnami/mongodb-sharded@sha256:6e5ae0cd73df180408dd884f7b763188644efe152c2a1cb3371dea61c9457ccd
    Port:          27017/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      /entrypoint/replicaset-entrypoint.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    51
      Started:      Sun, 16 May 2021 23:25:41 +0200
      Finished:     Sun, 16 May 2021 23:25:47 +0200
    Ready:          False
    Restart Count:  817
    Liveness:       exec [pgrep mongod] delay=60s timeout=5s period=10s #success=1 #failure=6
    Readiness:      exec [mongo --disableImplicitSessions --eval db.adminCommand('ping')] delay=60s timeout=5s period=10s #success=1 #failure=6
    Environment:
      MONGODB_SYSTEM_LOG_VERBOSITY:     0
      MONGODB_MAX_TIMEOUT:              120
      MONGODB_DISABLE_SYSTEM_LOG:       no
      MONGODB_SHARDING_MODE:            shardsvr
      MONGODB_POD_NAME:                 mdb01-dev-mongodb-sharded-shard0-data-0 (v1:metadata.name)
      MONGODB_MONGOS_HOST:              mdb01-dev-mongodb-sharded
      MONGODB_INITIAL_PRIMARY_HOST:     mdb01-dev-mongodb-sharded-shard0-data-0.mdb01-dev-mongodb-sharded-headless.development.svc.cluster.local
      MONGODB_REPLICA_SET_NAME:         mdb01-dev-mongodb-sharded-shard-0
      MONGODB_ADVERTISED_HOSTNAME:      $(MONGODB_POD_NAME).mdb01-dev-mongodb-sharded-headless.development.svc.cluster.local
      MONGODB_ROOT_PASSWORD:            <set to the key 'mongodb-root-password' in secret 'mdb01-dev-mongodb-sharded'>    Optional: false
      MONGODB_REPLICA_SET_KEY:          <set to the key 'mongodb-replica-set-key' in secret 'mdb01-dev-mongodb-sharded'>  Optional: false
      MONGODB_ENABLE_IPV6:              no
      MONGODB_ENABLE_DIRECTORY_PER_DB:  yes
    Mounts:
      /bitnami/mongodb from datadir (rw)
      /entrypoint from replicaset-entrypoint-configmap (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jd4c9 (ro)
  metrics:
    Container ID:  docker://6839df2667004c443c231385a13ec1041bb36bf233382b296d86b58bbd12ebfd
    Image:         docker.io/bitnami/mongodb-exporter:0.11.2-debian-10-r113
    Image ID:      docker-pullable://bitnami/mongodb-exporter@sha256:3ffebaf53830df50fc4d9c29512130e578aa93e5d475ef245aebc244362db64f
    Port:          9216/TCP
    Host Port:     0/TCP
    Command:
      sh
      -ec
      #!/bin/sh
      /bin/mongodb_exporter --web.listen-address ":9216" --mongodb.uri mongodb://root:`echo $MONGODB_ROOT_PASSWORD | sed -r "s/@/%40/g;s/:/%3A/g"`@localhost:27017/admin
    State:          Running
      Started:      Fri, 14 May 2021 00:59:16 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      MONGODB_ROOT_PASSWORD:  <set to the key 'mongodb-root-password' in secret 'mdb01-dev-mongodb-sharded'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jd4c9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-mdb01-dev-mongodb-sharded-shard0-data-0
    ReadOnly:   false
  replicaset-entrypoint-configmap:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      mdb01-dev-mongodb-sharded-replicaset-entrypoint
    Optional:  false
  default-token-jd4c9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jd4c9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                      From     Message
  ----     ------   ----                     ----     -------
  Normal   Pulled   40m (x811 over 2d22h)    kubelet  Container image "docker.io/bitnami/mongodb-sharded:4.4.4-debian-10-r34" already present on machine
  Warning  BackOff  50s (x19634 over 2d22h)  kubelet  Back-off restarting failed container

I agree with you that it is weird/undesirable that the only thing we can learn about the issue is that the container is crashing with exit code 51, with no logs about it.

miguelaeh commented 3 years ago

It doesn't seem to be the readiness probe, since when it fails it usually shows up in the pod describe output. It is really difficult to figure out what is failing with that information. Is there anything else relevant you can share?

francescocambi commented 3 years ago

Hi @miguelaeh , to debug the issue I changed the entrypoint command of the pod to a sleep, so that I have time to enter the pod with a shell and see what is going on. I added the MONGODB_EXTRA_FLAGS env var with "--repair" to let mongod repair its data directory after the failure. The exit code then switched to 100. Looking at the entrypoint and run scripts, I ran them manually in the pod and finally got the MongoDB logs (but I still cannot find what the issue is). I'm posting the logs here in case you can give me some advice.
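
For reference, what I did is roughly the following (just a sketch; it assumes the mongodb container is the first container in the shard StatefulSet):

# Keep the shard pod alive so it can be inspected with a shell
kubectl -n development patch statefulset mdb01-dev-mongodb-sharded-shard0-data --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/command","value":["sleep","infinity"]}]'

# Ask mongod to repair its data files via the Bitnami MONGODB_EXTRA_FLAGS env var
kubectl -n development set env statefulset/mdb01-dev-mongodb-sharded-shard0-data MONGODB_EXTRA_FLAGS="--repair"

# Then open a shell in the pod and run the entrypoint/run scripts by hand
kubectl -n development exec -it mdb01-dev-mongodb-sharded-shard0-data-0 -- bash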

mdb01-dev-mongodb-sharded-shard0-data-0:/$ /run.sh
mongodb 18:47:08.56 INFO  ==> ** Starting MongoDB **

{"t":{"$date":"2021-05-17T18:47:08.578+00:00"},"s":"I",  "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****"}
{"t":{"$date":"2021-05-17T18:47:08.579+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2021-05-17T18:47:08.581+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-05-17T18:47:08.581+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2021-05-17T18:47:08.612+00:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":816,"port":27017,"dbPath":"/bitnami/mongodb/data/db","architecture":"64-bit","host":"mdb01-dev-mongodb-sharded-shard0-data-0"}}
{"t":{"$date":"2021-05-17T18:47:08.612+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.4","gitVersion":"8db30a63db1a9d84bdcad0c83369623f708e0397","openSSLVersion":"OpenSSL 1.1.1d  10 Sep 2019","modules":[],"allocator":"tcmalloc","environment":{"distmod":"debian10","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2021-05-17T18:47:08.612+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"PRETTY_NAME=\"Debian GNU/Linux 10 (buster)\"","version":"Kernel 3.10.0-1160.11.1.el7.x86_64"}}}
{"t":{"$date":"2021-05-17T18:47:08.612+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/opt/bitnami/mongodb/conf/mongodb.conf","net":{"bindIp":"127.0.0.1","ipv6":false,"port":27017,"unixDomainSocket":{"enabled":true,"pathPrefix":"/opt/bitnami/mongodb/tmp"}},"processManagement":{"fork":false,"pidFilePath":"/opt/bitnami/mongodb/tmp/mongodb.pid"},"repair":true,"replication":{"enableMajorityReadConcern":true,"replSetName":"mdb01-dev-mongodb-sharded-shard-0"},"security":{"authorization":"enabled","keyFile":"/opt/bitnami/mongodb/conf/keyfile"},"setParameter":{"enableLocalhostAuthBypass":"false"},"sharding":{"clusterRole":"shardsvr"},"storage":{"dbPath":"/bitnami/mongodb/data/db","directoryPerDB":true,"journal":{"enabled":true}},"systemLog":{"destination":"file","logAppend":true,"logRotate":"reopen","path":"/opt/bitnami/mongodb/logs/mongodb.log","quiet":false,"verbosity":0}}}}
{"t":{"$date":"2021-05-17T18:47:08.617+00:00"},"s":"I",  "c":"STORAGE",  "id":22270,   "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/bitnami/mongodb/data/db","storageEngine":"wiredTiger"}}
{"t":{"$date":"2021-05-17T18:47:08.617+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=7510M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}}
{"t":{"$date":"2021-05-17T18:47:08.766+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277228:766908][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 38 through 39"}}
{"t":{"$date":"2021-05-17T18:47:09.691+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277229:691897][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 39 through 39"}}
{"t":{"$date":"2021-05-17T18:47:10.618+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277230:618902][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 38/7424 to 39/256"}}
{"t":{"$date":"2021-05-17T18:47:10.621+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277230:621369][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 38 through 39"}}
{"t":{"$date":"2021-05-17T18:47:10.704+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277230:704269][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 39 through 39"}}
{"t":{"$date":"2021-05-17T18:47:10.743+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277230:743418][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (1620472977, 1)"}}
{"t":{"$date":"2021-05-17T18:47:10.743+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277230:743474][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (1620472977, 1)"}}
{"t":{"$date":"2021-05-17T18:47:10.899+00:00"},"s":"I",  "c":"STORAGE",  "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":2282}}
{"t":{"$date":"2021-05-17T18:47:10.899+00:00"},"s":"I",  "c":"RECOVERY", "id":23987,   "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":1620472977,"i":1}}}}
{"t":{"$date":"2021-05-17T18:47:10.899+00:00"},"s":"I",  "c":"STORAGE",  "id":22316,   "ctx":"initandlisten","msg":"Repairing size cache"}
{"t":{"$date":"2021-05-17T18:47:10.902+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:sizeStorer"}}
{"t":{"$date":"2021-05-17T18:47:10.904+00:00"},"s":"I",  "c":"STORAGE",  "id":22246,   "ctx":"initandlisten","msg":"Repairing catalog metadata"}
{"t":{"$date":"2021-05-17T18:47:10.907+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:_mdb_catalog"}}
{"t":{"$date":"2021-05-17T18:47:10.907+00:00"},"s":"I",  "c":"STORAGE",  "id":4366405, "ctx":"initandlisten","msg":"Modifying the table logging settings for all existing WiredTiger tables","attr":{"loggingEnabled":false,"repair":true,"hasPreviouslyIncompleteTableChecks":true}}
{"t":{"$date":"2021-05-17T18:47:10.911+00:00"},"s":"I",  "c":"STORAGE",  "id":22262,   "ctx":"initandlisten","msg":"Timestamp monitor starting"}
{"t":{"$date":"2021-05-17T18:47:10.912+00:00"},"s":"W",  "c":"CONTROL",  "id":22178,   "ctx":"initandlisten","msg":"/sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'","tags":["startupWarnings"]}
{"t":{"$date":"2021-05-17T18:47:10.912+00:00"},"s":"W",  "c":"CONTROL",  "id":22181,   "ctx":"initandlisten","msg":"/sys/kernel/mm/transparent_hugepage/defrag is 'always'. We suggest setting it to 'never'","tags":["startupWarnings"]}
{"t":{"$date":"2021-05-17T18:47:10.913+00:00"},"s":"I",  "c":"STORAGE",  "id":21029,   "ctx":"initandlisten","msg":"repairDatabase","attr":{"db":"local"}}
{"t":{"$date":"2021-05-17T18:47:10.917+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.system.replset"}}
{"t":{"$date":"2021-05-17T18:47:10.920+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-9-4349981716926614396"}}
{"t":{"$date":"2021-05-17T18:47:10.921+00:00"},"s":"I",  "c":"INDEX",    "id":20295,   "ctx":"initandlisten","msg":"Validating internal structure","attr":{"index":"_id_","namespace":"local.system.replset"}}
{"t":{"$date":"2021-05-17T18:47:10.928+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.system.replset","uuid":" (UUID: 03a43e44-2bc6-4a01-8d47-71bb422a416c)"}}
{"t":{"$date":"2021-05-17T18:47:10.928+00:00"},"s":"I",  "c":"INDEX",    "id":20296,   "ctx":"initandlisten","msg":"Validating index consistency","attr":{"index":"_id_","namespace":"local.system.replset"}}
{"t":{"$date":"2021-05-17T18:47:10.928+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.system.replset","uuid":" (UUID: 03a43e44-2bc6-4a01-8d47-71bb422a416c)"}}
{"t":{"$date":"2021-05-17T18:47:10.928+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.system.replset","nInvalidDocuments":0,"nrecords":1,"nIndexes":1,"keysPerIndex":{"_id_":1},"indexDetails":{"_id_":{"valid":true}}}}}
{"t":{"$date":"2021-05-17T18:47:10.928+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.replset.oplogTruncateAfterPoint"}}
{"t":{"$date":"2021-05-17T18:47:10.930+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-0-4349981716926614396"}}
{"t":{"$date":"2021-05-17T18:47:10.931+00:00"},"s":"I",  "c":"INDEX",    "id":20295,   "ctx":"initandlisten","msg":"Validating internal structure","attr":{"index":"_id_","namespace":"local.replset.oplogTruncateAfterPoint"}}
{"t":{"$date":"2021-05-17T18:47:10.936+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.replset.oplogTruncateAfterPoint","uuid":" (UUID: 1914497e-02ac-44d5-941e-2ab67a885e8b)"}}
{"t":{"$date":"2021-05-17T18:47:10.936+00:00"},"s":"I",  "c":"INDEX",    "id":20296,   "ctx":"initandlisten","msg":"Validating index consistency","attr":{"index":"_id_","namespace":"local.replset.oplogTruncateAfterPoint"}}
{"t":{"$date":"2021-05-17T18:47:10.936+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.replset.oplogTruncateAfterPoint","uuid":" (UUID: 1914497e-02ac-44d5-941e-2ab67a885e8b)"}}
{"t":{"$date":"2021-05-17T18:47:10.936+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.replset.oplogTruncateAfterPoint","nInvalidDocuments":0,"nrecords":1,"nIndexes":1,"keysPerIndex":{"_id_":1},"indexDetails":{"_id_":{"valid":true}}}}}
{"t":{"$date":"2021-05-17T18:47:10.936+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.startup_log"}}
{"t":{"$date":"2021-05-17T18:47:10.938+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-2-4225397428487767846"}}
{"t":{"$date":"2021-05-17T18:47:10.939+00:00"},"s":"I",  "c":"INDEX",    "id":20295,   "ctx":"initandlisten","msg":"Validating internal structure","attr":{"index":"_id_","namespace":"local.startup_log"}}
{"t":{"$date":"2021-05-17T18:47:10.943+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.startup_log","uuid":" (UUID: 50fea54f-0be0-4304-908d-819ef7724b0a)"}}
{"t":{"$date":"2021-05-17T18:47:10.943+00:00"},"s":"I",  "c":"INDEX",    "id":20296,   "ctx":"initandlisten","msg":"Validating index consistency","attr":{"index":"_id_","namespace":"local.startup_log"}}
{"t":{"$date":"2021-05-17T18:47:10.943+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.startup_log","uuid":" (UUID: 50fea54f-0be0-4304-908d-819ef7724b0a)"}}
{"t":{"$date":"2021-05-17T18:47:10.943+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.startup_log","nInvalidDocuments":0,"nrecords":4,"nIndexes":1,"keysPerIndex":{"_id_":4},"indexDetails":{"_id_":{"valid":true}}}}}
{"t":{"$date":"2021-05-17T18:47:10.943+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.system.rollback.id"}}
{"t":{"$date":"2021-05-17T18:47:10.944+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-6-4349981716926614396"}}
{"t":{"$date":"2021-05-17T18:47:10.946+00:00"},"s":"I",  "c":"INDEX",    "id":20295,   "ctx":"initandlisten","msg":"Validating internal structure","attr":{"index":"_id_","namespace":"local.system.rollback.id"}}
{"t":{"$date":"2021-05-17T18:47:10.950+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.system.rollback.id","uuid":" (UUID: 81f053c3-2b39-49fa-b5d3-f7f6f510a3aa)"}}
{"t":{"$date":"2021-05-17T18:47:10.950+00:00"},"s":"I",  "c":"INDEX",    "id":20296,   "ctx":"initandlisten","msg":"Validating index consistency","attr":{"index":"_id_","namespace":"local.system.rollback.id"}}
{"t":{"$date":"2021-05-17T18:47:10.950+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.system.rollback.id","uuid":" (UUID: 81f053c3-2b39-49fa-b5d3-f7f6f510a3aa)"}}
{"t":{"$date":"2021-05-17T18:47:10.950+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.system.rollback.id","nInvalidDocuments":0,"nrecords":1,"nIndexes":1,"keysPerIndex":{"_id_":1},"indexDetails":{"_id_":{"valid":true}}}}}
{"t":{"$date":"2021-05-17T18:47:10.950+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.replset.initialSyncId"}}
{"t":{"$date":"2021-05-17T18:47:10.953+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-11-4349981716926614396"}}
{"t":{"$date":"2021-05-17T18:47:10.954+00:00"},"s":"I",  "c":"INDEX",    "id":20295,   "ctx":"initandlisten","msg":"Validating internal structure","attr":{"index":"_id_","namespace":"local.replset.initialSyncId"}}
{"t":{"$date":"2021-05-17T18:47:10.959+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.replset.initialSyncId","uuid":" (UUID: cb110eb2-713b-49fb-b720-57334832fbb5)"}}
{"t":{"$date":"2021-05-17T18:47:10.959+00:00"},"s":"I",  "c":"INDEX",    "id":20296,   "ctx":"initandlisten","msg":"Validating index consistency","attr":{"index":"_id_","namespace":"local.replset.initialSyncId"}}
{"t":{"$date":"2021-05-17T18:47:10.959+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.replset.initialSyncId","uuid":" (UUID: cb110eb2-713b-49fb-b720-57334832fbb5)"}}
{"t":{"$date":"2021-05-17T18:47:10.959+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.replset.initialSyncId","nInvalidDocuments":0,"nrecords":1,"nIndexes":1,"keysPerIndex":{"_id_":1},"indexDetails":{"_id_":{"valid":true}}}}}
{"t":{"$date":"2021-05-17T18:47:10.959+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.replset.minvalid"}}
{"t":{"$date":"2021-05-17T18:47:10.961+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-2-4349981716926614396"}}
{"t":{"$date":"2021-05-17T18:47:10.962+00:00"},"s":"I",  "c":"INDEX",    "id":20295,   "ctx":"initandlisten","msg":"Validating internal structure","attr":{"index":"_id_","namespace":"local.replset.minvalid"}}
{"t":{"$date":"2021-05-17T18:47:10.968+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.replset.minvalid","uuid":" (UUID: e7cc2c31-b3ef-4d74-8550-022cd8fd0c76)"}}
{"t":{"$date":"2021-05-17T18:47:10.968+00:00"},"s":"I",  "c":"INDEX",    "id":20296,   "ctx":"initandlisten","msg":"Validating index consistency","attr":{"index":"_id_","namespace":"local.replset.minvalid"}}
{"t":{"$date":"2021-05-17T18:47:10.968+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.replset.minvalid","uuid":" (UUID: e7cc2c31-b3ef-4d74-8550-022cd8fd0c76)"}}
{"t":{"$date":"2021-05-17T18:47:10.968+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.replset.minvalid","nInvalidDocuments":0,"nrecords":1,"nIndexes":1,"keysPerIndex":{"_id_":1},"indexDetails":{"_id_":{"valid":true}}}}}
{"t":{"$date":"2021-05-17T18:47:10.968+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.oplog.rs"}}
{"t":{"$date":"2021-05-17T18:47:10.994+00:00"},"s":"I",  "c":"STORAGE",  "id":22431,   "ctx":"initandlisten","msg":"WiredTiger progress","attr":{"operation":"WT_SESSION.verify","progress":100}}
{"t":{"$date":"2021-05-17T18:47:10.996+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-8-4349981716926614396"}}
{"t":{"$date":"2021-05-17T18:47:10.998+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.oplog.rs","uuid":" (UUID: ec02fa11-aec6-44e3-bf48-236c1c42d14f)"}}
{"t":{"$date":"2021-05-17T18:47:11.030+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.oplog.rs","uuid":" (UUID: ec02fa11-aec6-44e3-bf48-236c1c42d14f)"}}
{"t":{"$date":"2021-05-17T18:47:11.030+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.oplog.rs","nInvalidDocuments":0,"nrecords":92098,"nIndexes":0,"keysPerIndex":{},"indexDetails":{}}}}
{"t":{"$date":"2021-05-17T18:47:11.030+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"local.replset.election"}}
{"t":{"$date":"2021-05-17T18:47:11.033+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:local/collection-4-4349981716926614396"}}
{"t":{"$date":"2021-05-17T18:47:11.033+00:00"},"s":"I",  "c":"INDEX",    "id":20295,   "ctx":"initandlisten","msg":"Validating internal structure","attr":{"index":"_id_","namespace":"local.replset.election"}}
{"t":{"$date":"2021-05-17T18:47:11.037+00:00"},"s":"I",  "c":"INDEX",    "id":20303,   "ctx":"initandlisten","msg":"validating collection","attr":{"namespace":"local.replset.election","uuid":" (UUID: f281dc56-4c65-48ba-905d-0dc49277185a)"}}
{"t":{"$date":"2021-05-17T18:47:11.037+00:00"},"s":"I",  "c":"INDEX",    "id":20296,   "ctx":"initandlisten","msg":"Validating index consistency","attr":{"index":"_id_","namespace":"local.replset.election"}}
{"t":{"$date":"2021-05-17T18:47:11.038+00:00"},"s":"I",  "c":"INDEX",    "id":20306,   "ctx":"initandlisten","msg":"Validation complete for collection. No corruption found","attr":{"namespace":"local.replset.election","uuid":" (UUID: f281dc56-4c65-48ba-905d-0dc49277185a)"}}
{"t":{"$date":"2021-05-17T18:47:11.038+00:00"},"s":"I",  "c":"STORAGE",  "id":21028,   "ctx":"initandlisten","msg":"Collection validation","attr":{"results":{"ns":"local.replset.election","nInvalidDocuments":0,"nrecords":1,"nIndexes":1,"keysPerIndex":{"_id_":1},"indexDetails":{"_id_":{"valid":true}}}}}
{"t":{"$date":"2021-05-17T18:47:11.038+00:00"},"s":"I",  "c":"STORAGE",  "id":21029,   "ctx":"initandlisten","msg":"repairDatabase","attr":{"db":"config"}}
{"t":{"$date":"2021-05-17T18:47:11.041+00:00"},"s":"I",  "c":"STORAGE",  "id":21027,   "ctx":"initandlisten","msg":"Repairing collection","attr":{"namespace":"config.cache.collections"}}
{"t":{"$date":"2021-05-17T18:47:11.043+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:config/collection-6-5174843234450259109"}}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"E",  "c":"STORAGE",  "id":20557,   "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"NotPrimaryNoSecondaryOk: not master and slaveOk=false"}}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"REPL",     "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":10000}}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"COMMAND",  "id":4784901, "ctx":"initandlisten","msg":"Shutting down the MirrorMaestro"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"SHARDING", "id":4784902, "ctx":"initandlisten","msg":"Shutting down the WaitForMajorityService"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"NETWORK",  "id":4784905, "ctx":"initandlisten","msg":"Shutting down the global connection pool"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"STORAGE",  "id":4784906, "ctx":"initandlisten","msg":"Shutting down the FlowControlTicketholder"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"-",        "id":20520,   "ctx":"initandlisten","msg":"Stopping further Flow Control ticket acquisitions."}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"STORAGE",  "id":4784908, "ctx":"initandlisten","msg":"Shutting down the PeriodicThreadToAbortExpiredTransactions"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"STORAGE",  "id":4784934, "ctx":"initandlisten","msg":"Shutting down the PeriodicThreadToDecreaseSnapshotHistoryCachePressure"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"REPL",     "id":4784909, "ctx":"initandlisten","msg":"Shutting down the ReplicationCoordinator"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"REPL",     "id":21328,   "ctx":"initandlisten","msg":"Shutting down replication subsystems"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"W",  "c":"REPL",     "id":21409,   "ctx":"initandlisten","msg":"ReplicationCoordinatorImpl::shutdown() called before startup() finished. Shutting down without cleaning up the replication system"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"SHARDING", "id":4784910, "ctx":"initandlisten","msg":"Shutting down the ShardingInitializationMongoD"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"REPL",     "id":4784911, "ctx":"initandlisten","msg":"Enqueuing the ReplicationStateTransitionLock for shutdown"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"-",        "id":4784912, "ctx":"initandlisten","msg":"Killing all operations for shutdown"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"-",        "id":4695300, "ctx":"initandlisten","msg":"Interrupted all currently running operations","attr":{"opsKilled":3}}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"COMMAND",  "id":4784913, "ctx":"initandlisten","msg":"Shutting down all open transactions"}
{"t":{"$date":"2021-05-17T18:47:11.044+00:00"},"s":"I",  "c":"REPL",     "id":4784914, "ctx":"initandlisten","msg":"Acquiring the ReplicationStateTransitionLock for shutdown"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"INDEX",    "id":4784915, "ctx":"initandlisten","msg":"Shutting down the IndexBuildsCoordinator"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"REPL",     "id":4784916, "ctx":"initandlisten","msg":"Reacquiring the ReplicationStateTransitionLock for shutdown"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"REPL",     "id":4784917, "ctx":"initandlisten","msg":"Attempting to mark clean shutdown"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"NETWORK",  "id":4784918, "ctx":"initandlisten","msg":"Shutting down the ReplicaSetMonitor"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"SHARDING", "id":4784921, "ctx":"initandlisten","msg":"Shutting down the MigrationUtilExecutor"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"CONTROL",  "id":4784925, "ctx":"initandlisten","msg":"Shutting down free monitoring"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":4784927, "ctx":"initandlisten","msg":"Shutting down the HealthLog"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":4784929, "ctx":"initandlisten","msg":"Acquiring the global lock for shutdown"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":4784930, "ctx":"initandlisten","msg":"Shutting down the storage engine"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":20282,   "ctx":"initandlisten","msg":"Deregistering all the collections"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22372,   "ctx":"OplogVisibilityThread","msg":"Oplog visibility thread shutting down."}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22261,   "ctx":"initandlisten","msg":"Timestamp monitor shutting down"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22317,   "ctx":"initandlisten","msg":"WiredTigerKVEngine shutting down"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22318,   "ctx":"initandlisten","msg":"Shutting down session sweeper thread"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22319,   "ctx":"initandlisten","msg":"Finished shutting down session sweeper thread"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22320,   "ctx":"initandlisten","msg":"Shutting down journal flusher thread"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22321,   "ctx":"initandlisten","msg":"Finished shutting down journal flusher thread"}
{"t":{"$date":"2021-05-17T18:47:11.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22322,   "ctx":"initandlisten","msg":"Shutting down checkpoint thread"}
{"t":{"$date":"2021-05-17T18:47:11.046+00:00"},"s":"I",  "c":"STORAGE",  "id":22323,   "ctx":"initandlisten","msg":"Finished shutting down checkpoint thread"}
{"t":{"$date":"2021-05-17T18:47:11.046+00:00"},"s":"I",  "c":"STORAGE",  "id":22324,   "ctx":"initandlisten","msg":"Closing WiredTiger in preparation for reconfiguring","attr":{"closeConfig":"leak_memory=true,"}}
{"t":{"$date":"2021-05-17T18:47:11.080+00:00"},"s":"I",  "c":"STORAGE",  "id":4795905, "ctx":"initandlisten","msg":"WiredTiger closed","attr":{"durationMillis":34}}
{"t":{"$date":"2021-05-17T18:47:11.109+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277231:109691][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 39 through 40"}}
{"t":{"$date":"2021-05-17T18:47:11.154+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277231:154932][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 40 through 40"}}
{"t":{"$date":"2021-05-17T18:47:12.081+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277232:81452][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 39/1408 to 40/256"}}
{"t":{"$date":"2021-05-17T18:47:12.082+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277232:82810][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 39 through 40"}}
{"t":{"$date":"2021-05-17T18:47:12.168+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277232:168537][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 40 through 40"}}
{"t":{"$date":"2021-05-17T18:47:12.208+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277232:208950][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (1620472977, 1)"}}
{"t":{"$date":"2021-05-17T18:47:12.209+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1621277232:209009][816:0x7f44be02e140], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (1620472977, 1)"}}
{"t":{"$date":"2021-05-17T18:47:12.323+00:00"},"s":"I",  "c":"STORAGE",  "id":4795904, "ctx":"initandlisten","msg":"WiredTiger re-opened","attr":{"durationMillis":1243}}
{"t":{"$date":"2021-05-17T18:47:12.323+00:00"},"s":"I",  "c":"STORAGE",  "id":22325,   "ctx":"initandlisten","msg":"Reconfiguring","attr":{"newConfig":"compatibility=(release=3.3)"}}
{"t":{"$date":"2021-05-17T18:47:12.526+00:00"},"s":"I",  "c":"STORAGE",  "id":4795903, "ctx":"initandlisten","msg":"Reconfigure complete","attr":{"durationMillis":203}}
{"t":{"$date":"2021-05-17T18:47:12.526+00:00"},"s":"I",  "c":"STORAGE",  "id":4795902, "ctx":"initandlisten","msg":"Closing WiredTiger","attr":{"closeConfig":"leak_memory=true,"}}
{"t":{"$date":"2021-05-17T18:47:12.560+00:00"},"s":"I",  "c":"STORAGE",  "id":4795901, "ctx":"initandlisten","msg":"WiredTiger closed","attr":{"durationMillis":34}}
{"t":{"$date":"2021-05-17T18:47:12.560+00:00"},"s":"I",  "c":"STORAGE",  "id":22279,   "ctx":"initandlisten","msg":"shutdown: removing fs lock..."}
{"t":{"$date":"2021-05-17T18:47:12.560+00:00"},"s":"I",  "c":"-",        "id":4784931, "ctx":"initandlisten","msg":"Dropping the scope cache for shutdown"}
{"t":{"$date":"2021-05-17T18:47:12.560+00:00"},"s":"I",  "c":"FTDC",     "id":4784926, "ctx":"initandlisten","msg":"Shutting down full-time data capture"}
{"t":{"$date":"2021-05-17T18:47:12.560+00:00"},"s":"I",  "c":"CONTROL",  "id":20565,   "ctx":"initandlisten","msg":"Now exiting"}
{"t":{"$date":"2021-05-17T18:47:12.560+00:00"},"s":"I",  "c":"CONTROL",  "id":23138,   "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":100}}
miguelaeh commented 3 years ago

Hi @francescocambi , I can see only this error in the log:

DBException in initAndListen, terminating","attr":{"error":"NotPrimaryNoSecondaryOk: not master and slaveOk=false"}}

That says the rest of the nodes are not OK, but not the reason why. Does it return anything from

rs.status();

?
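
For example, something along these lines from the config server or a shard pod (placeholders; the root password is read from the pod environment):

kubectl exec -it <pod-name> -n <namespace> -- bash -c \
  'mongo -u root -p "$MONGODB_ROOT_PASSWORD" --authenticationDatabase admin --eval "rs.status()"'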

francescocambi commented 3 years ago

Hi @miguelaeh, here is the output of the rs.status() command from the "mongos" instance:

connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("2dc46f27-8a60-4684-8c27-49a3a1801e71") }
MongoDB server version: 4.4.4
mongos> rs.status()
{
        "info" : "mongos",
        "ok" : 0,
        "errmsg" : "replSetGetStatus is not supported through mongos",
        "operationTime" : Timestamp(1621357509, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1621357509, 2),
                "signature" : {
                        "hash" : BinData(0,"prNXaygAlgcuRkprEPYQqFJTrpY="),
                        "keyId" : NumberLong("6956340087294525462")
                }
        }
}

and here is the output from the configsvr instance:

mdb01-dev-mongodb-sharded-configsvr:PRIMARY> rs.status()
{
        "set" : "mdb01-dev-mongodb-sharded-configsvr",
        "date" : ISODate("2021-05-18T17:06:39.441Z"),
        "myState" : 1,
        "term" : NumberLong(7),
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "configsvr" : true,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "majorityVoteCount" : 1,
        "writeMajorityCount" : 1,
        "votingMembersCount" : 1,
        "writableVotingMembersCount" : 1,
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1621357598, 1),
                        "t" : NumberLong(7)
                },
                "lastCommittedWallTime" : ISODate("2021-05-18T17:06:38.546Z"),
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1621357598, 1),
                        "t" : NumberLong(7)
                },
                "readConcernMajorityWallTime" : ISODate("2021-05-18T17:06:38.546Z"),
                "appliedOpTime" : {
                        "ts" : Timestamp(1621357598, 1),
                        "t" : NumberLong(7)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1621357598, 1),
                        "t" : NumberLong(7)
                },
                "lastAppliedWallTime" : ISODate("2021-05-18T17:06:38.546Z"),
                "lastDurableWallTime" : ISODate("2021-05-18T17:06:38.546Z")
        },
        "lastStableRecoveryTimestamp" : Timestamp(1621357568, 1),
        "electionCandidateMetrics" : {
                "lastElectionReason" : "electionTimeout",
                "lastElectionDate" : ISODate("2021-05-17T17:19:29.090Z"),
                "electionTerm" : NumberLong(7),
                "lastCommittedOpTimeAtElection" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "lastSeenOpTimeAtElection" : {
                        "ts" : Timestamp(1621271925, 1),
                        "t" : NumberLong(6)
                },
                "numVotesNeeded" : 1,
                "priorityAtElection" : 5,
                "electionTimeoutMillis" : NumberLong(10000),
                "newTermStartDate" : ISODate("2021-05-17T17:19:29.092Z"),
                "wMajorityWriteAvailabilityDate" : ISODate("2021-05-17T17:19:29.094Z")
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "mdb01-dev-mongodb-sharded-configsvr-0.mdb01-dev-mongodb-sharded-headless.development.svc.cluster.local:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 85633,
                        "optime" : {
                                "ts" : Timestamp(1621357598, 1),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2021-05-18T17:06:38Z"),
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1621271969, 1),
                        "electionDate" : ISODate("2021-05-17T17:19:29Z"),
                        "configVersion" : 2,
                        "configTerm" : 7,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "$gleStats" : {
                "lastOpTime" : Timestamp(0, 0),
                "electionId" : ObjectId("7fffffff0000000000000007")
        },
        "lastCommittedOpTime" : Timestamp(1621357598, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1621357598, 1),
                "signature" : {
                        "hash" : BinData(0,"o2DKOQwYwx8ufdrr1MTmkDC1H8w="),
                        "keyId" : NumberLong("6956340087294525462")
                }
        },
        "operationTime" : Timestamp(1621357598, 1)
}

I couldn't find a way to open a mongo console on the shard pods, because they keep crashing. Tell me if you know a way to run the command on them.

miguelaeh commented 3 years ago

It seems that node is properly configured as primary:

"stateStr" : "PRIMARY",

So, I think we would need to ensure the shards have that primary configured, whose address should be mdb01-dev-mongodb-sharded-configsvr-0.mdb01-dev-mongodb-sharded-headless.development.svc.cluster.local:27017. That can be checked by getting the value of MONGODB_INITIAL_PRIMARY_HOST in the shards. Something like

kubectl exec -it <pod_name> -- env | grep MONGODB_INITIAL_PRIMARY_HOST

should be enough on the shard pods. You will need to run it before the pod is restarted again.

francescocambi commented 3 years ago

Hi @miguelaeh , here is the output of the command "env | grep MONGODB_INITIAL_PRIMARY_HOST" run inside the pod.

mdb01-dev-mongodb-sharded-shard0-data-0:/$ env | grep MONGODB_INITIAL_PRIMARY_HOST
MONGODB_INITIAL_PRIMARY_HOST=mdb01-dev-mongodb-sharded-shard0-data-0.mdb01-dev-mongodb-sharded-headless.development.svc.cluster.local

The address seems correct.

miguelaeh commented 3 years ago

Hi @francescocambi , honestly, I don't have many more ideas about what to check to get information. Exit code 51 seems to be "Level 2 halted", which means that your level 2 connection could not be properly set up. If you can't find the solution, I would recommend trying to restore the data into a new cluster, for example by backing up the volumes with Velero and deploying them with a new MongoDB cluster. In that case you will probably need to delete or edit the information about the rest of the nodes.
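
As a rough sketch of that approach (assuming Velero is already installed and configured in the cluster):

# Back up the whole namespace, including the PVCs backing the MongoDB data
velero backup create mongodb-sharded-backup --include-namespaces development

# Later, restore it before redeploying the chart in the new cluster
velero restore create --from-backup mongodb-sharded-backup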

ninja- commented 3 years ago

hmm could it be the case that the healthchecks are killing all cluster nodes before they are able to recover and reach quorum? :thinking:
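
If that were the case, relaxing the probes while the replica sets recover might help. A hypothetical example (the probe value keys are a guess; check the chart's values.yaml for the actual parameters):

# Hypothetical: give the shard pods more time before the kubelet restarts them
helm upgrade mdb01-dev bitnami/mongodb-sharded -n development --reuse-values \
  --set livenessProbe.initialDelaySeconds=300 \
  --set livenessProbe.failureThreshold=10 \
  --set readinessProbe.initialDelaySeconds=300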

github-actions[bot] commented 3 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 3 years ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.