autopilotpattern / mongodb

A robust and highly-scalable implementation of MongoDB in Docker using the Autopilot Pattern
Mozilla Public License 2.0

Scaling cluster to zero #5

Open tianon opened 7 years ago

tianon commented 7 years ago

Currently, if one scales this image down to zero MongoDB nodes (the goal being to reset the cluster), Consul keeps a cached record of which node is supposed to be the primary, so the nodes that get added back don't create a new cluster.

Not sure of a good solution, but wanted to at least document the problem. :+1:
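
For anyone hitting this today, a reset presumably has to include clearing whatever state the cluster left behind in Consul. Here's a minimal sketch of that cleanup using Consul's KV HTTP API; the `mongodb-replicaset` key prefix is an assumption for illustration (check what manage.py actually writes), not this image's documented layout:

```python
# Hypothetical reset helper: deletes whatever replica-set state was left
# in Consul's KV store so that freshly started nodes bootstrap a new
# cluster. The 'mongodb-replicaset' prefix is a guess for illustration.
import requests

CONSUL = 'http://consul:8500'

def reset_replset_state(prefix='mongodb-replicaset'):
    # List keys under the prefix; Consul returns 404 if nothing is stored.
    resp = requests.get('%s/v1/kv/%s?keys' % (CONSUL, prefix))
    if resp.status_code == 404:
        print('no stale keys under %r' % prefix)
        return
    resp.raise_for_status()
    stale = resp.json()
    # Delete the whole subtree in one recursive call.
    requests.delete('%s/v1/kv/%s?recurse=true' % (CONSUL, prefix)).raise_for_status()
    print('deleted: %s' % ', '.join(stale))

if __name__ == '__main__':
    reset_replset_state()
```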

leoj3n commented 5 years ago

Scaling to 0 then back to 3 seems to behave as you have described here; a new cluster doesn't form.

When I scale up to 3 (on local Docker), the PRIMARY is often the second container instance. When I scale down to 1, the second and third instances are stopped and removed. The second instance does not seem to properly elect the first instance as the new primary.

I think these are related. Reading manage.py, though, it looks like scaling to 1 should elect a new primary; perhaps it's electing the third instance, which is also coming down, and it just doesn't know it.

Have you thought of any possible solutions since creating this issue?

The log output on the single remaining SECONDARY instance when scaling to 1 says:

2019-04-26T07:02:48.633+0000 I REPL     [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)

Log output from last healthy response up to the repeating Not starting an election message:

```console
2019-04-26T07:02:25.8592537Z check.mongodb-replicaset.Run start
2019-04-26T07:02:26.1713235Z 2019-04-26 07:02:26,170 DEBUG manage.py [health] health
2019-04-26T07:02:26.1747423Z 2019-04-26T07:02:26.174+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49434 #393 (6 connections now open)
2019-04-26T07:02:26.1765195Z 2019-04-26T07:02:26.175+0000 I NETWORK [conn393] received client metadata from 172.27.0.5:49434 conn393: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:26.1782216Z 2019-04-26T07:02:26.177+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49436 #394 (7 connections now open)
2019-04-26T07:02:26.1789038Z 2019-04-26T07:02:26.178+0000 I NETWORK [conn394] received client metadata from 172.27.0.5:49436 conn394: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:26.1873553Z 2019-04-26 07:02:26,186 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:02:26.192239Z 2019-04-26 07:02:26,191 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:02:26.1945056Z 2019-04-26 07:02:26,194 DEBUG manage.py [health] health: True
2019-04-26T07:02:26.194904Z 2019-04-26T07:02:26.194+0000 I - [conn393] end connection 172.27.0.5:49434 (7 connections now open)
2019-04-26T07:02:26.1950266Z 2019-04-26T07:02:26.194+0000 I - [conn394] end connection 172.27.0.5:49436 (7 connections now open)
2019-04-26T07:02:26.7490096Z check.mongodb-replicaset exited without error
2019-04-26T07:02:26.7492424Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:02:26.7493706Z check.mongodb-replicaset.Run end
2019-04-26T07:02:26.7494872Z check.mongodb-replicaset.term
2019-04-26T07:02:26.7496459Z terminating command 'check.mongodb-replicaset' at pid: 1844
2019-04-26T07:02:26.7497632Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:02:35.8587541Z check.mongodb-replicaset.Run start
2019-04-26T07:02:35.9020348Z 2019-04-26T07:02:35.901+0000 I - [conn17] end connection 172.27.0.7:48360 (5 connections now open)
2019-04-26T07:02:36.2528009Z 2019-04-26 07:02:36,252 DEBUG manage.py [health] health
2019-04-26T07:02:36.2595363Z 2019-04-26T07:02:36.259+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49474 #395 (5 connections now open)
2019-04-26T07:02:36.259623Z 2019-04-26T07:02:36.259+0000 I NETWORK [conn395] received client metadata from 172.27.0.5:49474 conn395: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:36.261452Z 2019-04-26T07:02:36.261+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49476 #396 (6 connections now open)
2019-04-26T07:02:36.2620788Z 2019-04-26T07:02:36.261+0000 I NETWORK [conn396] received client metadata from 172.27.0.5:49476 conn396: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:36.265921Z 2019-04-26 07:02:36,265 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:02:36.2760926Z 2019-04-26 07:02:36,271 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:02:36.2828706Z 2019-04-26 07:02:36,281 DEBUG manage.py [health] health: True
2019-04-26T07:02:36.2837786Z 2019-04-26T07:02:36.281+0000 I - [conn396] end connection 172.27.0.5:49476 (6 connections now open)
2019-04-26T07:02:36.285811Z 2019-04-26T07:02:36.285+0000 I - [conn395] end connection 172.27.0.5:49474 (5 connections now open)
2019-04-26T07:02:36.8292281Z check.mongodb-replicaset exited without error
2019-04-26T07:02:36.8294421Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:02:36.8298386Z check.mongodb-replicaset.Run end
2019-04-26T07:02:36.8303299Z check.mongodb-replicaset.term
2019-04-26T07:02:36.8305682Z terminating command 'check.mongodb-replicaset' at pid: 1853
2019-04-26T07:02:36.831209Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:02:38.2400701Z 2019-04-26T07:02:38.239+0000 I - [conn12] end connection 172.27.0.7:48236 (4 connections now open)
2019-04-26T07:02:38.240215Z 2019-04-26T07:02:38.239+0000 I - [conn116] end connection 172.27.0.7:50186 (3 connections now open)
2019-04-26T07:02:38.6564986Z 2019-04-26T07:02:38.656+0000 I - [conn10] end connection 172.27.0.6:49846 (2 connections now open)
2019-04-26T07:02:38.6573375Z 2019-04-26T07:02:38.657+0000 I ASIO [NetworkInterfaceASIO-RS-0] Ending connection to host 172.27.0.6:27017 due to bad connection status; 2 connections to that host remain open
2019-04-26T07:02:38.6578909Z 2019-04-26T07:02:38.657+0000 I REPL [replication-0] Restarting oplog query due to error: HostUnreachable: End of file. Last fetched optime (with hash): { ts: Timestamp 1556262150000|1, t: 1 }[-4827456548652813520]. Restarts remaining: 1
2019-04-26T07:02:38.6579734Z 2019-04-26T07:02:38.657+0000 I ASIO [replication-0] dropping unhealthy pooled connection to 172.27.0.6:27017
2019-04-26T07:02:38.6582843Z 2019-04-26T07:02:38.658+0000 I ASIO [replication-0] dropping unhealthy pooled connection to 172.27.0.6:27017
2019-04-26T07:02:38.658349Z 2019-04-26T07:02:38.658+0000 I ASIO [replication-0] after drop, pool was empty, going to spawn some connections
2019-04-26T07:02:38.6584417Z 2019-04-26T07:02:38.658+0000 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to 172.27.0.6:27017
2019-04-26T07:02:38.6585597Z 2019-04-26T07:02:38.658+0000 I REPL [replication-0] Scheduled new oplog query Fetcher source: 172.27.0.6:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1556262150000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, term: 1, readConcern: { afterOpTime: { ts: Timestamp 1556262150000|1, t: 1 } } } query metadata: { $replData: 1, $oplogQueryData: 1, $ssm: { $secondaryOk: true } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 6853 -- target:172.27.0.6:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp 1556262150000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, term: 1, readConcern: { afterOpTime: { ts: Timestamp 1556262150000|1, t: 1 } } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2019-04-26T07:02:38.6594002Z 2019-04-26T07:02:38.659+0000 I ASIO [NetworkInterfaceASIO-RS-0] Failed to connect to 172.27.0.6:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.6595207Z 2019-04-26T07:02:38.659+0000 I ASIO [NetworkInterfaceASIO-RS-0] Dropping all pooled connections to 172.27.0.6:27017 due to failed operation on a connection
2019-04-26T07:02:38.6596238Z 2019-04-26T07:02:38.659+0000 I REPL [replication-1] Error returned from oplog query (no more query restarts left): HostUnreachable: Connection refused
2019-04-26T07:02:38.6603501Z 2019-04-26T07:02:38.659+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: HostUnreachable: Connection refused
2019-04-26T07:02:38.6605174Z 2019-04-26T07:02:38.660+0000 I REPL [rsBackgroundSync] could not find member to sync from
2019-04-26T07:02:38.6606194Z 2019-04-26T07:02:38.660+0000 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to 172.27.0.6:27017
2019-04-26T07:02:38.6609724Z 2019-04-26T07:02:38.660+0000 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2019-04-26T07:02:38.6611063Z 2019-04-26T07:02:38.660+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.6:27017
2019-04-26T07:02:38.6617182Z 2019-04-26T07:02:38.661+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.6:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.66186Z 2019-04-26T07:02:38.661+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.6:27017 due to failed operation on a connection
2019-04-26T07:02:38.6619501Z 2019-04-26T07:02:38.661+0000 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to 172.27.0.7:27017
2019-04-26T07:02:38.6620806Z 2019-04-26T07:02:38.661+0000 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2019-04-26T07:02:38.6621522Z 2019-04-26T07:02:38.661+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.6:27017; HostUnreachable: Connection refused
2019-04-26T07:02:38.6622514Z 2019-04-26T07:02:38.662+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:02:38.6633555Z 2019-04-26T07:02:38.662+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.7:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.6634772Z 2019-04-26T07:02:38.662+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.7:27017 due to failed operation on a connection
2019-04-26T07:02:38.6639626Z 2019-04-26T07:02:38.662+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.6:27017
2019-04-26T07:02:38.6649743Z 2019-04-26T07:02:38.663+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; HostUnreachable: Connection refused
2019-04-26T07:02:38.6651105Z 2019-04-26T07:02:38.663+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:02:38.6658806Z 2019-04-26T07:02:38.663+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.6:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.6660486Z 2019-04-26T07:02:38.663+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.6:27017 due to failed operation on a connection
2019-04-26T07:02:38.6666974Z 2019-04-26T07:02:38.663+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.7:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.6676172Z 2019-04-26T07:02:38.663+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.7:27017 due to failed operation on a connection
2019-04-26T07:02:38.6680372Z 2019-04-26T07:02:38.663+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.6:27017; HostUnreachable: Connection refused
2019-04-26T07:02:38.669124Z 2019-04-26T07:02:38.663+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; HostUnreachable: Connection refused
2019-04-26T07:02:38.6693413Z 2019-04-26T07:02:38.664+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.6:27017
2019-04-26T07:02:38.6702045Z 2019-04-26T07:02:38.664+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:02:38.6708794Z 2019-04-26T07:02:38.664+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.6:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.6717844Z 2019-04-26T07:02:38.664+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.6:27017 due to failed operation on a connection
2019-04-26T07:02:38.6719527Z 2019-04-26T07:02:38.665+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.6:27017; HostUnreachable: Connection refused
2019-04-26T07:02:38.6720561Z 2019-04-26T07:02:38.665+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.7:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.6725474Z 2019-04-26T07:02:38.665+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.7:27017 due to failed operation on a connection
2019-04-26T07:02:38.6727308Z 2019-04-26T07:02:38.665+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; HostUnreachable: Connection refused
2019-04-26T07:02:39.8790016Z event: {StatusChanged watch.mongodb-replicaset}
2019-04-26T07:02:39.8793573Z event: {StatusHealthy watch.mongodb-replicaset}
2019-04-26T07:02:39.8796546Z onchange-mongodb-replicaset.Run start
2019-04-26T07:02:40.1650176Z 2019-04-26 07:02:40,164 DEBUG manage.py [on_change] on_change
2019-04-26T07:02:40.1687601Z 2019-04-26T07:02:40.168+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49510 #397 (2 connections now open)
2019-04-26T07:02:40.1694866Z 2019-04-26T07:02:40.168+0000 I NETWORK [conn397] received client metadata from 172.27.0.5:49510 conn397: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:40.1710675Z 2019-04-26T07:02:40.170+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49512 #398 (3 connections now open)
2019-04-26T07:02:40.1717389Z 2019-04-26T07:02:40.171+0000 I NETWORK [conn398] received client metadata from 172.27.0.5:49512 conn398: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:40.17298Z 2019-04-26T07:02:40.172+0000 I - [conn397] end connection 172.27.0.5:49510 (3 connections now open)
2019-04-26T07:02:40.1736466Z 2019-04-26T07:02:40.172+0000 I - [conn398] end connection 172.27.0.5:49512 (2 connections now open)
2019-04-26T07:02:40.1744213Z 2019-04-26 07:02:40,172 DEBUG manage.py [on_change] on_change: True
2019-04-26T07:02:40.7228011Z onchange-mongodb-replicaset exited without error
2019-04-26T07:02:40.7229314Z event: {ExitSuccess onchange-mongodb-replicaset}
2019-04-26T07:02:40.7231022Z onchange-mongodb-replicaset.Run end
2019-04-26T07:02:40.7231981Z onchange-mongodb-replicaset.term
2019-04-26T07:02:40.7232728Z terminating command 'onchange-mongodb-replicaset' at pid: 1862
2019-04-26T07:02:41.0345535Z 2019-04-26T07:02:41.034+0000 I - [conn382] end connection 172.27.0.7:55116 (1 connection now open)
2019-04-26T07:02:41.1280119Z 2019-04-26T07:02:41.127+0000 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to 172.27.0.6:27017: InvalidSyncSource: Sync source was cleared. Was 172.27.0.6:27017
2019-04-26T07:02:43.6668292Z 2019-04-26T07:02:43.666+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.6:27017
2019-04-26T07:02:43.6669316Z 2019-04-26T07:02:43.666+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:02:45.8589331Z check.mongodb-replicaset.Run start
2019-04-26T07:02:46.2001248Z 2019-04-26 07:02:46,199 DEBUG manage.py [health] health
2019-04-26T07:02:46.2039286Z 2019-04-26T07:02:46.203+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49530 #399 (1 connection now open)
2019-04-26T07:02:46.2051493Z 2019-04-26T07:02:46.205+0000 I NETWORK [conn399] received client metadata from 172.27.0.5:49530 conn399: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:46.2076846Z 2019-04-26T07:02:46.207+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49532 #400 (2 connections now open)
2019-04-26T07:02:46.2085278Z 2019-04-26T07:02:46.208+0000 I NETWORK [conn400] received client metadata from 172.27.0.5:49532 conn400: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:46.2124907Z 2019-04-26 07:02:46,212 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:02:46.2186436Z 2019-04-26 07:02:46,218 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:02:46.2224099Z 2019-04-26T07:02:46.222+0000 I - [conn400] end connection 172.27.0.5:49532 (2 connections now open)
2019-04-26T07:02:46.2226546Z 2019-04-26 07:02:46,222 DEBUG manage.py [health] health: True
2019-04-26T07:02:46.2228526Z 2019-04-26T07:02:46.222+0000 I - [conn399] end connection 172.27.0.5:49530 (2 connections now open)
2019-04-26T07:02:46.7702834Z check.mongodb-replicaset exited without error
2019-04-26T07:02:46.7704199Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:02:46.7706821Z check.mongodb-replicaset.Run end
2019-04-26T07:02:46.7708716Z check.mongodb-replicaset.term
2019-04-26T07:02:46.7712099Z terminating command 'check.mongodb-replicaset' at pid: 1871
2019-04-26T07:02:46.7717361Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:02:48.6338257Z 2019-04-26T07:02:48.633+0000 I REPL [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2019-04-26T07:02:53.6699817Z 2019-04-26T07:02:53.669+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.6:27017; NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2019-04-26T07:02:53.6700935Z 2019-04-26T07:02:53.669+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2019-04-26T07:02:55.8226551Z check.mongodb-replicaset.Run start
2019-04-26T07:02:56.1374432Z 2019-04-26 07:02:56,137 DEBUG manage.py [health] health
2019-04-26T07:02:56.1403759Z 2019-04-26T07:02:56.140+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49554 #401 (1 connection now open)
2019-04-26T07:02:56.1414227Z 2019-04-26T07:02:56.141+0000 I NETWORK [conn401] received client metadata from 172.27.0.5:49554 conn401: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:56.143823Z 2019-04-26T07:02:56.143+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49556 #402 (2 connections now open)
2019-04-26T07:02:56.1445438Z 2019-04-26T07:02:56.144+0000 I NETWORK [conn402] received client metadata from 172.27.0.5:49556 conn402: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:56.148149Z 2019-04-26 07:02:56,147 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:02:56.1583728Z 2019-04-26 07:02:56,157 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:02:56.1655465Z 2019-04-26T07:02:56.164+0000 I - [conn402] end connection 172.27.0.5:49556 (2 connections now open)
2019-04-26T07:02:56.1660359Z 2019-04-26T07:02:56.165+0000 I - [conn401] end connection 172.27.0.5:49554 (2 connections now open)
2019-04-26T07:02:56.1663434Z 2019-04-26 07:02:56,164 DEBUG manage.py [health] health: True
2019-04-26T07:02:56.6783162Z check.mongodb-replicaset exited without error
2019-04-26T07:02:56.6785167Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:02:56.6785885Z check.mongodb-replicaset.Run end
2019-04-26T07:02:56.678684Z check.mongodb-replicaset.term
2019-04-26T07:02:56.6787525Z terminating command 'check.mongodb-replicaset' at pid: 1880
2019-04-26T07:02:56.6788303Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:02:58.8398066Z 2019-04-26T07:02:58.839+0000 I REPL [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2019-04-26T07:03:03.6684375Z 2019-04-26T07:03:03.667+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.6:27017 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 6868 -- target:172.27.0.6:27017 db:admin cmd:{ isMaster: 1 }
2019-04-26T07:03:03.6685859Z 2019-04-26T07:03:03.667+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.6:27017
2019-04-26T07:03:03.6686757Z 2019-04-26T07:03:03.668+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.7:27017 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 6870 -- target:172.27.0.7:27017 db:admin cmd:{ isMaster: 1 }
2019-04-26T07:03:03.668775Z 2019-04-26T07:03:03.668+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:03:05.8217702Z check.mongodb-replicaset.Run start
2019-04-26T07:03:06.1028266Z 2019-04-26 07:03:06,102 DEBUG manage.py [health] health
2019-04-26T07:03:06.111561Z 2019-04-26T07:03:06.110+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49586 #403 (1 connection now open)
2019-04-26T07:03:06.1195889Z 2019-04-26T07:03:06.118+0000 I NETWORK [conn403] received client metadata from 172.27.0.5:49586 conn403: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:03:06.1332708Z 2019-04-26T07:03:06.132+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49588 #404 (2 connections now open)
2019-04-26T07:03:06.1348992Z 2019-04-26T07:03:06.134+0000 I NETWORK [conn404] received client metadata from 172.27.0.5:49588 conn404: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:03:06.1403635Z 2019-04-26 07:03:06,139 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:03:06.1461137Z 2019-04-26 07:03:06,144 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:03:06.1501934Z 2019-04-26T07:03:06.149+0000 I - [conn404] end connection 172.27.0.5:49588 (2 connections now open)
2019-04-26T07:03:06.1524726Z 2019-04-26 07:03:06,150 DEBUG manage.py [health] health: True
2019-04-26T07:03:06.1527481Z 2019-04-26T07:03:06.152+0000 I - [conn403] end connection 172.27.0.5:49586 (1 connection now open)
2019-04-26T07:03:06.6498539Z check.mongodb-replicaset exited without error
2019-04-26T07:03:06.650002Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:03:06.6500926Z check.mongodb-replicaset.Run end
2019-04-26T07:03:06.6503318Z check.mongodb-replicaset.term
2019-04-26T07:03:06.650405Z terminating command 'check.mongodb-replicaset' at pid: 1889
2019-04-26T07:03:06.6509124Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:03:08.6721706Z 2019-04-26T07:03:08.671+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.6:27017; NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2019-04-26T07:03:08.6731724Z 2019-04-26T07:03:08.672+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2019-04-26T07:03:09.9596917Z 2019-04-26T07:03:09.959+0000 I REPL [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2019-04-26T07:03:15.8220013Z check.mongodb-replicaset.Run start
2019-04-26T07:03:16.1457318Z 2019-04-26 07:03:16,145 DEBUG manage.py [health] health
2019-04-26T07:03:16.1497788Z 2019-04-26T07:03:16.149+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49610 #405 (1 connection now open)
2019-04-26T07:03:16.1526077Z 2019-04-26T07:03:16.150+0000 I NETWORK [conn405] received client metadata from 172.27.0.5:49610 conn405: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:03:16.1546338Z 2019-04-26T07:03:16.153+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:49612 #406 (2 connections now open)
2019-04-26T07:03:16.1567851Z 2019-04-26T07:03:16.156+0000 I NETWORK [conn406] received client metadata from 172.27.0.5:49612 conn406: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:03:16.1608904Z 2019-04-26 07:03:16,160 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:03:16.1654179Z 2019-04-26 07:03:16,165 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:03:16.1717522Z 2019-04-26 07:03:16,171 DEBUG manage.py [health] health: True
2019-04-26T07:03:16.1720277Z 2019-04-26T07:03:16.171+0000 I - [conn406] end connection 172.27.0.5:49612 (2 connections now open)
2019-04-26T07:03:16.1723062Z 2019-04-26T07:03:16.171+0000 I - [conn405] end connection 172.27.0.5:49610 (2 connections now open)
2019-04-26T07:03:16.7165807Z check.mongodb-replicaset exited without error
2019-04-26T07:03:16.7167098Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:03:16.7168286Z check.mongodb-replicaset.Run end
2019-04-26T07:03:16.7169596Z check.mongodb-replicaset.term
2019-04-26T07:03:16.7170342Z terminating command 'check.mongodb-replicaset' at pid: 1898
2019-04-26T07:03:16.7171758Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:03:20.978763Z 2019-04-26T07:03:20.978+0000 I REPL [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2019-04-26T07:03:23.6698397Z 2019-04-26T07:03:23.669+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.6:27017 - NetworkInterfaceExceededTimeLimit: Operation timed out, request was RemoteCommand 6873 -- target:172.27.0.6:27017 db:admin cmd:{ isMaster: 1 }
```

Log output of the PRIMARY from last healthy response to graceful shutdown:

```console
2019-04-26T07:02:35.7992486Z check.mongodb-replicaset.Run start
2019-04-26T07:02:36.1735882Z 2019-04-26 07:02:36,172 DEBUG manage.py [health] health
2019-04-26T07:02:36.1908381Z 2019-04-26T07:02:36.190+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:43944 #404 (6 connections now open)
2019-04-26T07:02:36.1936309Z 2019-04-26T07:02:36.193+0000 I NETWORK [conn404] received client metadata from 172.27.0.6:43944 conn404: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:36.2034715Z 2019-04-26T07:02:36.203+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:43946 #405 (7 connections now open)
2019-04-26T07:02:36.2049787Z 2019-04-26T07:02:36.204+0000 I NETWORK [conn405] received client metadata from 172.27.0.6:43946 conn405: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:02:36.2099527Z 2019-04-26 07:02:36,209 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:02:36.2180479Z 2019-04-26 07:02:36,216 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/a20eeb2c-a00f-f5be-d960-8bf8908bafa3 HTTP/1.1" 200 213
2019-04-26T07:02:36.224287Z 2019-04-26T07:02:36.224+0000 I - [conn404] end connection 172.27.0.6:43944 (7 connections now open)
2019-04-26T07:02:36.2244239Z 2019-04-26T07:02:36.224+0000 I - [conn405] end connection 172.27.0.6:43946 (6 connections now open)
2019-04-26T07:02:36.2249545Z 2019-04-26 07:02:36,224 DEBUG manage.py [health] health: True
2019-04-26T07:02:36.7226096Z check.mongodb-replicaset exited without error
2019-04-26T07:02:36.7227827Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:02:36.7228864Z check.mongodb-replicaset.Run end
2019-04-26T07:02:36.7230574Z check.mongodb-replicaset.term
2019-04-26T07:02:36.723124Z terminating command 'check.mongodb-replicaset' at pid: 1848
2019-04-26T07:02:36.7232113Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:02:38.1296192Z event: {Shutdown global}
2019-04-26T07:02:38.1298435Z event: {Stopping mongodb-replicaset}
2019-04-26T07:02:38.1299923Z deregistering: mongodb-replicaset-2a88b241e7fb
2019-04-26T07:02:38.1302414Z event: {Stopping onchange-mongodb-replicaset}
2019-04-26T07:02:38.1303388Z event: {Stopped onchange-mongodb-replicaset}
2019-04-26T07:02:38.1304674Z mongodb-replicaset.term
2019-04-26T07:02:38.1305581Z terminating command 'mongodb-replicaset' at pid: 28
2019-04-26T07:02:38.1318602Z 2019-04-26T07:02:38.131+0000 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2019-04-26T07:02:38.1319408Z 2019-04-26T07:02:38.131+0000 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
2019-04-26T07:02:38.1320052Z 2019-04-26T07:02:38.131+0000 I NETWORK [signalProcessingThread] closing listening socket: 6
2019-04-26T07:02:38.1320983Z 2019-04-26T07:02:38.131+0000 I NETWORK [signalProcessingThread] closing listening socket: 7
2019-04-26T07:02:38.1321761Z 2019-04-26T07:02:38.131+0000 I NETWORK [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2019-04-26T07:02:38.1322576Z 2019-04-26T07:02:38.131+0000 I NETWORK [signalProcessingThread] shutdown: going to flush diaglog...
2019-04-26T07:02:38.1323313Z 2019-04-26T07:02:38.131+0000 I REPL [signalProcessingThread] shutting down replication subsystems
2019-04-26T07:02:38.1324221Z 2019-04-26T07:02:38.131+0000 I REPL [signalProcessingThread] Stopping replication reporter thread
2019-04-26T07:02:38.1339077Z 2019-04-26T07:02:38.133+0000 I REPL [signalProcessingThread] Stopping replication fetcher thread
2019-04-26T07:02:38.1339941Z 2019-04-26T07:02:38.133+0000 I REPL [signalProcessingThread] Stopping replication applier thread
2019-04-26T07:02:38.1691135Z event: {Stopped mongodb-replicaset}
2019-04-26T07:02:38.1698544Z control: stopping control server
2019-04-26T07:02:38.1704155Z control: completed graceful shutdown of control server
2019-04-26T07:02:38.171067Z killing all processes in 5 seconds
2019-04-26T07:02:38.171416Z control: stopped serving at /var/run/containerpilot.socket
2019-04-26T07:02:38.2410226Z 2019-04-26T07:02:38.239+0000 I - [conn7] end connection 172.27.0.7:58938 (5 connections now open)
2019-04-26T07:02:38.2414743Z 2019-04-26T07:02:38.239+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Ending connection to host 172.27.0.7:27017 due to bad connection status; 0 connections to that host remain open
2019-04-26T07:02:38.2415507Z 2019-04-26T07:02:38.239+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to close stream: Transport endpoint is not connected
2019-04-26T07:02:38.2416574Z 2019-04-26T07:02:38.239+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; HostUnreachable: Connection reset by peer
2019-04-26T07:02:38.2417607Z 2019-04-26T07:02:38.240+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:02:38.2418456Z 2019-04-26T07:02:38.241+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.7:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.242334Z 2019-04-26T07:02:38.241+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.7:27017 due to failed operation on a connection
2019-04-26T07:02:38.2424095Z 2019-04-26T07:02:38.241+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; HostUnreachable: Connection refused
2019-04-26T07:02:38.2425074Z 2019-04-26T07:02:38.241+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:02:38.2439885Z 2019-04-26T07:02:38.242+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.7:27017 - HostUnreachable: Connection refused
2019-04-26T07:02:38.244895Z 2019-04-26T07:02:38.242+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.7:27017 due to failed operation on a connection
2019-04-26T07:02:38.2450094Z 2019-04-26T07:02:38.242+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; HostUnreachable: Connection refused
2019-04-26T07:02:38.5135377Z 2019-04-26T07:02:38.513+0000 I REPL [signalProcessingThread] Stopping replication storage threads
2019-04-26T07:02:38.5158415Z 2019-04-26T07:02:38.515+0000 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2019-04-26T07:02:38.5198673Z 2019-04-26T07:02:38.519+0000 I STORAGE [signalProcessingThread] WiredTigerKVEngine shutting down
2019-04-26T07:02:38.6176037Z 2019-04-26T07:02:38.617+0000 I STORAGE [signalProcessingThread] shutdown: removing fs lock...
2019-04-26T07:02:38.6200869Z 2019-04-26T07:02:38.619+0000 I CONTROL [signalProcessingThread] now exiting
2019-04-26T07:02:38.6211506Z 2019-04-26T07:02:38.620+0000 I CONTROL [signalProcessingThread] shutting down with code:0
2019-04-26T07:02:38.621294Z 2019-04-26T07:02:38.620+0000 I CONTROL [initandlisten] shutting down with code:0
2019-04-26T07:02:38.6575529Z mongodb-replicaset exited without error
2019-04-26T07:02:38.6577176Z event: {ExitSuccess mongodb-replicaset}
2019-04-26T07:02:38.6578287Z mongodb-replicaset.Run end
2019-04-26T07:02:43.1722667Z killing processes for job "preStart"
2019-04-26T07:02:43.1724554Z preStart.kill
2019-04-26T07:02:43.1726232Z killing command 'preStart' at pid: 20
2019-04-26T07:02:43.1727283Z killing processes for job "preStop"
2019-04-26T07:02:43.1728598Z preStop.kill
2019-04-26T07:02:43.1730272Z killing command 'preStop' at pid: 21
2019-04-26T07:02:43.173183Z killing processes for job "onchange-mongodb-replicaset"
2019-04-26T07:02:43.1732949Z onchange-mongodb-replicaset.kill
2019-04-26T07:02:43.1733999Z killing command 'onchange-mongodb-replicaset' at pid: 135
2019-04-26T07:02:43.1735085Z killing processes for job "mongodb-replicaset"
2019-04-26T07:02:43.1736057Z mongodb-replicaset.kill
2019-04-26T07:02:43.1737241Z killing command 'mongodb-replicaset' at pid: 28
```

So, according to that:

Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority

It looks like a replica set can't hold an election with only a single surviving node: a majority of a 3-member set is 2 votes, and the lone member can only see itself. Even if that remaining node happened to be PRIMARY, it would step down to SECONDARY after losing contact with the majority, so the set would end up read-only either way.
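
As a quick sanity check, the same majority arithmetic can be read straight out of replSetGetStatus. A minimal sketch in the style of manage.py (PyMongo, assuming a mongod reachable on localhost as in these containers):

```python
# Count how many members this node can see as healthy versus the
# majority it would need to win an election.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
status = client.admin.command('replSetGetStatus')

members = status['members']
majority = len(members) // 2 + 1          # 2 of 3 for this cluster
visible = sum(1 for m in members if m['health'] == 1)

print('myState: %s' % status['myState'])  # 2 == SECONDARY
print('sees %d of %d members; needs %d for an election'
      % (visible, len(members), majority))
```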

At any rate, it looks like breaking when scaling to 1 has nothing to do with the ContainerPilot configuration; it's just how replica sets work in MongoDB. Perhaps you could force the instance you know will remain to become primary when scaling to 1 (see the sketch below), but that's probably not something that should be done automatically.
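
For completeness, the manual escape hatch MongoDB documents for exactly this situation is a forced reconfig that shrinks the member list down to the survivor, after which it can elect itself. This is a hand-run sketch under the assumption of a lone surviving mongod on localhost, not anything manage.py does:

```python
# Force-reconfigure the replica set down to just the surviving member so
# it can elect itself. This is the documented replSetReconfig force:true
# procedure; use only when the removed members are gone for good.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
cfg = client.admin.command('replSetGetConfig')['config']

# Keep only this node (assumed here to be the first member left in the
# config; select by host:port in real use) and bump the config version.
cfg['members'] = cfg['members'][:1]
cfg['version'] += 1

# force=True lets a non-primary apply the new config.
client.admin.command('replSetReconfig', cfg, force=True)
```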

EDIT:

If anyone's interested, here's what happens when scaling from 1 instance (at this point stuck as a "broken" SECONDARY, unable to elect itself without a majority) back up to 3:

```console
2019-04-26T07:28:05.0317248Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:28:05.7168454Z 2019-04-26T07:28:05.716+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.7:27017 - HostUnreachable: No route to host
2019-04-26T07:28:05.716956Z 2019-04-26T07:28:05.716+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.7:27017 due to failed operation on a connection
2019-04-26T07:28:05.7170312Z 2019-04-26T07:28:05.716+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to 172.27.0.6:27017 - HostUnreachable: No route to host
2019-04-26T07:28:05.7171124Z 2019-04-26T07:28:05.717+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 172.27.0.6:27017 due to failed operation on a connection
2019-04-26T07:28:05.7173251Z 2019-04-26T07:28:05.717+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.7:27017; HostUnreachable: No route to host
2019-04-26T07:28:05.7173891Z 2019-04-26T07:28:05.717+0000 I REPL [ReplicationExecutor] Error in heartbeat request to 172.27.0.6:27017; HostUnreachable: No route to host
2019-04-26T07:28:10.7182527Z 2019-04-26T07:28:10.718+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.7:27017
2019-04-26T07:28:10.7183725Z 2019-04-26T07:28:10.718+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.6:27017
2019-04-26T07:28:10.7258611Z 2019-04-26T07:28:10.725+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 172.27.0.7:27017, took 8ms (1 connections now open to 172.27.0.7:27017)
2019-04-26T07:28:10.728623Z 2019-04-26T07:28:10.727+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 172.27.0.6:27017, took 10ms (1 connections now open to 172.27.0.6:27017)
2019-04-26T07:28:10.7318014Z 2019-04-26T07:28:10.731+0000 I NETWORK [thread1] connection accepted from 172.27.0.7:60516 #706 (1 connection now open)
2019-04-26T07:28:10.7319534Z 2019-04-26T07:28:10.731+0000 I REPL [ReplicationExecutor] Member 172.27.0.7:27017 is now in state STARTUP
2019-04-26T07:28:10.7321261Z 2019-04-26T07:28:10.731+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:33908 #707 (2 connections now open)
2019-04-26T07:28:10.7323839Z 2019-04-26T07:28:10.732+0000 I NETWORK [conn706] received client metadata from 172.27.0.7:60516 conn706: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:10.7332777Z 2019-04-26T07:28:10.732+0000 I REPL [ReplicationExecutor] Member 172.27.0.6:27017 is now in state STARTUP
2019-04-26T07:28:10.7338972Z 2019-04-26T07:28:10.733+0000 I NETWORK [conn707] received client metadata from 172.27.0.6:33908 conn707: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:10.7398169Z 2019-04-26T07:28:10.738+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:33914 #708 (3 connections now open)
2019-04-26T07:28:10.7402527Z 2019-04-26T07:28:10.740+0000 I NETWORK [thread1] connection accepted from 172.27.0.7:60526 #709 (4 connections now open)
2019-04-26T07:28:10.7415243Z 2019-04-26T07:28:10.741+0000 I - [conn708] end connection 172.27.0.6:33914 (4 connections now open)
2019-04-26T07:28:10.7423998Z 2019-04-26T07:28:10.742+0000 I - [conn709] end connection 172.27.0.7:60526 (3 connections now open)
2019-04-26T07:28:11.1050183Z 2019-04-26T07:28:11.104+0000 I REPL [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2019-04-26T07:28:11.1051203Z 2019-04-26T07:28:11.104+0000 I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected. current term: 1
2019-04-26T07:28:11.1068559Z 2019-04-26T07:28:11.106+0000 I REPL [ReplicationExecutor] VoteRequester(term 1 dry run) received a yes vote from 172.27.0.7:27017; response message: { term: 1, voteGranted: true, reason: "", ok: 1.0 }
2019-04-26T07:28:11.106969Z 2019-04-26T07:28:11.106+0000 I REPL [ReplicationExecutor] dry election run succeeded, running for election in term 2
2019-04-26T07:28:11.1424772Z 2019-04-26T07:28:11.142+0000 I REPL [ReplicationExecutor] VoteRequester(term 2) received a yes vote from 172.27.0.7:27017; response message: { term: 2, voteGranted: true, reason: "", ok: 1.0 }
2019-04-26T07:28:11.142588Z 2019-04-26T07:28:11.142+0000 I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 2
2019-04-26T07:28:11.1426625Z 2019-04-26T07:28:11.142+0000 I REPL [ReplicationExecutor] transition to PRIMARY
2019-04-26T07:28:11.1428852Z 2019-04-26T07:28:11.142+0000 I REPL [ReplicationExecutor] Entering primary catch-up mode.
2019-04-26T07:28:11.1449015Z 2019-04-26T07:28:11.144+0000 I REPL [ReplicationExecutor] Member 172.27.0.6:27017 is now in state STARTUP2
2019-04-26T07:28:11.1452359Z 2019-04-26T07:28:11.144+0000 I REPL [ReplicationExecutor] Member 172.27.0.7:27017 is now in state STARTUP2
2019-04-26T07:28:11.1453983Z 2019-04-26T07:28:11.144+0000 I REPL [ReplicationExecutor] Caught up to the latest optime known via heartbeats after becoming primary.
2019-04-26T07:28:11.1457319Z 2019-04-26T07:28:11.144+0000 I REPL [ReplicationExecutor] Exited primary catch-up mode.
2019-04-26T07:28:12.9675612Z 2019-04-26T07:28:12.967+0000 I REPL [rsSync] transition to primary complete; database writes are now permitted
2019-04-26T07:28:14.0507945Z check.mongodb-replicaset.Run start
2019-04-26T07:28:14.3380119Z 2019-04-26 07:28:14,337 DEBUG manage.py [health] health
2019-04-26T07:28:14.3411948Z 2019-04-26T07:28:14.340+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:54664 #710 (3 connections now open)
2019-04-26T07:28:14.3419671Z 2019-04-26T07:28:14.341+0000 I NETWORK [conn710] received client metadata from 172.27.0.5:54664 conn710: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:28:14.3455418Z 2019-04-26T07:28:14.345+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:54666 #711 (4 connections now open)
2019-04-26T07:28:14.3471163Z 2019-04-26T07:28:14.346+0000 I NETWORK [conn711] received client metadata from 172.27.0.5:54666 conn711: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:28:14.3513854Z 2019-04-26 07:28:14,351 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:28:14.3572595Z 2019-04-26 07:28:14,356 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:28:14.3600636Z 2019-04-26T07:28:14.359+0000 I - [conn711] end connection 172.27.0.5:54666 (4 connections now open)
2019-04-26T07:28:14.3605371Z 2019-04-26T07:28:14.360+0000 I - [conn710] end connection 172.27.0.5:54664 (3 connections now open)
2019-04-26T07:28:14.3615864Z 2019-04-26 07:28:14,360 DEBUG manage.py [health] health: True
2019-04-26T07:28:14.9091247Z check.mongodb-replicaset exited without error
2019-04-26T07:28:14.9092383Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:28:14.9093172Z check.mongodb-replicaset.Run end
2019-04-26T07:28:14.909444Z check.mongodb-replicaset.term
2019-04-26T07:28:14.9098099Z terminating command 'check.mongodb-replicaset' at pid: 3268
2019-04-26T07:28:14.9105898Z event: {StatusHealthy mongodb-replicaset}
2019-04-26T07:28:15.9731211Z 2019-04-26T07:28:15.972+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:33938 #712 (3 connections now open)
2019-04-26T07:28:15.9754008Z 2019-04-26T07:28:15.975+0000 I NETWORK [conn712] received client metadata from 172.27.0.6:33938 conn712: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:15.9764406Z 2019-04-26T07:28:15.976+0000 I NETWORK [thread1] connection accepted from 172.27.0.7:60550 #713 (4 connections now open)
2019-04-26T07:28:15.9777594Z 2019-04-26T07:28:15.977+0000 I NETWORK [conn713] received client metadata from 172.27.0.7:60550 conn713: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:15.9882926Z 2019-04-26T07:28:15.987+0000 I NETWORK [thread1] connection accepted from 172.27.0.7:60552 #714 (5 connections now open)
2019-04-26T07:28:15.9903572Z 2019-04-26T07:28:15.989+0000 I NETWORK [conn714] received client metadata from 172.27.0.7:60552 conn714: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:16.0015179Z 2019-04-26T07:28:16.001+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:33944 #715 (6 connections now open)
2019-04-26T07:28:16.0020181Z 2019-04-26T07:28:16.001+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:33946 #716 (7 connections now open)
2019-04-26T07:28:16.0046294Z 2019-04-26T07:28:16.004+0000 I NETWORK [conn716] received client metadata from 172.27.0.6:33946 conn716: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:16.0057452Z 2019-04-26T07:28:16.004+0000 I NETWORK [thread1] connection accepted from 172.27.0.7:60558 #717 (8 connections now open)
2019-04-26T07:28:16.0076067Z 2019-04-26T07:28:16.007+0000 I NETWORK [conn717] received client metadata from 172.27.0.7:60558 conn717: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:16.0155033Z 2019-04-26T07:28:16.013+0000 I NETWORK [conn715] received client metadata from 172.27.0.6:33944 conn715: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:17.1510646Z 2019-04-26T07:28:17.150+0000 I REPL [ReplicationExecutor] Member 172.27.0.6:27017 is now in state SECONDARY
2019-04-26T07:28:17.1512531Z 2019-04-26T07:28:17.151+0000 I REPL [ReplicationExecutor] Member 172.27.0.7:27017 is now in state SECONDARY
2019-04-26T07:28:17.1533118Z 2019-04-26T07:28:17.152+0000 I - [conn713] end connection 172.27.0.7:60550 (8 connections now open)
2019-04-26T07:28:17.1540079Z 2019-04-26T07:28:17.153+0000 I - [conn712] end connection 172.27.0.6:33938 (7 connections now open)
2019-04-26T07:28:18.0703947Z event: {StatusChanged watch.mongodb-replicaset}
2019-04-26T07:28:18.0705102Z event: {StatusHealthy watch.mongodb-replicaset}
2019-04-26T07:28:18.0706151Z onchange-mongodb-replicaset.Run start
2019-04-26T07:28:18.43455Z 2019-04-26 07:28:18,434 DEBUG manage.py [on_change] on_change
2019-04-26T07:28:18.4383145Z 2019-04-26T07:28:18.438+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:54706 #718 (7 connections now open)
2019-04-26T07:28:18.4396918Z 2019-04-26T07:28:18.439+0000 I NETWORK [conn718] received client metadata from 172.27.0.5:54706 conn718: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:28:18.443566Z 2019-04-26T07:28:18.442+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:54708 #719 (8 connections now open)
2019-04-26T07:28:18.4457132Z 2019-04-26T07:28:18.445+0000 I NETWORK [conn719] received client metadata from 172.27.0.5:54708 conn719: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:28:18.4501628Z 2019-04-26 07:28:18,449 DEBUG manage.py [on_change] mongo_update_replset_config
2019-04-26T07:28:18.4543635Z 2019-04-26 07:28:18,454 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:28:18.4639606Z 2019-04-26 07:28:18,463 DEBUG urllib3.connectionpool http://consul:8500 "GET /v1/health/service/mongodb-replicaset HTTP/1.1" 200 None
2019-04-26T07:28:18.465984Z 2019-04-26 07:28:18,465 DEBUG manage.py [on_change] mongo_update_replset_config: None
2019-04-26T07:28:18.466482Z 2019-04-26T07:28:18.466+0000 I - [conn718] end connection 172.27.0.5:54706 (8 connections now open)
2019-04-26T07:28:18.4679276Z 2019-04-26T07:28:18.467+0000 I - [conn719] end connection 172.27.0.5:54708 (7 connections now open)
2019-04-26T07:28:18.4692255Z 2019-04-26 07:28:18,466 DEBUG manage.py [on_change] on_change: None
2019-04-26T07:28:18.4702204Z 2019-04-26 07:28:18,466 INFO manage.py Function failed on_change
2019-04-26T07:28:19.0249161Z onchange-mongodb-replicaset exited with error: exit status 1
2019-04-26T07:28:19.0250061Z event: {ExitFailed onchange-mongodb-replicaset}
2019-04-26T07:28:19.0251115Z event: {Error onchange-mongodb-replicaset: exit status 1}
2019-04-26T07:28:19.0252205Z onchange-mongodb-replicaset.Run end
2019-04-26T07:28:19.0253205Z onchange-mongodb-replicaset.term
2019-04-26T07:28:19.0253904Z terminating command 'onchange-mongodb-replicaset' at pid: 3283
2019-04-26T07:28:24.0157241Z check.mongodb-replicaset.Run start
2019-04-26T07:28:24.1532583Z 2019-04-26T07:28:24.152+0000 I NETWORK [thread1] connection accepted from 127.0.0.1:43958 #720 (7 connections now open)
2019-04-26T07:28:24.1558177Z 2019-04-26T07:28:24.155+0000 I NETWORK [conn720] received client metadata from 127.0.0.1:43958 conn720: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:24.3451246Z 2019-04-26 07:28:24,344 DEBUG manage.py [health] health
2019-04-26T07:28:24.3478558Z 2019-04-26T07:28:24.347+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:54726 #721 (8 connections now open)
2019-04-26T07:28:24.3488478Z 2019-04-26T07:28:24.348+0000 I NETWORK [conn721] received client metadata from 172.27.0.5:54726 conn721: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:28:24.3513792Z 2019-04-26T07:28:24.351+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:54728 #722 (9 connections now open)
2019-04-26T07:28:24.3518468Z 2019-04-26T07:28:24.351+0000 I NETWORK [conn722] received client metadata from 172.27.0.5:54728 conn722: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "Ubuntu 16.04 xenial", architecture: "x86_64", version: "4.9.125-linuxkit" }, platform: "CPython 2.7.12.final.0" }
2019-04-26T07:28:24.3554837Z 2019-04-26 07:28:24,355 DEBUG urllib3.connectionpool Starting new HTTP connection (1): consul:8500
2019-04-26T07:28:24.3599596Z 2019-04-26 07:28:24,359 DEBUG urllib3.connectionpool http://consul:8500 "PUT /v1/session/renew/55b1fb07-7005-60ea-148c-76fe2fefa46b HTTP/1.1" 200 215
2019-04-26T07:28:24.3619971Z 2019-04-26T07:28:24.361+0000 I - [conn721] end connection 172.27.0.5:54726 (9 connections now open)
2019-04-26T07:28:24.3625382Z 2019-04-26T07:28:24.361+0000 I - [conn722] end connection 172.27.0.5:54728 (9 connections now open)
2019-04-26T07:28:24.3627914Z 2019-04-26 07:28:24,361 DEBUG manage.py [health] health: True
2019-04-26T07:28:24.9208953Z check.mongodb-replicaset exited without error
2019-04-26T07:28:24.9212096Z event: {ExitSuccess check.mongodb-replicaset}
2019-04-26T07:28:24.9213234Z check.mongodb-replicaset.Run end
2019-04-26T07:28:24.9215448Z check.mongodb-replicaset.term
2019-04-26T07:28:24.9216855Z terminating command 'check.mongodb-replicaset' at pid: 3301
2019-04-26T07:28:24.9218794Z event: {StatusHealthy mongodb-replicaset}
```

Essentially what happened is that once the other instances came back online, the dangling SECONDARY had seen no PRIMARY within the 10000ms election timeout, so it called an election: a dry run in term 1, then a real election in term 2, each winning a yes vote from a rejoining member (plus its own vote). That promoted the dangling SECONDARY to PRIMARY, and the two revived instances came up as SECONDARYs.

That means instance number 1 is now PRIMARY where instance number 2 had previously been. rs.status() reports the newly elected PRIMARY and two new SECONDARY instances as expected:

```console
mongodb-replicaset:PRIMARY> rs.status()
{
    "set" : "mongodb-replicaset",
    "date" : ISODate("2019-04-26T08:19:22.440Z"),
    "myState" : 1,
    "term" : NumberLong(2),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) },
        "appliedOpTime" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) },
        "durableOpTime" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) }
    },
    "members" : [
        {
            "_id" : 1,
            "name" : "172.27.0.6:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 3071,
            "optime" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) },
            "optimeDurable" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) },
            "optimeDate" : ISODate("2019-04-26T08:19:13Z"),
            "optimeDurableDate" : ISODate("2019-04-26T08:19:13Z"),
            "lastHeartbeat" : ISODate("2019-04-26T08:19:22.080Z"),
            "lastHeartbeatRecv" : ISODate("2019-04-26T08:19:21.879Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "172.27.0.5:27017",
            "syncSourceHost" : "172.27.0.5:27017",
            "syncSourceId" : 3,
            "infoMessage" : "",
            "configVersion" : 3
        },
        {
            "_id" : 2,
            "name" : "172.27.0.7:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 3071,
            "optime" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) },
            "optimeDurable" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) },
            "optimeDate" : ISODate("2019-04-26T08:19:13Z"),
            "optimeDurableDate" : ISODate("2019-04-26T08:19:13Z"),
            "lastHeartbeat" : ISODate("2019-04-26T08:19:22.080Z"),
            "lastHeartbeatRecv" : ISODate("2019-04-26T08:19:21.625Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "172.27.0.5:27017",
            "syncSourceHost" : "172.27.0.5:27017",
            "syncSourceId" : 3,
            "infoMessage" : "",
            "configVersion" : 3
        },
        {
            "_id" : 3,
            "name" : "172.27.0.5:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 6494,
            "optime" : { "ts" : Timestamp(1556266753, 1), "t" : NumberLong(2) },
            "optimeDate" : ISODate("2019-04-26T08:19:13Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1556263691, 1),
            "electionDate" : ISODate("2019-04-26T07:28:11Z"),
            "configVersion" : 3,
            "self" : true,
            "lastHeartbeatMessage" : ""
        }
    ],
    "ok" : 1
}
```

Here is the relevant chunk of log output from instance number 2 upon rejoining as a SECONDARY:

```console
2019-04-26T07:28:10.7203598Z 2019-04-26T07:28:10.720+0000 I NETWORK [thread1] connection accepted from 172.27.0.5:58868 #5 (1 connection now open)
2019-04-26T07:28:10.7237062Z 2019-04-26T07:28:10.723+0000 I NETWORK [conn5] received client metadata from 172.27.0.5:58868 conn5: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:10.7302212Z 2019-04-26T07:28:10.729+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.5:27017
2019-04-26T07:28:10.7334901Z 2019-04-26T07:28:10.733+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 172.27.0.5:27017, took 4ms (1 connections now open to 172.27.0.5:27017)
2019-04-26T07:28:10.7378444Z 2019-04-26T07:28:10.737+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:59864 #6 (2 connections now open)
2019-04-26T07:28:10.739074Z 2019-04-26T07:28:10.738+0000 I - [conn6] end connection 172.27.0.6:59864 (2 connections now open)
2019-04-26T07:28:10.7914393Z 2019-04-26T07:28:10.785+0000 I REPL [replExecDBWorker-0] Starting replication storage threads
2019-04-26T07:28:10.8377276Z 2019-04-26T07:28:10.837+0000 I REPL [replication-0] Starting initial sync (attempt 1 of 10)
2019-04-26T07:28:10.8386104Z 2019-04-26T07:28:10.838+0000 I NETWORK [thread1] connection accepted from 172.27.0.6:59870 #7 (2 connections now open)
2019-04-26T07:28:10.8387862Z 2019-04-26T07:28:10.838+0000 I REPL [ReplicationExecutor] New replica set config in use: { _id: "mongodb-replicaset", version: 3, protocolVersion: 1, members: [ { _id: 1, host: "172.27.0.6:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "172.27.0.7:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "172.27.0.5:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5cc2a5b649b55a5aa63d5171') } }
2019-04-26T07:28:10.8392187Z 2019-04-26T07:28:10.838+0000 I REPL [ReplicationExecutor] This node is 172.27.0.7:27017 in the config
2019-04-26T07:28:10.8405364Z 2019-04-26T07:28:10.838+0000 I REPL [ReplicationExecutor] transition to STARTUP2
2019-04-26T07:28:10.8406366Z 2019-04-26T07:28:10.838+0000 I NETWORK [conn7] received client metadata from 172.27.0.6:59870 conn7: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.20" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-04-26T07:28:10.8407293Z 2019-04-26T07:28:10.839+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 172.27.0.6:27017
2019-04-26T07:28:10.8417685Z 2019-04-26T07:28:10.840+0000 I REPL [ReplicationExecutor] Member 172.27.0.5:27017 is now in state SECONDARY
2019-04-26T07:28:10.8494541Z 2019-04-26T07:28:10.848+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 172.27.0.6:27017, took 9ms (1 connections now open to 172.27.0.6:27017)
2019-04-26T07:28:10.8551589Z 2019-04-26T07:28:10.854+0000 I REPL [ReplicationExecutor] Member 172.27.0.6:27017 is now in state STARTUP2
2019-04-26T07:28:15.8111923Z check.mongodb-replicaset.Run start
2019-04-26T07:28:15.8426487Z 2019-04-26T07:28:15.841+0000 I REPL [ReplicationExecutor] Member 172.27.0.5:27017 is now in state PRIMARY
2019-04-26T07:28:15.8841513Z 2019-04-26T07:28:15.883+0000 I REPL [replication-1] sync source candidate: 172.27.0.5:27017
2019-04-26T07:28:15.8846998Z 2019-04-26T07:28:15.884+0000 I STORAGE [replication-1] dropAllDatabasesExceptLocal 1
2019-04-26T07:28:15.8849204Z 2019-04-26T07:28:15.884+0000 I REPL [replication-1] ******
2019-04-26T07:28:15.8850824Z 2019-04-26T07:28:15.884+0000 I REPL [replication-1] creating replication oplog of size: 2618MB...
2019-04-26T07:28:15.9043083Z 2019-04-26T07:28:15.904+0000 I STORAGE [replication-1] Starting WiredTigerRecordStoreThread local.oplog.rs
2019-04-26T07:28:15.9044429Z 2019-04-26T07:28:15.904+0000 I STORAGE [replication-1] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2019-04-26T07:28:15.904865Z 2019-04-26T07:28:15.904+0000 I STORAGE [replication-1] Scanning the oplog to determine where to place markers for truncation
2019-04-26T07:28:15.971821Z 2019-04-26T07:28:15.971+0000 I REPL [replication-1] ******
2019-04-26T07:28:15.9724694Z 2019-04-26T07:28:15.972+0000 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to 172.27.0.5:27017
2019-04-26T07:28:15.9815458Z 2019-04-26T07:28:15.979+0000 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to 172.27.0.5:27017, took 7ms (1 connections now open to 172.27.0.5:27017)
2019-04-26T07:28:15.9844314Z 2019-04-26T07:28:15.984+0000 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to 172.27.0.5:27017
2019-04-26T07:28:15.9968485Z 2019-04-26T07:28:15.996+0000 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to 172.27.0.5:27017, took 12ms (2 connections now open to 172.27.0.5:27017)
2019-04-26T07:28:16.0006486Z 2019-04-26T07:28:16.000+0000 I REPL [replication-0] CollectionCloner::start called, on ns:admin.system.version
2019-04-26T07:28:16.0032336Z 2019-04-26T07:28:16.002+0000 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to 172.27.0.5:27017
2019-04-26T07:28:16.0085263Z 2019-04-26T07:28:16.008+0000 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to 172.27.0.5:27017, took 8ms (3 connections now open to 172.27.0.5:27017)
2019-04-26T07:28:16.0438857Z 2019-04-26T07:28:16.043+0000 I INDEX [InitialSyncInserters-admin.system.version0] build index on: admin.system.version properties: { v: 2, key: { version: 1 }, name: "incompatible_with_version_32", ns: "admin.system.version" }
2019-04-26T07:28:16.0440413Z 2019-04-26T07:28:16.043+0000 I INDEX [InitialSyncInserters-admin.system.version0] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-26T07:28:16.0534065Z 2019-04-26T07:28:16.053+0000 I INDEX [InitialSyncInserters-admin.system.version0] build index on: admin.system.version properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "admin.system.version" }
2019-04-26T07:28:16.0535349Z 2019-04-26T07:28:16.053+0000 I INDEX [InitialSyncInserters-admin.system.version0] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-26T07:28:16.0558699Z 2019-04-26T07:28:16.055+0000 I COMMAND [InitialSyncInserters-admin.system.version0] setting featureCompatibilityVersion to 3.4
2019-04-26T07:28:16.0676187Z 2019-04-26T07:28:16.067+0000 I REPL [replication-0] No need to apply operations. (currently at { : Timestamp 1556263692000|1 })
2019-04-26T07:28:16.0759263Z 2019-04-26T07:28:16.075+0000 I ASIO [NetworkInterfaceASIO-RS-0] Ending connection to host 172.27.0.5:27017 due to bad connection status; 2 connections to that host remain open
2019-04-26T07:28:16.0774193Z 2019-04-26T07:28:16.075+0000 I REPL [replication-0] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp 1556263692000|1, t: 2 }[5058130059794077219]
2019-04-26T07:28:16.0775344Z 2019-04-26T07:28:16.076+0000 I REPL [replication-0] Initial sync attempt finishing up.
2019-04-26T07:28:16.0781039Z 2019-04-26T07:28:16.076+0000 I REPL [replication-0] Initial Sync Attempt Statistics: { failedInitialSyncAttempts: 0, maxFailedInitialSyncAttempts: 10, initialSyncStart: new Date(1556263690837), initialSyncAttempts: [], fetchedMissingDocs: 0, appliedOps: 0, initialSyncOplogStart: Timestamp 1556263692000|1, initialSyncOplogEnd: Timestamp 1556263692000|1, databases: { databasesCloned: 1, admin: { collections: 1, clonedCollections: 1, start: new Date(1556263695994), end: new Date(1556263696066), elapsedMillis: 72, admin.system.version: { documentsToCopy: 1, documentsCopied: 1, indexes: 2, fetchedBatches: 1, start: new Date(1556263696000), end: new Date(1556263696066), elapsedMillis: 66 } } } }
2019-04-26T07:28:16.0854234Z 2019-04-26T07:28:16.084+0000 I REPL [replication-0] initial sync done; took 5s.
2019-04-26T07:28:16.0865799Z 2019-04-26T07:28:16.086+0000 I REPL [replication-0] Starting replication fetcher thread
2019-04-26T07:28:16.0873393Z 2019-04-26T07:28:16.086+0000 I REPL [replication-0] Starting replication applier thread
2019-04-26T07:28:16.0880385Z 2019-04-26T07:28:16.087+0000 I REPL [replication-0] Starting replication reporter thread
2019-04-26T07:28:16.0881585Z 2019-04-26T07:28:16.087+0000 I REPL [rsSync] transition to RECOVERING
2019-04-26T07:28:16.0917979Z 2019-04-26T07:28:16.091+0000 I REPL [rsSync] transition to SECONDARY
2019-04-26T07:28:16.0961167Z 2019-04-26T07:28:16.095+0000 I REPL [rsBackgroundSync] could not find member to sync from
2019-04-26T07:28:16.1004319Z 2019-04-26T07:28:16.099+0000 I REPL [ReplicationExecutor] Member 172.27.0.6:27017 is now in state SECONDARY
```

The `STORAGE [replication-1] dropAllDatabasesExceptLocal 1` line is a bit worrisome; it may mark the moment my app database and its data were dropped (all that remains is `admin` and `local`). I would have expected the app data to survive the election process, so I'll verify by running through the steps again.
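
To make that check concrete, here is a minimal PyMongo sketch for listing which databases survived the rescale; the host and port are illustrative, and `database_names()` matches the PyMongo 3.4-era driver seen in the logs above:

```python
# Minimal sketch: list databases that survived the rescale.
# Host/port are illustrative -- point this at any member of the set.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
# PyMongo 3.4-era call; newer drivers spell it list_database_names().
print(client.database_names())
# Seeing only ['admin', 'local'] would confirm the app database was dropped.
```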

UPDATE:

Re-did the 3-to-1-to-3 steps, and this time the app database and its data did survive the election process, as originally hoped. Instance 3 was PRIMARY and was scaled down, instance 1 became a dangling SECONDARY, and on scaling back up instance 1 became PRIMARY and instance 3 became SECONDARY, just as before; this time, though, all data was replicated and retained.
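
For anyone reproducing this, a hedged sketch of how the member states can be checked from Python rather than the mongo shell; `replSetGetStatus` is the server command behind `rs.status()`, and the host is again illustrative:

```python
# Minimal sketch: print each replica set member's state after rescaling.
# Connects directly to one member; host/port are illustrative.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
status = client.admin.command('replSetGetStatus')
for member in status['members']:
    # stateStr is e.g. 'PRIMARY', 'SECONDARY', or 'STARTUP2'
    print(member['name'], member['stateStr'])
```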

UPDATE UPDATE:

Upon scaling mongo down to 0 and then back up to 3, the three mongod instances come up, but `rs.status()` reports:

```console
> rs.slaveOk()
> show dbs
local  0.000GB
> rs.status()
{
    "info" : "run rs.initiate(...) if not yet done for the set",
    "ok" : 0,
    "errmsg" : "no replset config has been received",
    "code" : 94,
    "codeName" : "NotYetInitialized"
}
```
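
As a stopgap, the set can be re-initiated by hand at this point; here is a minimal sketch with PyMongo, where the member addresses are illustrative and, in this image, manage.py is normally what drives this step:

```python
# Minimal sketch: manually re-initiate a replica set stuck in
# NotYetInitialized. Member addresses are illustrative; the _id must
# match the --replSet name the mongods were started with.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)  # direct connection to one member
config = {
    '_id': 'mongodb-replicaset',
    'members': [
        {'_id': 0, 'host': '172.27.0.5:27017'},
        {'_id': 1, 'host': '172.27.0.6:27017'},
        {'_id': 2, 'host': '172.27.0.7:27017'},
    ],
}
client.admin.command('replSetInitiate', config)
```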

That `NotYetInitialized` state is the original issue as explained by @tianon (sorry for the detour). In conclusion, we still need a way to bust the Consul cache of who the primary is supposed to be when scaling to 0.
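
For what it's worth, busting that cache could be as simple as deleting the stale keys before scaling back up. Below is a sketch against Consul's HTTP KV API; the `mongodb-replicaset` key prefix is an assumption and would need to be confirmed against what manage.py actually writes:

```python
# Minimal sketch: remove stale replica set state from Consul so that
# nodes scaled up from zero bootstrap a fresh cluster.
# ASSUMPTION: the key prefix 'mongodb-replicaset' -- check manage.py
# for the real prefix before relying on this.
import requests

CONSUL = 'http://consul:8500'
resp = requests.delete(CONSUL + '/v1/kv/mongodb-replicaset',
                       params={'recurse': 'true'})
resp.raise_for_status()
print('deleted:', resp.json())  # Consul returns true on success
```

Whether something like this belongs in a lifecycle hook or in a manual reset script is an open question.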