docker-archive / classicswarm

Swarm Classic: a container clustering system. Not to be confused with Docker Swarm which is at https://github.com/docker/swarmkit

Swarm manager getting flustered leading to random container rescheduling #2186

Closed mrapczynski closed 8 years ago

mrapczynski commented 8 years ago

Sorry, I could not think of a more technical issue title as I'm not quite sure yet what I'm dealing with.

Platform: CentOS 7, Docker 1.10.3 CS (we are now a paying customer), Swarm 1.2, Consul 0.6.4. The Swarm consists of 5 engines, 5 managers (1 on each VM), and 5 Consul instances (1 on each VM).

I'm coming into the office each morning and finding that a few of the containers for which I have explicitly enabled auto rescheduling (via an environment variable) have indeed been rescheduled due to what Swarm thought was a node failure. What makes this complicated is that the node never actually failed, but I suspect there is something wonky going on with either (a) the API calls to Consul or (b) Consul itself.

To compound the problem, because Swarm thinks the original container went down, it schedules a second, unnecessary container to run. Now, with two containers running and doing double duty, I'm getting errors from other services that are being hit with too much traffic in a given span of time.

I'm starting to wonder if backing my cluster with Consul is a bad idea. When I look through the Consul logs, it looks like Consul is the one with the issues and Swarm is just a victim. From the logs, you could be led to believe that I have the worst platform in the world, with nodes coming and going every few minutes. Feel free to speak with honesty on this issue. We are not committed to Consul if it is not reliable.

Logs from Swarm Manager

time="2016-04-28T08:07:15Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:07:15Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:07:15Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:07:15Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:07:36Z" level=info msg="Removed Engine "
time="2016-04-28T08:08:35Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:08:35Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:08:38Z" level=info msg="Registered Engine dockertest2.ad.fhda.edu at dockertest2.ad.fhda.edu:2376"
time="2016-04-28T08:09:56Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:09:56Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:09:56Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:09:56Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:15:15Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:15:15Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:15:15Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:15:15Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:18:35Z" level=error msg="Error monitoring events: EOF." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:18:35Z" level=error msg="Restart event monitoring." id="S5SR:B2RX:NQM2:CDER:XZTT:LF7V:JA3G:BFTM:LWZA:6GUM:NZOB:6BDP" name=dockertest.ad.fhda.edu
time="2016-04-28T08:20:11Z" level=info msg="Registered Engine dockertest.ad.fhda.edu at dockertest.ad.fhda.edu:2376"
time="2016-04-28T09:06:40Z" level=info msg="Removed Engine dockermgmt.ad.fhda.edu"
time="2016-04-28T09:06:40Z" level=info msg="Removed Engine dockern1.ad.fhda.edu"
time="2016-04-28T09:06:42Z" level=error msg="failed to acquire lock: Unexpected response code: 500 (rpc error: invalid session \"c682b404-fec6-7d0f-cdf1-658f630ecabe\")"
time="2016-04-28T09:06:42Z" level=info msg="New leader elected: dockertest.ad.fhda.edu:3376"
time="2016-04-28T09:06:51Z" level=info msg="Registered Engine dockern1.ad.fhda.edu at dockern1.ad.fhda.edu:2376"
time="2016-04-28T09:06:52Z" level=info msg="Leader Election: Cluster leadership lost"
time="2016-04-28T09:07:43Z" level=info msg="Registered Engine dockermgmt.ad.fhda.edu at dockermgmt.ad.fhda.edu:2376"
time="2016-04-28T09:07:57Z" level=info msg="Removed Engine dockermgmt.ad.fhda.edu"
time="2016-04-28T09:08:03Z" level=info msg="Removed Engine dockertest.ad.fhda.edu"
time="2016-04-28T09:08:03Z" level=info msg="New leader elected: dockern1.ad.fhda.edu:3376"
time="2016-04-28T09:08:28Z" level=info msg="Removed Engine dockern1.ad.fhda.edu"
time="2016-04-28T09:08:28Z" level=info msg="Leader Election: Cluster leadership acquired"
time="2016-04-28T09:08:41Z" level=info msg="Removed Engine dockertest2.ad.fhda.edu"
time="2016-04-28T09:08:43Z" level=info msg="Leader Election: Cluster leadership lost"
time="2016-04-28T09:08:43Z" level=info msg="Leader Election: Cluster leadership acquired"
time="2016-04-28T09:08:51Z" level=info msg="Registered Engine dockern1.ad.fhda.edu at dockern1.ad.fhda.edu:2376"
time="2016-04-28T09:09:10Z" level=info msg="Registered Engine dockertest2.ad.fhda.edu at dockertest2.ad.fhda.edu:2376"
time="2016-04-28T09:09:18Z" level=info msg="Registered Engine dockertest.ad.fhda.edu at dockertest.ad.fhda.edu:2376"
time="2016-04-28T09:09:42Z" level=info msg="Removed Engine dockertest.ad.fhda.edu"
time="2016-04-28T09:09:42Z" level=error msg="Failed to reschedule container b5c427d8247cae6d0c1dc1342c28ba4f0b43a29c09edc04201fdf3b66dcfc320: Unable to find a node that satisfies the following conditions \n[port 8030 (Bridge mode)]\n[container!=*shibboleth* (soft=true)]\n[environment==test]"
time="2016-04-28T09:09:42Z" level=error msg="Failed to reschedule container 46a4cdef9d92f72d4f8bd3358bfa100ad618f6acb89a283fb1ce1c381a8c0302: Conflict: The name /docker_staff-documents-portlet_test_1 is already assigned. You have to delete (or rename) that container to be able to assign /docker_staff-documents-portlet_test_1 to a container again."
time="2016-04-28T09:10:18Z" level=info msg="Registered Engine dockertest.ad.fhda.edu at dockertest.ad.fhda.edu:2376"
time="2016-04-28T09:10:44Z" level=info msg="Registered Engine dockermgmt.ad.fhda.edu at dockermgmt.ad.fhda.edu:2376"
time="2016-04-28T09:20:09Z" level=info msg="Removed Engine dockertest2.ad.fhda.edu"
time="2016-04-28T09:20:09Z" level=error msg="Failed to reschedule container 949b62f98c11492298766df0b65d1924d3bd1dd68f07d40df1d0108cfa673712: Unable to find a node that satisfies the following conditions \n[port 8030 (Bridge mode)]\n[container!=*shibboleth* (soft=true)]\n[environment==test]"
time="2016-04-28T09:20:09Z" level=error msg="Failed to reschedule container 42f7d9e8ced2af58e0ba2bd5a653cd14396b7737c8cb20353a029387089fc56f: Conflict: The name /docker_staff-documents-portlet_test_1 is already assigned. You have to delete (or rename) that container to be able to assign /docker_staff-documents-portlet_test_1 to a container again."
time="2016-04-28T09:20:25Z" level=info msg="Rescheduled container 3bee14ba2ca480246069529c1a1c66456577e3ce09e9c06f9e16481783dbf556 from dockertest2.ad.fhda.edu to dockertest.ad.fhda.edu as 80cc3ffd7eb7371ba871d9f4ad78674a2171a04b4ae76f7fdd103278ca71c940"
time="2016-04-28T09:20:25Z" level=info msg="Container 3bee14ba2ca480246069529c1a1c66456577e3ce09e9c06f9e16481783dbf556 was running, starting container 80cc3ffd7eb7371ba871d9f4ad78674a2171a04b4ae76f7fdd103278ca71c940"
time="2016-04-28T09:21:11Z" level=info msg="Registered Engine dockertest2.ad.fhda.edu at dockertest2.ad.fhda.edu:2376"
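
To put numbers on the engine flapping in the manager log above, a small script along these lines could tally how often each engine was removed and re-registered. This is only a sketch: the function name `count_engine_flaps` and the regular expression are my own assumptions, written against the two message shapes visible in this excerpt ("Removed Engine ..." and "Registered Engine ... at ..."), not against any documented Swarm log format.

```python
import re
from collections import Counter

# Assumption: only the two message shapes seen in the excerpt are matched.
# A line whose engine name is missing (msg="Removed Engine ") is skipped,
# because the name group requires at least one non-quote character.
EVENT_RE = re.compile(r'msg="(Removed|Registered) Engine ([^\s"]+)')


def count_engine_flaps(lines):
    """Return {engine_name: Counter({'Removed': n, 'Registered': m})}."""
    flaps = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not an engine add/remove message
        action, engine = match.groups()
        flaps.setdefault(engine, Counter())[action] += 1
    return flaps
```

Feeding it the lines of a saved manager log (for example `count_engine_flaps(open(path))`, where `path` is wherever the log was saved) gives a per-engine count that makes it easier to see whether one node or all of them are dropping out.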

Logs from Consul

Apr 28 02:07:55 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:55 [INFO] memberlist: Suspect dockermgmt has failed, no acks received
Apr 28 02:07:56 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:56 [INFO] memberlist: Suspect dockermgmt has failed, no acks received
Apr 28 02:07:57 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:57 [INFO] serf: EventMemberFailed: dockermgmt 10.201.2.115
Apr 28 02:07:57 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:57 [INFO] consul: removing LAN server dockermgmt (Addr: 10.201.2.115:8300) (DC: dc1)
Apr 28 02:07:58 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:58 [WARN] raft: Rejecting vote request from 10.201.2.115:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:07:59 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:59 [WARN] raft: Rejecting vote request from 10.201.2.112:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:07:59 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:59 [ERR] http: Request PUT /v1/session/renew/27138fc7-264c-aa89-0065-5ec78259e607, error: rpc error: No cluster leader from=172.17.0.3:53810
Apr 28 02:07:59 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:07:59 [WARN] raft: Rejecting vote request from 10.201.2.115:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:00 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:00 [INFO] consul: New leader elected: dockertest2
Apr 28 02:08:03 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:03 [INFO] memberlist: Marking dockertest as failed, suspect timeout reached
Apr 28 02:08:03 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:03 [INFO] serf: EventMemberFailed: dockertest 10.201.2.114
Apr 28 02:08:03 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:03 [INFO] consul: removing LAN server dockertest (Addr: 10.201.2.114:8300) (DC: dc1)
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [WARN] raft: Rejecting vote request from 10.201.2.115:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/?recurse=, error: rpc error: No cluster leader from=10.201.2.117:41379
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/65a12b9e932ac0f62d5d47d6960641cd4d9c0e8504fad5274d33d89b536844a3/?consistent=, error: rpc erro
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6615221ac14a23a682bd5142d72bf15bf7029a0da8a23e1d0bc52259ad4875bf/?consistent=, error: rpc erro
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6b77490d0ae0806770f859527cb8fa458c112fdf93c34cd66eac1c89bb57c5f2/?consistent=, error: rpc erro
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc erro
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/323b31daf9150de8b1d602ef8586b
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/323b31daf9150de8b1d602ef8586b
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/e8c6e1ce92fb991648cb436b2bb5a
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/e8c6e1ce92fb991648cb436b2bb5a
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [INFO] serf: EventMemberJoin: dockermgmt 10.201.2.115
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [INFO] consul: adding LAN server dockermgmt (Addr: 10.201.2.115:8300) (DC: dc1)
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c1d24f3e0cbfe996ce0e90a4f1524
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c1d24f3e0cbfe996ce0e90a4f1524
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c28fef2741f6c71bf58a79ea39615
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c28fef2741f6c71bf58a79ea39615
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/f98428cb989082e7a7e62c5e1e945
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/f98428cb989082e7a7e62c5e1e945
Apr 28 02:08:11 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:11 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/eb03810c175e8283d8db57c7ae68bdca10d12bf5a87a535b1c2e76aec32a6444/?consistent=, error: rpc erro
Apr 28 02:08:12 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:12 [INFO] consul: New leader elected: dockertest2
Apr 28 02:08:13 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:13 [INFO] serf: EventMemberJoin: dockertest 10.201.2.114
Apr 28 02:08:13 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:13 [INFO] consul: adding LAN server dockertest (Addr: 10.201.2.114:8300) (DC: dc1)
Apr 28 02:08:14 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:14 [INFO] memberlist: Suspect dockermgmt has failed, no acks received
Apr 28 02:08:17 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:17 [INFO] memberlist: Suspect dockermgmt has failed, no acks received
Apr 28 02:08:18 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:18 [INFO] memberlist: Suspect dockermgmt has failed, no acks received
Apr 28 02:08:19 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:19 [INFO] serf: EventMemberFailed: dockermgmt 10.201.2.115
Apr 28 02:08:19 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:19 [INFO] consul: removing LAN server dockermgmt (Addr: 10.201.2.115:8300) (DC: dc1)
Apr 28 02:08:19 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:19 [WARN] raft: Rejecting vote request from 10.201.2.112:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:20 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:20 [WARN] raft: Rejecting vote request from 10.201.2.114:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/?consistent=, error: rpc error: No cluster leader from=10.201.2.117:41494
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6615221ac14a23a682bd5142d72bf15bf7029a0da8a23e1d0bc52259ad4875bf/?consistent=, error: rpc erro
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6b77490d0ae0806770f859527cb8fa458c112fdf93c34cd66eac1c89bb57c5f2/?consistent=, error: rpc erro
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc erro
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/e8c6e1ce92fb991648cb436b2bb5a
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/e8c6e1ce92fb991648cb436b2bb5a
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c1d24f3e0cbfe996ce0e90a4f1524
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c1d24f3e0cbfe996ce0e90a4f1524
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c28fef2741f6c71bf58a79ea39615
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c28fef2741f6c71bf58a79ea39615
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/f98428cb989082e7a7e62c5e1e945
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/f98428cb989082e7a7e62c5e1e945
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/323b31daf9150de8b1d602ef8586b
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/323b31daf9150de8b1d602ef8586b
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/eb03810c175e8283d8db57c7ae68bdca10d12bf5a87a535b1c2e76aec32a6444/?consistent=, error: rpc erro
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/65a12b9e932ac0f62d5d47d6960641cd4d9c0e8504fad5274d33d89b536844a3/?consistent=, error: rpc erro
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [WARN] raft: Heartbeat timeout reached, starting election
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [INFO] raft: Node at 10.201.2.117:8300 [Candidate] entering Candidate state
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [INFO] raft: Node at 10.201.2.117:8300 [Follower] entering Follower state
Apr 28 02:08:21 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:21 [INFO] consul: New leader elected: dockertest2
Apr 28 02:08:23 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:23 [INFO] memberlist: Suspect dockern1 has failed, no acks received
Apr 28 02:08:24 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:24 [WARN] raft: Rejecting vote request from 10.201.2.115:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:25 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:25 [WARN] raft: Rejecting vote request from 10.201.2.112:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/?consistent=, error: rpc error: node is not the leader from=10.201.2.117:41524
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc erro
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/e8c6e1ce92fb991648cb436b2bb5a
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/e8c6e1ce92fb991648cb436b2bb5a
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c1d24f3e0cbfe996ce0e90a4f1524
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c1d24f3e0cbfe996ce0e90a4f1524
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c28fef2741f6c71bf58a79ea39615
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/c28fef2741f6c71bf58a79ea39615
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/f98428cb989082e7a7e62c5e1e945
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/f98428cb989082e7a7e62c5e1e945
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/323b31daf9150de8b1d602ef8586b
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/?consistent=, error: rpc error
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/bef11c901f6cb37b884d6dc5564eab38d5a1b3cba6ec23b7b205b93977dc640e/323b31daf9150de8b1d602ef8586b
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/eb03810c175e8283d8db57c7ae68bdca10d12bf5a87a535b1c2e76aec32a6444/?consistent=, error: rpc erro
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [WARN] raft: Rejecting vote request from 10.201.2.115:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/65a12b9e932ac0f62d5d47d6960641cd4d9c0e8504fad5274d33d89b536844a3/?consistent=, error: rpc erro
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6615221ac14a23a682bd5142d72bf15bf7029a0da8a23e1d0bc52259ad4875bf/?consistent=, error: rpc erro
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6b77490d0ae0806770f859527cb8fa458c112fdf93c34cd66eac1c89bb57c5f2/?consistent=, error: rpc erro
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6615221ac14a23a682bd5142d72bf15bf7029a0da8a23e1d0bc52259ad4875bf/?consistent=, error: rpc erro
Apr 28 02:08:26 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:26 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/endpoint/6b77490d0ae0806770f859527cb8fa458c112fdf93c34cd66eac1c89bb57c5f2/?consistent=, error: rpc erro
Apr 28 02:08:27 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:27 [WARN] raft: Heartbeat timeout reached, starting election
Apr 28 02:08:27 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:27 [INFO] raft: Node at 10.201.2.117:8300 [Candidate] entering Candidate state
Apr 28 02:08:27 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:27 [INFO] raft: Node at 10.201.2.117:8300 [Follower] entering Follower state
Apr 28 02:08:27 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:27 [INFO] consul: New leader elected: dockertest2
Apr 28 02:08:28 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:28 [INFO] memberlist: Marking dockern1 as failed, suspect timeout reached
Apr 28 02:08:28 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:28 [INFO] serf: EventMemberFailed: dockern1 10.201.2.112
Apr 28 02:08:28 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:28 [INFO] consul: removing LAN server dockern1 (Addr: 10.201.2.112:8300) (DC: dc1)
Apr 28 02:08:31 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:31 [ERR] raft: Failed to make RequestVote RPC to 10.201.2.112:8300: dial tcp 10.201.2.112:8300: i/o timeout
Apr 28 02:08:31 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:31 [WARN] raft: Rejecting vote request from 10.201.2.112:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:32 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:32 [WARN] raft: Rejecting vote request from 10.201.2.115:8300 since we have a leader: 10.201.2.116:8300
Apr 28 02:08:33 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:33 [WARN] raft: Heartbeat timeout reached, starting election
Apr 28 02:08:33 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:33 [INFO] raft: Node at 10.201.2.117:8300 [Candidate] entering Candidate state
Apr 28 02:08:33 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:33 [ERR] http: Request PUT /v1/session/renew/27138fc7-264c-aa89-0065-5ec78259e607, error: No cluster leader from=172.17.0.3:53913
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] raft: Duplicate RequestVote for same term: 30390
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [WARN] raft: Election timeout reached, restarting election
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] raft: Node at 10.201.2.117:8300 [Candidate] entering Candidate state
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] raft: Election won. Tally: 3
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] raft: Node at 10.201.2.117:8300 [Leader] entering Leader state
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] consul: cluster leadership acquired
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] consul: New leader elected: dockern2
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] raft: pipelining replication to peer 10.201.2.115:8300
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] raft: pipelining replication to peer 10.201.2.114:8300
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [WARN] raft: Failed to contact 10.201.2.112:8300 in 502.832368ms
Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [WARN] raft: Failed to contact 10.201.2.116:8300 in 502.842781ms
Apr 28 02:08:35 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:35 [WARN] raft: Failed to contact 10.201.2.116:8300 in 972.164498ms
Apr 28 02:08:35 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:35 [WARN] raft: Failed to contact 10.201.2.112:8300 in 972.154085ms
Apr 28 02:08:35 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:35 [INFO] memberlist: Suspect dockertest2 has failed, no acks received
Apr 28 02:08:35 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:35 [WARN] raft: Failed to contact 10.201.2.116:8300 in 1.440113374s
Apr 28 02:08:35 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:35 [WARN] raft: Failed to contact 10.201.2.112:8300 in 1.440102961s
Apr 28 02:08:36 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:36 [INFO] memberlist: Suspect dockertest2 has failed, no acks received
Apr 28 02:08:36 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:36 [WARN] raft: Rejecting vote request from 10.201.2.112:8300 since we have a leader: 10.201.2.117:8300
Apr 28 02:08:37 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:37 [ERR] raft: Failed to make RequestVote RPC to 10.201.2.112:8300: dial tcp 10.201.2.112:8300: i/o timeout
Apr 28 02:08:37 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:37 [WARN] raft: AppendEntries to 10.201.2.112:8300 rejected, sending older logs (next: 3194634)
Apr 28 02:08:37 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:37 [INFO] raft: pipelining replication to peer 10.201.2.112:8300
Apr 28 02:08:37 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:37 [INFO] serf: EventMemberJoin: dockern1 10.201.2.112
Apr 28 02:08:37 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:37 [INFO] consul: adding LAN server dockern1 (Addr: 10.201.2.112:8300) (DC: dc1)
Apr 28 02:08:37 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:37 [INFO] consul: member 'dockern1' joined, marking health alive
Apr 28 02:08:38 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:38 [INFO] memberlist: Marking dockertest2 as failed, suspect timeout reached
Apr 28 02:08:38 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:38 [INFO] serf: EventMemberFailed: dockertest2 10.201.2.116
Apr 28 02:08:38 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:38 [INFO] consul: removing LAN server dockertest2 (Addr: 10.201.2.116:8300) (DC: dc1)
Apr 28 02:08:38 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:38 [INFO] consul: member 'dockertest2' failed, marking health critical
Apr 28 02:08:40 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:40 [ERR] http: Request GET /v1/kv/swarm/docker/network/v1.0/network/?consistent=, error: rpc error: leadership lost while committing log from=10.201.2.117:41553
Apr 28 02:08:41 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:41 [INFO] serf: EventMemberJoin: dockertest2 10.201.2.116
Apr 28 02:08:41 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:41 [INFO] consul: adding LAN server dockertest2 (Addr: 10.201.2.116:8300) (DC: dc1)
Apr 28 02:08:41 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:41 [INFO] consul: member 'dockertest2' joined, marking health alive
Apr 28 02:08:41 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:41 [INFO] raft: pipelining replication to peer 10.201.2.116:8300
Apr 28 02:08:43 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:43 [ERR] raft: Failed to make RequestVote RPC to 10.201.2.116:8300: read tcp 10.201.2.117:51403->10.201.2.116:8300: i/o timeout
Apr 28 02:08:43 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:43 [ERR] yamux: keepalive failed: i/o deadline reached
Apr 28 02:08:43 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:43 [ERR] http: Request GET /v1/kv/swarm/docker/nodes?index=3194654&recurse=&wait=15000ms, error: rpc error: EOF from=10.201.2.117:41341
Apr 28 02:08:43 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:43 [ERR] http: Request GET /v1/kv/swarm/docker/swarm/leader?consistent=&index=3194643, error: rpc error: EOF from=172.17.0.3:53824
Apr 28 02:08:43 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:43 [ERR] agent: coordinate update error: rpc error: EOF
mrapczynski commented 8 years ago

Closing this issue. After a lot of investigation and testing, it turns out we are victims of our own environment. Our VMware platform, though large with a lot of available computing power, is experiencing unusual disk latency problems at peak load (primarily during hot backups in the early mornings). These latency problems show up as very quick bursts, sometimes under 1 second and other times lasting several seconds, during which a VM, and subsequently the K/V store running on it, does not respond to heartbeats.
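For anyone wanting to check whether their own instability lines up with a backup window, here is a minimal sketch that tallies election and member-failure messages per minute from Consul journal lines. The marker strings are copied from the log excerpt in this issue; the helper name `instability_per_minute` and the choice of markers are assumptions for illustration, not part of Consul or Swarm.

```python
import re
from collections import Counter

# Matches the Consul timestamp inside each journal line, e.g. "2016/04/28 02:08:27 [WARN]",
# and captures it truncated to the minute.
TIMESTAMP = re.compile(r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[")

# Substrings taken from the log excerpt in this issue that hint at Raft or gossip trouble.
MARKERS = (
    "Heartbeat timeout reached, starting election",
    "Election timeout reached, restarting election",
    "EventMemberFailed",
)


def instability_per_minute(lines):
    """Count marker lines per minute (hypothetical helper for illustration)."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match and any(marker in line for marker in MARKERS):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 28 02:08:27 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:27 [WARN] raft: Heartbeat timeout reached, starting election",
        "Apr 28 02:08:28 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:28 [INFO] serf: EventMemberFailed: dockern1 10.201.2.112",
        "Apr 28 02:08:34 dockern2.ad.fhda.edu consul[2266]: 2016/04/28 02:08:34 [INFO] raft: Election won. Tally: 3",
    ]
    for minute, count in sorted(instability_per_minute(sample).items()):
        print(minute, count)
```

Comparing the busiest minutes against the hypervisor's backup schedule is one way to confirm or rule out this kind of latency as the trigger.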

We initially started using Consul for the K/V store, but from reading the configuration guide, there are no controls for heartbeat frequency or leader election behavior. Thus when a VM slows down, Consul can be misled into believing a node has failed, and this sets off the Swarm manager to begin rescheduling when in reality it should not.

Since discovering this, we have switched our K/V backend to etcd with noticeably better results. We have the heartbeat interval manually set to 3000ms, and the leader election timeout set to 30000ms as required, and now our cluster can ride out the latency issues without unnecessarily rescheduling containers that did not actually fail.
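As a rough illustration of the timing settings above, the sketch below builds the two etcd command-line flags (`--heartbeat-interval` and `--election-timeout`, both in milliseconds) from the values mentioned here. The helper name `etcd_timing_flags` is hypothetical, and the 5x ratio check is an assumption based on etcd's tuning guidance rather than anything stated in this thread; the exact limits should be confirmed against the etcd version in use.

```python
def etcd_timing_flags(heartbeat_ms, election_ms, min_ratio=5):
    """Return etcd timing flags as a list of strings (hypothetical helper).

    min_ratio is an assumed safety margin: the election timeout is expected
    to be several times the heartbeat interval so that a single delayed
    heartbeat does not trigger a leader election.
    """
    if heartbeat_ms <= 0 or election_ms <= 0:
        raise ValueError("intervals must be positive, in milliseconds")
    if election_ms < min_ratio * heartbeat_ms:
        raise ValueError(
            f"election timeout {election_ms}ms is below "
            f"{min_ratio}x the heartbeat interval {heartbeat_ms}ms"
        )
    return [
        f"--heartbeat-interval={heartbeat_ms}",
        f"--election-timeout={election_ms}",
    ]


if __name__ == "__main__":
    # Values from this comment: 3000ms heartbeat, 30000ms election timeout.
    print(" ".join(["etcd"] + etcd_timing_flags(3000, 30000)))
```

The same settings can also be supplied through the `ETCD_HEARTBEAT_INTERVAL` and `ETCD_ELECTION_TIMEOUT` environment variables, and they should be set consistently on every member of the etcd cluster.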

dongluochen commented 8 years ago

@mrapczynski Thanks for providing the root cause. We may need to keep an eye on Consul. cc @abronan.