etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Issue: one node reports etcdserver: publish error: etcdserver: request timed out and others failed to reach it in a three-node etcd cluster #8184

Closed. DJAKN closed this issue 7 years ago.

DJAKN commented 7 years ago


I started a three-node etcd cluster (node0, node1, node2) on three servers. node2 keeps reporting timeouts, while node0 reports other problems related to node2.
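For context on why a single unreachable member matters here: in Raft, a cluster of n voting members needs a majority of floor(n/2) + 1 to elect a leader or commit an entry. A three-node cluster therefore keeps working with one member down but not with two. In node1's log the election succeeds with [quorum:2], and the leader steps down with "quorum is not active" once both of its peers have disconnected. A quick sketch of that arithmetic, assuming a POSIX shell (quorum and fault_tolerance are illustrative helper names, not etcd commands):

```shell
#!/bin/sh
# Raft majority quorum: a cluster of n voting members needs floor(n/2) + 1
# of them to elect a leader or commit an entry.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a quorum is still reachable.
fault_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For n = 3 this prints quorum=2 and tolerated_failures=1, matching the [quorum:2] reported by raft.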

Here are the commands used to start each node:

etcd --name node0 --initial-advertise-peer-urls http://172.16.6.44:2380 \
--listen-peer-urls http://172.16.6.44:2380 \
--listen-client-urls http://172.16.6.44:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://172.16.6.44:2379 \
--initial-cluster-token etcd-cluster-hw5 \
--initial-cluster node0=http://172.16.6.44:2380,node1=http://172.16.6.218:2380,node2=http://172.16.6.231:2380 \
--initial-cluster-state new

etcd --name node1 --initial-advertise-peer-urls http://172.16.6.218:2380 \
--listen-peer-urls http://172.16.6.218:2380 \
--listen-client-urls http://172.16.6.218:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://172.16.6.218:2379 \
--initial-cluster-token etcd-cluster-hw5 \
--initial-cluster node0=http://172.16.6.44:2380,node1=http://172.16.6.218:2380,node2=http://172.16.6.231:2380 \
--initial-cluster-state new

etcd --name node2 --initial-advertise-peer-urls http://172.16.6.231:2380 \
--listen-peer-urls http://172.16.6.231:2380 \
--listen-client-urls http://172.16.6.231:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://172.16.6.231:2379 \
--initial-cluster-token etcd-cluster-hw5 \
--initial-cluster node0=http://172.16.6.44:2380,node1=http://172.16.6.218:2380,node2=http://172.16.6.231:2380 \
--initial-cluster-state new
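The three commands differ only in member name and IP address, so one way to avoid copy-paste mistakes in --initial-cluster is to generate them from a single member list. A minimal sketch, assuming a POSIX shell; initial_cluster and etcd_cmd are hypothetical helper names, and the script only prints the commands rather than running them. It also passes an explicit --data-dir set to the same ./<name>.etcd directory that the logs report as the default ("no data-dir provided"), so behavior is unchanged. Note that etcd uses the --initial-* flags only when bootstrapping an empty data directory; the logs show "the server is already initialized as member before", so on these restarts those flags are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print the etcd start command for each member so that
# every node gets an identical --initial-cluster value.
# Members are name=ip pairs; ports 2379/2380 follow the commands above.
MEMBERS="node0=172.16.6.44 node1=172.16.6.218 node2=172.16.6.231"
TOKEN="etcd-cluster-hw5"

# Build "node0=http://IP:2380,node1=...,node2=..." from MEMBERS.
initial_cluster() {
  out=""
  for m in $MEMBERS; do
    name=${m%%=*}
    ip=${m#*=}
    out="${out:+$out,}$name=http://$ip:2380"
  done
  echo "$out"
}

# Print (not run) the start command for one member, given its name and IP.
etcd_cmd() {
  name=$1
  ip=$2
  echo "etcd --name $name --data-dir ./$name.etcd" \
    "--initial-advertise-peer-urls http://$ip:2380" \
    "--listen-peer-urls http://$ip:2380" \
    "--listen-client-urls http://$ip:2379,http://127.0.0.1:2379" \
    "--advertise-client-urls http://$ip:2379" \
    "--initial-cluster-token $TOKEN" \
    "--initial-cluster $(initial_cluster)" \
    "--initial-cluster-state new"
}

for m in $MEMBERS; do
  etcd_cmd "${m%%=*}" "${m#*=}"
done
```

Each printed line can then be copied to the matching server, which guarantees that all three members agree on the cluster membership string.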

And here are the logs from the three nodes:

node0 (172.16.6.44):

2017-06-27 23:11:48.586539 I | etcdmain: etcd Version: 3.1.7
2017-06-27 23:11:48.586923 I | etcdmain: Git SHA: 43b7507
2017-06-27 23:11:48.587324 I | etcdmain: Go Version: go1.7.5
2017-06-27 23:11:48.587587 I | etcdmain: Go OS/Arch: linux/amd64
2017-06-27 23:11:48.587824 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2017-06-27 23:11:48.587968 W | etcdmain: no data-dir provided, using default data-dir ./node0.etcd
2017-06-27 23:11:48.588148 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-06-27 23:11:48.588299 I | embed: listening for peers on http://172.16.6.44:2380
2017-06-27 23:11:48.588449 I | embed: listening for client requests on 127.0.0.1:2379
2017-06-27 23:11:48.588541 I | embed: listening for client requests on 172.16.6.44:2379
2017-06-27 23:11:48.590569 I | etcdserver: recovered store from snapshot at index 27901
2017-06-27 23:11:48.590655 I | etcdserver: name = node0
2017-06-27 23:11:48.590728 I | etcdserver: data dir = node0.etcd
2017-06-27 23:11:48.590760 I | etcdserver: member dir = node0.etcd/member
2017-06-27 23:11:48.590830 I | etcdserver: heartbeat = 100ms
2017-06-27 23:11:48.590870 I | etcdserver: election = 1000ms
2017-06-27 23:11:48.590937 I | etcdserver: snapshot count = 10000
2017-06-27 23:11:48.590973 I | etcdserver: advertise client URLs = http://172.16.6.44:2379
2017-06-27 23:11:48.601223 I | etcdserver: restarting member 7197a8a131176261 in cluster 199f79acd686ab9e at commit index 28073
2017-06-27 23:11:48.601496 I | raft: 7197a8a131176261 became follower at term 8268
2017-06-27 23:11:48.601695 I | raft: newRaft 7197a8a131176261 [peers: [7197a8a131176261,8dad2ab19285a523,f4ef939dc36d31ee], term: 8268, commit: 28073, applied: 27901, lastindex: 28073, lastterm: 8268]
2017-06-27 23:11:48.602027 I | etcdserver/api: enabled capabilities for version 2.2
2017-06-27 23:11:48.602246 I | etcdserver/membership: added member 7197a8a131176261 [http://172.16.6.44:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:48.602434 I | etcdserver/membership: added member 8dad2ab19285a523 [http://172.16.6.218:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:48.602605 I | etcdserver/membership: added member f4ef939dc36d31ee [http://172.16.6.231:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:48.602778 I | etcdserver/membership: set the cluster version to 2.2 from store
2017-06-27 23:11:48.614716 I | rafthttp: starting peer 8dad2ab19285a523...
2017-06-27 23:11:48.614969 I | rafthttp: started HTTP pipelining with peer 8dad2ab19285a523
2017-06-27 23:11:48.619506 I | rafthttp: started streaming with peer 8dad2ab19285a523 (writer)
2017-06-27 23:11:48.619778 I | rafthttp: started streaming with peer 8dad2ab19285a523 (writer)
2017-06-27 23:11:48.623884 I | rafthttp: started peer 8dad2ab19285a523
2017-06-27 23:11:48.624124 I | rafthttp: added peer 8dad2ab19285a523
2017-06-27 23:11:48.624317 I | rafthttp: starting peer f4ef939dc36d31ee...
2017-06-27 23:11:48.624513 I | rafthttp: started HTTP pipelining with peer f4ef939dc36d31ee
2017-06-27 23:11:48.628192 I | rafthttp: started streaming with peer 8dad2ab19285a523 (stream MsgApp v2 reader)
2017-06-27 23:11:48.628922 I | rafthttp: started streaming with peer 8dad2ab19285a523 (stream Message reader)
2017-06-27 23:11:48.632522 I | rafthttp: started peer f4ef939dc36d31ee
2017-06-27 23:11:48.632756 I | rafthttp: added peer f4ef939dc36d31ee
2017-06-27 23:11:48.632965 I | etcdserver: starting server... [version: 3.1.7, cluster version: 2.2]
2017-06-27 23:11:48.637212 I | rafthttp: started streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:11:48.637449 I | rafthttp: started streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:11:48.637650 I | rafthttp: started streaming with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:11:48.638040 I | rafthttp: started streaming with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:11:48.639383 I | rafthttp: peer f4ef939dc36d31ee became active
2017-06-27 23:11:48.639607 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream Message writer)
2017-06-27 23:11:48.644023 N | etcdserver/membership: updated the cluster version from 2.2 to 3.1
2017-06-27 23:11:48.644349 I | etcdserver/api: enabled capabilities for version 3.1
2017-06-27 23:11:48.656758 I | raft: 7197a8a131176261 [term: 8268] received a MsgHeartbeat message with higher term from 8dad2ab19285a523 [term: 8293]
2017-06-27 23:11:48.656991 I | raft: 7197a8a131176261 became follower at term 8293
2017-06-27 23:11:48.657185 I | raft: raft.node: 7197a8a131176261 elected leader 8dad2ab19285a523 at term 8293
2017-06-27 23:11:48.660751 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 writer)
2017-06-27 23:11:48.661920 I | rafthttp: peer 8dad2ab19285a523 became active
2017-06-27 23:11:48.662161 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream Message writer)
2017-06-27 23:11:48.663083 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream MsgApp v2 writer)
2017-06-27 23:11:48.663792 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream Message reader)
2017-06-27 23:11:48.664319 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:11:48.664665 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream MsgApp v2 reader)
2017-06-27 23:11:48.665840 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:11:48.677958 I | etcdserver: published {Name:node0 ClientURLs:[http://172.16.6.44:2379]} to cluster 199f79acd686ab9e
2017-06-27 23:11:48.678438 E | etcdmain: forgot to set Type=notify in systemd service file?
2017-06-27 23:11:48.678648 I | embed: ready to serve client requests
2017-06-27 23:11:48.679395 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
2017-06-27 23:11:48.679785 I | embed: ready to serve client requests
2017-06-27 23:11:48.680355 N | embed: serving insecure client requests on 172.16.6.44:2379, this is strongly discouraged!
2017-06-27 23:13:41.401539 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:13:41.403288 E | rafthttp: failed to read f4ef939dc36d31ee on stream MsgApp v2 (unexpected EOF)
2017-06-27 23:13:41.403759 I | rafthttp: peer f4ef939dc36d31ee became inactive
2017-06-27 23:13:41.404197 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:13:46.971589 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 writer)
2017-06-27 23:13:46.972848 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream Message writer)
2017-06-27 23:13:53.640409 W | rafthttp: health check for peer f4ef939dc36d31ee could not connect: dial tcp 172.16.6.231:2380: getsockopt: connection refused
2017-06-27 23:13:58.641104 W | rafthttp: health check for peer f4ef939dc36d31ee could not connect: dial tcp 172.16.6.231:2380: getsockopt: connection refused
^C2017-06-27 23:14:01.573294 N | pkg/osutil: received interrupt signal, shutting down...
2017-06-27 23:14:01.573598 I | etcdserver: skipped leadership transfer for stopping non-leader member
2017-06-27 23:14:01.573797 I | rafthttp: stopping peer 8dad2ab19285a523...
2017-06-27 23:14:01.575409 I | rafthttp: closed the TCP streaming connection with peer 8dad2ab19285a523 (stream MsgApp v2 writer)
2017-06-27 23:14:01.576498 I | rafthttp: stopped streaming with peer 8dad2ab19285a523 (writer)
2017-06-27 23:14:01.577129 I | rafthttp: closed the TCP streaming connection with peer 8dad2ab19285a523 (stream Message writer)
2017-06-27 23:14:01.577214 I | rafthttp: stopped streaming with peer 8dad2ab19285a523 (writer)
2017-06-27 23:14:01.577448 I | rafthttp: stopped HTTP pipelining with peer 8dad2ab19285a523
2017-06-27 23:14:01.577664 W | rafthttp: lost the TCP streaming connection with peer 8dad2ab19285a523 (stream MsgApp v2 reader)
2017-06-27 23:14:01.577751 E | rafthttp: failed to read 8dad2ab19285a523 on stream MsgApp v2 (net/http: request canceled)
2017-06-27 23:14:01.577884 I | rafthttp: peer 8dad2ab19285a523 became inactive
2017-06-27 23:14:01.578025 I | rafthttp: stopped streaming with peer 8dad2ab19285a523 (stream MsgApp v2 reader)
2017-06-27 23:14:01.578244 W | rafthttp: lost the TCP streaming connection with peer 8dad2ab19285a523 (stream Message reader)
2017-06-27 23:14:01.578335 I | rafthttp: stopped streaming with peer 8dad2ab19285a523 (stream Message reader)
2017-06-27 23:14:01.578378 I | rafthttp: stopped peer 8dad2ab19285a523
2017-06-27 23:14:01.578453 I | rafthttp: stopping peer f4ef939dc36d31ee...
2017-06-27 23:14:01.578529 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:14:01.578575 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:14:01.578659 I | rafthttp: stopped HTTP pipelining with peer f4ef939dc36d31ee
2017-06-27 23:14:01.578735 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:14:01.578863 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:14:01.578956 I | rafthttp: stopped peer f4ef939dc36d31ee

node1 (172.16.6.218):

2017-06-27 23:11:15.343314 I | etcdmain: etcd Version: 3.1.7
2017-06-27 23:11:15.343832 I | etcdmain: Git SHA: 43b7507
2017-06-27 23:11:15.344085 I | etcdmain: Go Version: go1.7.5
2017-06-27 23:11:15.344318 I | etcdmain: Go OS/Arch: linux/amd64
2017-06-27 23:11:15.344552 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2017-06-27 23:11:15.344697 W | etcdmain: no data-dir provided, using default data-dir ./node1.etcd
2017-06-27 23:11:15.344813 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-06-27 23:11:15.344937 I | embed: listening for peers on http://172.16.6.218:2380
2017-06-27 23:11:15.345078 I | embed: listening for client requests on 127.0.0.1:2379
2017-06-27 23:11:15.345174 I | embed: listening for client requests on 172.16.6.218:2379
2017-06-27 23:11:15.348164 I | etcdserver: recovered store from snapshot at index 20002
2017-06-27 23:11:15.348257 I | etcdserver: name = node1
2017-06-27 23:11:15.348351 I | etcdserver: data dir = node1.etcd
2017-06-27 23:11:15.348418 I | etcdserver: member dir = node1.etcd/member
2017-06-27 23:11:15.348455 I | etcdserver: heartbeat = 100ms
2017-06-27 23:11:15.348516 I | etcdserver: election = 1000ms
2017-06-27 23:11:15.348617 I | etcdserver: snapshot count = 10000
2017-06-27 23:11:15.348658 I | etcdserver: advertise client URLs = http://172.16.6.218:2379
2017-06-27 23:11:15.416971 I | etcdserver: restarting member 8dad2ab19285a523 in cluster 199f79acd686ab9e at commit index 28073
2017-06-27 23:11:15.417693 I | raft: 8dad2ab19285a523 became follower at term 8285
2017-06-27 23:11:15.417927 I | raft: newRaft 8dad2ab19285a523 [peers: [7197a8a131176261,8dad2ab19285a523,f4ef939dc36d31ee], term: 8285, commit: 28073, applied: 20002, lastindex: 28073, lastterm: 8268]
2017-06-27 23:11:15.418278 I | etcdserver/api: enabled capabilities for version 2.2
2017-06-27 23:11:15.418500 I | etcdserver/membership: added member 8dad2ab19285a523 [http://172.16.6.218:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:15.418714 I | etcdserver/membership: added member f4ef939dc36d31ee [http://172.16.6.231:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:15.418908 I | etcdserver/membership: added member 7197a8a131176261 [http://172.16.6.44:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:15.419130 I | etcdserver/membership: set the cluster version to 2.2 from store
2017-06-27 23:11:15.430714 I | rafthttp: starting peer 7197a8a131176261...
2017-06-27 23:11:15.431157 I | rafthttp: started HTTP pipelining with peer 7197a8a131176261
2017-06-27 23:11:15.433935 I | rafthttp: started streaming with peer 7197a8a131176261 (writer)
2017-06-27 23:11:15.446466 I | rafthttp: started peer 7197a8a131176261
2017-06-27 23:11:15.446900 I | rafthttp: added peer 7197a8a131176261
2017-06-27 23:11:15.447183 I | rafthttp: starting peer f4ef939dc36d31ee...
2017-06-27 23:11:15.447414 I | rafthttp: started HTTP pipelining with peer f4ef939dc36d31ee
2017-06-27 23:11:15.451222 I | rafthttp: started peer f4ef939dc36d31ee
2017-06-27 23:11:15.451492 I | rafthttp: added peer f4ef939dc36d31ee
2017-06-27 23:11:15.451737 I | etcdserver: starting server... [version: 3.1.7, cluster version: 2.2]
2017-06-27 23:11:15.453691 I | rafthttp: started streaming with peer 7197a8a131176261 (writer)
2017-06-27 23:11:15.453979 I | rafthttp: started streaming with peer 7197a8a131176261 (stream MsgApp v2 reader)
2017-06-27 23:11:15.454546 I | rafthttp: started streaming with peer 7197a8a131176261 (stream Message reader)
2017-06-27 23:11:15.455216 I | rafthttp: started streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:11:15.455450 I | rafthttp: started streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:11:15.455516 I | rafthttp: started streaming with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:11:15.455769 I | rafthttp: started streaming with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:11:15.491478 N | etcdserver/membership: updated the cluster version from 2.2 to 3.1
2017-06-27 23:11:15.491811 I | etcdserver/api: enabled capabilities for version 3.1
2017-06-27 23:11:15.523036 I | raft: 8dad2ab19285a523 is starting a new election at term 8285
2017-06-27 23:11:15.523425 I | raft: 8dad2ab19285a523 became candidate at term 8286
2017-06-27 23:11:15.523514 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8286
2017-06-27 23:11:15.523817 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8286
2017-06-27 23:11:15.523912 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8286
2017-06-27 23:11:16.720232 I | raft: 8dad2ab19285a523 is starting a new election at term 8286
2017-06-27 23:11:16.720765 I | raft: 8dad2ab19285a523 became candidate at term 8287
2017-06-27 23:11:16.721166 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8287
2017-06-27 23:11:16.721552 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8287
2017-06-27 23:11:16.721937 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8287
2017-06-27 23:11:18.019580 I | raft: 8dad2ab19285a523 is starting a new election at term 8287
2017-06-27 23:11:18.020038 I | raft: 8dad2ab19285a523 became candidate at term 8288
2017-06-27 23:11:18.020364 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8288
2017-06-27 23:11:18.020661 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8288
2017-06-27 23:11:18.020950 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8288
2017-06-27 23:11:19.619624 I | raft: 8dad2ab19285a523 is starting a new election at term 8288
2017-06-27 23:11:19.620188 I | raft: 8dad2ab19285a523 became candidate at term 8289
2017-06-27 23:11:19.620437 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8289
2017-06-27 23:11:19.620672 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8289
2017-06-27 23:11:19.620907 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8289
2017-06-27 23:11:20.455408 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:20.456122 W | rafthttp: health check for peer f4ef939dc36d31ee could not connect: dial tcp 172.16.6.231:2380: getsockopt: connection refused
2017-06-27 23:11:21.219659 I | raft: 8dad2ab19285a523 is starting a new election at term 8289
2017-06-27 23:11:21.220155 I | raft: 8dad2ab19285a523 became candidate at term 8290
2017-06-27 23:11:21.220333 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8290
2017-06-27 23:11:21.220365 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8290
2017-06-27 23:11:21.220388 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8290
2017-06-27 23:11:22.219547 I | raft: 8dad2ab19285a523 is starting a new election at term 8290
2017-06-27 23:11:22.219915 I | raft: 8dad2ab19285a523 became candidate at term 8291
2017-06-27 23:11:22.220177 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8291
2017-06-27 23:11:22.220426 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8291
2017-06-27 23:11:22.220683 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8291
2017-06-27 23:11:22.456436 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:11:23.419610 I | raft: 8dad2ab19285a523 is starting a new election at term 8291
2017-06-27 23:11:23.420188 I | raft: 8dad2ab19285a523 became candidate at term 8292
2017-06-27 23:11:23.420703 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8292
2017-06-27 23:11:23.421218 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8292
2017-06-27 23:11:23.421903 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8292
2017-06-27 23:11:23.913429 I | rafthttp: peer f4ef939dc36d31ee became active
2017-06-27 23:11:23.913906 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:11:23.916589 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:11:23.963605 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream Message writer)
2017-06-27 23:11:23.964495 I | rafthttp: established a TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 writer)
2017-06-27 23:11:24.384255 I | raft: 8dad2ab19285a523 [term: 8292] ignored a MsgVote message with lower term from f4ef939dc36d31ee [term: 8268]
2017-06-27 23:11:24.419566 I | raft: 8dad2ab19285a523 is starting a new election at term 8292
2017-06-27 23:11:24.420062 I | raft: 8dad2ab19285a523 became candidate at term 8293
2017-06-27 23:11:24.420502 I | raft: 8dad2ab19285a523 received MsgVoteResp from 8dad2ab19285a523 at term 8293
2017-06-27 23:11:24.421088 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to 7197a8a131176261 at term 8293
2017-06-27 23:11:24.421638 I | raft: 8dad2ab19285a523 [logterm: 8268, index: 28073] sent MsgVote request to f4ef939dc36d31ee at term 8293
2017-06-27 23:11:24.431682 I | raft: 8dad2ab19285a523 received MsgVoteResp from f4ef939dc36d31ee at term 8293
2017-06-27 23:11:24.431856 I | raft: 8dad2ab19285a523 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
2017-06-27 23:11:24.432042 I | raft: 8dad2ab19285a523 became leader at term 8293
2017-06-27 23:11:24.432172 I | raft: raft.node: 8dad2ab19285a523 elected leader 8dad2ab19285a523 at term 8293
2017-06-27 23:11:25.456156 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:25.456777 W | rafthttp: health check for peer f4ef939dc36d31ee could not connect: dial tcp 172.16.6.231:2380: getsockopt: connection refused
2017-06-27 23:11:27.504921 W | etcdserver: failed to reach the peerURL(http://172.16.6.44:2380) of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:27.505487 W | etcdserver: cannot get the version of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:29.457081 E | etcdserver: publish error: etcdserver: request timed out, possibly due to previous leader failure
2017-06-27 23:11:30.456541 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:31.509035 W | etcdserver: failed to reach the peerURL(http://172.16.6.44:2380) of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:31.509644 W | etcdserver: cannot get the version of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:35.457818 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:35.512232 W | etcdserver: failed to reach the peerURL(http://172.16.6.44:2380) of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:35.512780 W | etcdserver: cannot get the version of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:36.457607 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:11:39.515584 W | etcdserver: failed to reach the peerURL(http://172.16.6.44:2380) of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:39.516134 W | etcdserver: cannot get the version of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:40.458634 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:43.458384 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:11:43.518598 W | etcdserver: failed to reach the peerURL(http://172.16.6.44:2380) of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:43.518845 W | etcdserver: cannot get the version of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:45.459569 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:47.522944 W | etcdserver: failed to reach the peerURL(http://172.16.6.44:2380) of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:47.523873 W | etcdserver: cannot get the version of member 7197a8a131176261 (Get http://172.16.6.44:2380/version: dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:11:48.676614 I | rafthttp: peer 7197a8a131176261 became active
2017-06-27 23:11:48.678476 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream Message reader)
2017-06-27 23:11:48.680065 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream MsgApp v2 reader)
2017-06-27 23:11:48.681137 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream Message writer)
2017-06-27 23:11:48.681838 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream MsgApp v2 writer)
2017-06-27 23:11:48.690140 I | etcdserver: published {Name:node1 ClientURLs:[http://172.16.6.218:2379]} to cluster 199f79acd686ab9e
2017-06-27 23:11:48.690386 E | etcdmain: forgot to set Type=notify in systemd service file?
2017-06-27 23:11:48.691148 I | embed: ready to serve client requests
2017-06-27 23:11:48.692284 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
2017-06-27 23:11:48.693240 I | embed: ready to serve client requests
2017-06-27 23:11:48.693920 N | embed: serving insecure client requests on 172.16.6.218:2379, this is strongly discouraged!
2017-06-27 23:11:50.460548 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:13:41.415896 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:13:41.417949 E | rafthttp: failed to read f4ef939dc36d31ee on stream MsgApp v2 (unexpected EOF)
2017-06-27 23:13:41.418365 I | rafthttp: peer f4ef939dc36d31ee became inactive
2017-06-27 23:13:41.418789 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:13:41.720063 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream Message writer)
2017-06-27 23:13:44.784127 W | etcdserver: failed to reach the peerURL(http://172.16.6.231:2380) of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:13:44.784615 W | etcdserver: cannot get the version of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:13:47.123220 W | rafthttp: lost the TCP streaming connection with peer f4ef939dc36d31ee (stream MsgApp v2 writer)
2017-06-27 23:13:48.787078 W | etcdserver: failed to reach the peerURL(http://172.16.6.231:2380) of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:13:48.787763 W | etcdserver: cannot get the version of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:13:52.790134 W | etcdserver: failed to reach the peerURL(http://172.16.6.231:2380) of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:13:52.790616 W | etcdserver: cannot get the version of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:13:56.793320 W | etcdserver: failed to reach the peerURL(http://172.16.6.231:2380) of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:13:56.793837 W | etcdserver: cannot get the version of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:14:00.459080 W | rafthttp: health check for peer f4ef939dc36d31ee could not connect: dial tcp 172.16.6.231:2380: getsockopt: connection refused
2017-06-27 23:14:00.795799 W | etcdserver: failed to reach the peerURL(http://172.16.6.231:2380) of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:14:00.796140 W | etcdserver: cannot get the version of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)
2017-06-27 23:14:01.589973 W | rafthttp: lost the TCP streaming connection with peer 7197a8a131176261 (stream MsgApp v2 reader)
2017-06-27 23:14:01.591749 W | rafthttp: lost the TCP streaming connection with peer 7197a8a131176261 (stream Message reader)
2017-06-27 23:14:01.691633 E | rafthttp: failed to dial 7197a8a131176261 on stream MsgApp v2 (dial tcp 172.16.6.44:2380: getsockopt: connection refused)
2017-06-27 23:14:01.692295 I | rafthttp: peer 7197a8a131176261 became inactive
2017-06-27 23:14:01.922278 W | rafthttp: lost the TCP streaming connection with peer 7197a8a131176261 (stream Message writer)
2017-06-27 23:14:03.419560 W | raft: 8dad2ab19285a523 stepped down to follower since quorum is not active
2017-06-27 23:14:03.419943 I | raft: 8dad2ab19285a523 became follower at term 8293
2017-06-27 23:14:03.420204 I | raft: raft.node: 8dad2ab19285a523 lost leader 8dad2ab19285a523 at term 8293
^C2017-06-27 23:14:04.002920 N | pkg/osutil: received interrupt signal, shutting down...
2017-06-27 23:14:04.003455 I | etcdserver: skipped leadership transfer for stopping non-leader member
2017-06-27 23:14:04.004088 I | rafthttp: stopping peer 7197a8a131176261...
2017-06-27 23:14:04.008962 I | rafthttp: closed the TCP streaming connection with peer 7197a8a131176261 (stream MsgApp v2 writer)
2017-06-27 23:14:04.009512 I | rafthttp: stopped streaming with peer 7197a8a131176261 (writer)
2017-06-27 23:14:04.009761 I | rafthttp: stopped streaming with peer 7197a8a131176261 (writer)
2017-06-27 23:14:04.009895 I | rafthttp: stopped HTTP pipelining with peer 7197a8a131176261
2017-06-27 23:14:04.010015 I | rafthttp: stopped streaming with peer 7197a8a131176261 (stream MsgApp v2 reader)
2017-06-27 23:14:04.010144 I | rafthttp: stopped streaming with peer 7197a8a131176261 (stream Message reader)
2017-06-27 23:14:04.010258 I | rafthttp: stopped peer 7197a8a131176261
2017-06-27 23:14:04.010373 I | rafthttp: stopping peer f4ef939dc36d31ee...
2017-06-27 23:14:04.010542 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:14:04.010669 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (writer)
2017-06-27 23:14:04.010751 I | rafthttp: stopped HTTP pipelining with peer f4ef939dc36d31ee
2017-06-27 23:14:04.010794 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (stream MsgApp v2 reader)
2017-06-27 23:14:04.010866 I | rafthttp: stopped streaming with peer f4ef939dc36d31ee (stream Message reader)
2017-06-27 23:14:04.010924 I | rafthttp: stopped peer f4ef939dc36d31ee

node2 logs:

2017-06-27 23:11:23.795642 I | etcdmain: etcd Version: 3.1.7
2017-06-27 23:11:23.796315 I | etcdmain: Git SHA: 43b7507
2017-06-27 23:11:23.796585 I | etcdmain: Go Version: go1.7.5
2017-06-27 23:11:23.796830 I | etcdmain: Go OS/Arch: linux/amd64
2017-06-27 23:11:23.797121 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2017-06-27 23:11:23.797390 W | etcdmain: no data-dir provided, using default data-dir ./node2.etcd
2017-06-27 23:11:23.797804 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-06-27 23:11:23.798077 I | embed: listening for peers on http://172.16.6.231:2380
2017-06-27 23:11:23.798364 I | embed: listening for client requests on 127.0.0.1:2379
2017-06-27 23:11:23.798490 I | embed: listening for client requests on 172.16.6.231:2379
2017-06-27 23:11:23.801423 I | etcdserver: recovered store from snapshot at index 20002
2017-06-27 23:11:23.801516 I | etcdserver: name = node2
2017-06-27 23:11:23.801586 I | etcdserver: data dir = node2.etcd
2017-06-27 23:11:23.801622 I | etcdserver: member dir = node2.etcd/member
2017-06-27 23:11:23.801696 I | etcdserver: heartbeat = 100ms
2017-06-27 23:11:23.801726 I | etcdserver: election = 1000ms
2017-06-27 23:11:23.801788 I | etcdserver: snapshot count = 10000
2017-06-27 23:11:23.801822 I | etcdserver: advertise client URLs = http://172.16.6.231:2379
2017-06-27 23:11:23.870143 I | etcdserver: restarting member f4ef939dc36d31ee in cluster 199f79acd686ab9e at commit index 28001
2017-06-27 23:11:23.870790 I | raft: f4ef939dc36d31ee became follower at term 8267
2017-06-27 23:11:23.871018 I | raft: newRaft f4ef939dc36d31ee [peers: [7197a8a131176261,8dad2ab19285a523,f4ef939dc36d31ee], term: 8267, commit: 28001, applied: 20002, lastindex: 28005, lastterm: 127]
2017-06-27 23:11:23.871335 I | etcdserver/api: enabled capabilities for version 2.2
2017-06-27 23:11:23.871524 I | etcdserver/membership: added member 8dad2ab19285a523 [http://172.16.6.218:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:23.871694 I | etcdserver/membership: added member f4ef939dc36d31ee [http://172.16.6.231:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:23.871858 I | etcdserver/membership: added member 7197a8a131176261 [http://172.16.6.44:2380] to cluster 199f79acd686ab9e from store
2017-06-27 23:11:23.872064 I | etcdserver/membership: set the cluster version to 2.2 from store
2017-06-27 23:11:23.880558 W | etcdserver: consistent index never saved (snapshot index=20002)
2017-06-27 23:11:23.884166 I | rafthttp: starting peer 7197a8a131176261...
2017-06-27 23:11:23.884503 I | rafthttp: started HTTP pipelining with peer 7197a8a131176261
2017-06-27 23:11:23.887001 I | rafthttp: started streaming with peer 7197a8a131176261 (writer)
2017-06-27 23:11:23.901267 I | rafthttp: started peer 7197a8a131176261
2017-06-27 23:11:23.901495 I | rafthttp: added peer 7197a8a131176261
2017-06-27 23:11:23.901710 I | rafthttp: starting peer 8dad2ab19285a523...
2017-06-27 23:11:23.901894 I | rafthttp: started HTTP pipelining with peer 8dad2ab19285a523
2017-06-27 23:11:23.905512 I | rafthttp: started peer 8dad2ab19285a523
2017-06-27 23:11:23.905750 I | rafthttp: added peer 8dad2ab19285a523
2017-06-27 23:11:23.905961 I | etcdserver: starting server... [version: 3.1.7, cluster version: 2.2]
2017-06-27 23:11:23.908000 I | rafthttp: started streaming with peer 7197a8a131176261 (writer)
2017-06-27 23:11:23.908321 I | rafthttp: started streaming with peer 7197a8a131176261 (stream MsgApp v2 reader)
2017-06-27 23:11:23.908727 I | rafthttp: started streaming with peer 7197a8a131176261 (stream Message reader)
2017-06-27 23:11:23.909198 I | rafthttp: started streaming with peer 8dad2ab19285a523 (writer)
2017-06-27 23:11:23.909449 I | rafthttp: started streaming with peer 8dad2ab19285a523 (writer)
2017-06-27 23:11:23.909622 I | rafthttp: peer 8dad2ab19285a523 became active
2017-06-27 23:11:23.909790 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream Message writer)
2017-06-27 23:11:23.909979 I | rafthttp: started streaming with peer 8dad2ab19285a523 (stream MsgApp v2 reader)
2017-06-27 23:11:23.910328 I | rafthttp: started streaming with peer 8dad2ab19285a523 (stream Message reader)
2017-06-27 23:11:23.936746 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream MsgApp v2 writer)
2017-06-27 23:11:23.960911 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream Message reader)
2017-06-27 23:11:23.961196 I | rafthttp: established a TCP streaming connection with peer 8dad2ab19285a523 (stream MsgApp v2 reader)
2017-06-27 23:11:24.372546 I | raft: f4ef939dc36d31ee is starting a new election at term 8267
2017-06-27 23:11:24.373295 I | raft: f4ef939dc36d31ee became candidate at term 8268
2017-06-27 23:11:24.373805 I | raft: f4ef939dc36d31ee received MsgVoteResp from f4ef939dc36d31ee at term 8268
2017-06-27 23:11:24.374264 I | raft: f4ef939dc36d31ee [logterm: 127, index: 28005] sent MsgVote request to 7197a8a131176261 at term 8268
2017-06-27 23:11:24.374928 I | raft: f4ef939dc36d31ee [logterm: 127, index: 28005] sent MsgVote request to 8dad2ab19285a523 at term 8268
2017-06-27 23:11:24.421785 I | raft: f4ef939dc36d31ee [term: 8268] received a MsgVote message with higher term from 8dad2ab19285a523 [term: 8293]
2017-06-27 23:11:24.422173 I | raft: f4ef939dc36d31ee became follower at term 8293
2017-06-27 23:11:24.422569 I | raft: f4ef939dc36d31ee [logterm: 127, index: 28005, vote: 0] cast MsgVote for 8dad2ab19285a523 [logterm: 8268, index: 28073] at term 8293
2017-06-27 23:11:24.428217 I | raft: raft.node: f4ef939dc36d31ee elected leader 8dad2ab19285a523 at term 8293
2017-06-27 23:11:28.909578 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:30.911183 E | etcdserver: publish error: etcdserver: request timed out, possibly due to previous leader failure
2017-06-27 23:11:33.910520 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:37.911988 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:11:38.911377 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:43.911973 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:44.912674 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:11:48.648436 I | rafthttp: peer 7197a8a131176261 became active
2017-06-27 23:11:48.648931 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream Message reader)
2017-06-27 23:11:48.658265 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream MsgApp v2 reader)
2017-06-27 23:11:48.673748 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream MsgApp v2 writer)
2017-06-27 23:11:48.674916 I | rafthttp: established a TCP streaming connection with peer 7197a8a131176261 (stream Message writer)
2017-06-27 23:11:48.913089 W | rafthttp: health check for peer 7197a8a131176261 could not connect: dial tcp 172.16.6.44:2380: getsockopt: connection refused
2017-06-27 23:11:51.913211 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:11:58.913970 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:05.914561 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:12.915078 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:19.916669 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:26.917412 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:33.919060 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:40.919498 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:47.919969 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:12:54.920523 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:13:01.921185 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:13:08.922134 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:13:15.922938 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:13:22.923644 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:13:29.924303 E | etcdserver: publish error: etcdserver: request timed out
2017-06-27 23:13:36.924768 E | etcdserver: publish error: etcdserver: request timed out
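
These logs are easier to read once member IDs are mapped to peer URLs. According to the "added member" lines above, 7197a8a131176261 is node0 (172.16.6.44), 8dad2ab19285a523 is node1 (172.16.6.218) and f4ef939dc36d31ee is node2 (172.16.6.231). The following minimal sketch builds that map and counts failed health checks per peer. It assumes the etcd 3.1 capnslog line format shown above; the `summarize` helper and both regular expressions are illustrative and not part of etcd.

```python
import re
import sys
from collections import Counter

# "added member <id> [<peer URLs>] to cluster ..." (etcdserver/membership lines)
ADDED_RE = re.compile(r"added member ([0-9a-f]+) \[([^\]]+)\]")
# "health check for peer <id> could not connect: ..." (rafthttp lines)
HEALTH_RE = re.compile(r"health check for peer ([0-9a-f]+) could not connect")


def summarize(log_lines):
    """Return ({member_id: peer_urls}, Counter{member_id: failed health checks})."""
    members = {}
    failures = Counter()
    for line in log_lines:
        added = ADDED_RE.search(line)
        if added:
            members[added.group(1)] = added.group(2)
            continue
        failed = HEALTH_RE.search(line)
        if failed:
            failures[failed.group(1)] += 1
    return members, failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python etcd_peers.py node2.log  (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8") as f:
        members, failures = summarize(f)
    for member_id, count in failures.most_common():
        print(f"{member_id} {members.get(member_id, '?')}: {count} failed health checks")
```

Run against the node2 log above, this would report failures only for 7197a8a131176261 (http://172.16.6.44:2380, node0).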
xiang90 commented 7 years ago

You should figure this out on your side.

2017-06-27 23:13:56.793320 W | etcdserver: failed to reach the peerURL(http://172.16.6.231:2380) of member f4ef939dc36d31ee (Get http://172.16.6.231:2380/version: dial tcp 172.16.6.231:2380: getsockopt: connection refused)

As etcd reports, there is a network issue. It might be related to the etcd configuration, or it might be a real network problem, but we are not sure.
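
The log line quoted above shows that etcd checks a peer by sending an HTTP GET to <peerURL>/version. Repeating that request by hand from another machine can help separate network problems from etcd configuration problems. The following minimal sketch uses only the Python standard library and targets plain-HTTP peer URLs like the ones in this issue. TLS peers, such as kubeadm's, would also need client certificates. `probe_peer_version` is a hypothetical helper, and the exact JSON fields returned depend on the etcd version.

```python
import json
import sys
import urllib.request


def probe_peer_version(peer_url, timeout=2.0):
    """GET <peer_url>/version, the same request etcd logs when it checks a peer.

    Returns (True, decoded JSON body) on success, (False, error text) otherwise.
    """
    url = peer_url.rstrip("/") + "/version"
    # Empty ProxyHandler: talk to the peer directly even if http_proxy is set.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return True, json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # a non-JSON body raises json.JSONDecodeError, which is a ValueError.
        return False, str(exc)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python probe_peer.py http://172.16.6.231:2380
    ok, detail = probe_peer_version(sys.argv[1])
    print("reachable" if ok else "unreachable", detail)
```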

DJAKN commented 7 years ago

@xiang90 Hi. According to this message, port 2380 on 172.16.6.231 seems to be blocked, but when I checked it with netstat -anp | grep 2380, no results were returned. Could you give me some other advice on checking the network?

heyitsanthony commented 7 years ago

@DJAKN Can you confirm there aren't any firewall or network rules blocking port 2380? You can test it by running nc -l 2380 on 172.16.6.231 and then echo "abc" | nc 172.16.6.231 2380 from another node.
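
A "connection refused" error usually means nothing is listening on that address and port, or a firewall rule actively rejects the connection. A rule that silently drops packets tends to surface as a timeout instead. Where nc is not available, the same listener/sender test can be sketched in Python 3.8+ (for `socket.create_server`). The helper names below are hypothetical. Also, etcd has to be stopped on the listening host first, since only one process can bind port 2380.

```python
import socket
import sys


def open_listener(port, host="0.0.0.0"):
    """Bind and listen on (host, port), like the listening side of `nc -l 2380`."""
    return socket.create_server((host, port))


def receive_once(srv, timeout=30.0):
    """Accept one connection on srv and return everything the peer sent before closing."""
    srv.settimeout(timeout)
    conn, _addr = srv.accept()
    with conn:
        conn.settimeout(timeout)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def send_once(host, port, payload=b"abc\n", timeout=5.0):
    """Connect, send payload and close, like `echo "abc" | nc <host> <port>`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "listen":  # on 172.16.6.231: python nc_check.py listen 2380
        with open_listener(int(sys.argv[2])) as srv:
            print(receive_once(srv))
    else:  # on another node: python nc_check.py send 172.16.6.231 2380
        send_once(sys.argv[2], int(sys.argv[3]))
```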

DJAKN commented 7 years ago

@heyitsanthony Hello, I don't have access to the servers at the moment; I'll try it once I get access again.

debMan commented 3 years ago

I have the same issue and have been struggling with it for 12 hours with no answer. Can anybody guide me to a solution? I don't want to lose etcd data.

Explaining the scenario:

All of my nodes have two network interfaces. One has internet access and is not on the same subnet across nodes (e.g. one node has 192.168.101.115/24 and another 192.168.100.115/24). The other NIC is on the internal network, 192.168.10.0/24, which k8s runs on. I was adding a new k8s control-plane node with the following host info: IPs: 192.168.100.115 (gateway), 192.168.10.50 (internal).

kubeadm join 192.168.10.37:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH> --control-plane --certificate-key <KEY>

After a while, the new control-plane node still had not started, and my former master (192.168.10.37) went down. The etcd on my original master is trying to connect to the wrong IP address of the new node (192.168.100.115 instead of 192.168.10.50).

It was my fault for not giving kubeadm the internal IP address.
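
The root cause described here is an address-selection problem. The new member registered its gateway-side address (https://192.168.100.115:2380, per the "added member" line in the log below) instead of its internal one. The kubeadm join reference documents an --apiserver-advertise-address flag for control-plane joins, which falls back to the default network interface when unset. In a stacked setup, kubeadm uses that advertise address for the etcd peer URL. Please verify both points against the reference for your kubeadm version.

The following minimal sketch checks, before joining, which local address lies in the internal subnet and whether a given peer URL does. The helper names are hypothetical, and the addresses are taken from this comment.

```python
import ipaddress
from urllib.parse import urlsplit


def pick_internal_address(candidates, internal_cidr):
    """Return the first candidate IP that lies inside internal_cidr, or None."""
    network = ipaddress.ip_network(internal_cidr)
    for addr in candidates:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


def peer_url_in_subnet(peer_url, internal_cidr):
    """True if the host part of an etcd peer URL is inside internal_cidr."""
    host = urlsplit(peer_url).hostname
    return ipaddress.ip_address(host) in ipaddress.ip_network(internal_cidr)


if __name__ == "__main__":
    # The new node's two addresses and the internal k8s subnet described above.
    print(pick_internal_address(["192.168.100.115", "192.168.10.50"], "192.168.10.0/24"))  # 192.168.10.50
    # The peer URL the old master's etcd recorded for the new member.
    print(peer_url_in_subnet("https://192.168.100.115:2380", "192.168.10.0/24"))  # False
```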

Now the etcd daemon won't start, and I can't run etcdctl to remove the corrupted node.

Logs

etcd container logs:

$ docker container logs -f 26cd5ca1a05e
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2021-03-02 11:10:09.875591 I | etcdmain: etcd Version: 3.4.3
2021-03-02 11:10:09.875653 I | etcdmain: Git SHA: 3cf2f69b5
2021-03-02 11:10:09.875660 I | etcdmain: Go Version: go1.12.12
2021-03-02 11:10:09.875666 I | etcdmain: Go OS/Arch: linux/amd64
2021-03-02 11:10:09.875675 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2021-03-02 11:10:09.875798 N | etcdmain: the server is already initialized as member before, starting as etcd member...
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2021-03-02 11:10:09.875871 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file =
2021-03-02 11:10:09.877517 I | embed: name = master
2021-03-02 11:10:09.877534 I | embed: data dir = /var/lib/etcd
2021-03-02 11:10:09.877541 I | embed: member dir = /var/lib/etcd/member
2021-03-02 11:10:09.877548 I | embed: heartbeat = 100ms
2021-03-02 11:10:09.877553 I | embed: election = 1000ms
2021-03-02 11:10:09.877559 I | embed: snapshot count = 10000
2021-03-02 11:10:09.877592 I | embed: advertise client URLs = https://192.168.10.37:2379
2021-03-02 11:10:09.877603 I | embed: initial advertise peer URLs = https://192.168.10.37:2380
2021-03-02 11:10:09.877612 I | embed: initial cluster =
2021-03-02 11:10:09.894496 I | etcdserver: recovered store from snapshot at index 69617007
2021-03-02 11:10:09.895247 I | mvcc: restore compact to 62843470
2021-03-02 11:10:10.051893 I | etcdserver: restarting member 9e0a44ecc696ddc2 in cluster 73b3bac0ebc7dd14 at commit index 69621423
raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 switched to configuration voters=(11387990391494467010)
raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 became follower at term 34899
raft2021/03/02 11:10:10 INFO: newRaft 9e0a44ecc696ddc2 [peers: [9e0a44ecc696ddc2], term: 34899, commit: 69621423, applied: 69617007, lastindex: 69621430, lastterm: 17]
2021-03-02 11:10:10.052348 I | etcdserver/api: enabled capabilities for version 3.4
2021-03-02 11:10:10.052364 I | etcdserver/membership: added member 9e0a44ecc696ddc2 [https://192.168.10.37:2380] to cluster 73b3bac0ebc7dd14 from store
2021-03-02 11:10:10.052371 I | etcdserver/membership: set the cluster version to 3.4 from store
2021-03-02 11:10:10.054286 I | mvcc: restore compact to 62843470
2021-03-02 11:10:10.070533 W | auth: simple token is not cryptographically signed
2021-03-02 11:10:10.072433 I | etcdserver: starting server... [version: 3.4.3, cluster version: 3.4]
2021-03-02 11:10:10.072567 I | etcdserver: 9e0a44ecc696ddc2 as single-node; fast-forwarding 9 ticks (election ticks 10)
2021-03-02 11:10:10.083095 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file =
2021-03-02 11:10:10.083385 I | embed: listening for peers on 192.168.10.37:2380
2021-03-02 11:10:10.084283 I | embed: listening for metrics on http://127.0.0.1:2381
raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 switched to configuration voters=(3733124607463646867 11387990391494467010)
2021-03-02 11:10:10.102706 I | etcdserver/membership: added member 33ceb90d3223e693 [https://192.168.100.115:2380] to cluster 73b3bac0ebc7dd14
2021-03-02 11:10:10.102746 I | rafthttp: starting peer 33ceb90d3223e693...
2021-03-02 11:10:10.102781 I | rafthttp: started HTTP pipelining with peer 33ceb90d3223e693
2021-03-02 11:10:10.103751 I | rafthttp: started streaming with peer 33ceb90d3223e693 (writer)
2021-03-02 11:10:10.104976 I | rafthttp: started streaming with peer 33ceb90d3223e693 (writer)
2021-03-02 11:10:10.113839 I | rafthttp: started peer 33ceb90d3223e693
2021-03-02 11:10:10.113872 I | rafthttp: started streaming with peer 33ceb90d3223e693 (stream MsgApp v2 reader)
2021-03-02 11:10:10.113902 I | rafthttp: added peer 33ceb90d3223e693
2021-03-02 11:10:10.114010 I | rafthttp: started streaming with peer 33ceb90d3223e693 (stream Message reader)
raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34899
raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 became candidate at term 34900
raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34900
raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34900
raft2021/03/02 11:10:12 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34900
raft2021/03/02 11:10:12 INFO: 9e0a44ecc696ddc2 became candidate at term 34901
raft2021/03/02 11:10:12 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34901
raft2021/03/02 11:10:12 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34901
raft2021/03/02 11:10:13 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34901
raft2021/03/02 11:10:13 INFO: 9e0a44ecc696ddc2 became candidate at term 34902
raft2021/03/02 11:10:13 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34902
raft2021/03/02 11:10:13 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34902
raft2021/03/02 11:10:14 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34902
raft2021/03/02 11:10:14 INFO: 9e0a44ecc696ddc2 became candidate at term 34903
raft2021/03/02 11:10:14 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34903
raft2021/03/02 11:10:14 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34903
2021-03-02 11:10:15.114418 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:15.114484 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
raft2021/03/02 11:10:16 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34903
raft2021/03/02 11:10:16 INFO: 9e0a44ecc696ddc2 became candidate at term 34904
raft2021/03/02 11:10:16 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34904
raft2021/03/02 11:10:16 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34904
2021-03-02 11:10:17.075165 E | etcdserver: publish error: etcdserver: request timed out
raft2021/03/02 11:10:17 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34904
raft2021/03/02 11:10:17 INFO: 9e0a44ecc696ddc2 became candidate at term 34905
raft2021/03/02 11:10:17 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34905
raft2021/03/02 11:10:17 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34905
raft2021/03/02 11:10:19 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34905
raft2021/03/02 11:10:19 INFO: 9e0a44ecc696ddc2 became candidate at term 34906
raft2021/03/02 11:10:19 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34906
raft2021/03/02 11:10:19 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34906
2021-03-02 11:10:20.114714 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:20.114874 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
raft2021/03/02 11:10:20 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34906
raft2021/03/02 11:10:20 INFO: 9e0a44ecc696ddc2 became candidate at term 34907
raft2021/03/02 11:10:20 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34907
raft2021/03/02 11:10:20 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34907
raft2021/03/02 11:10:22 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34907
raft2021/03/02 11:10:22 INFO: 9e0a44ecc696ddc2 became candidate at term 34908
raft2021/03/02 11:10:22 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34908
raft2021/03/02 11:10:22 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34908
raft2021/03/02 11:10:23 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34908
raft2021/03/02 11:10:23 INFO: 9e0a44ecc696ddc2 became candidate at term 34909
raft2021/03/02 11:10:23 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34909
raft2021/03/02 11:10:23 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34909
2021-03-02 11:10:24.075444 E | etcdserver: publish error: etcdserver: request timed out
raft2021/03/02 11:10:24 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34909
raft2021/03/02 11:10:24 INFO: 9e0a44ecc696ddc2 became candidate at term 34910
raft2021/03/02 11:10:24 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34910
raft2021/03/02 11:10:24 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34910
2021-03-02 11:10:25.115166 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:25.115637 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
raft2021/03/02 11:10:25 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34910
raft2021/03/02 11:10:25 INFO: 9e0a44ecc696ddc2 became candidate at term 34911
raft2021/03/02 11:10:25 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34911
raft2021/03/02 11:10:25 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34911
raft2021/03/02 11:10:26 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34911
raft2021/03/02 11:10:26 INFO: 9e0a44ecc696ddc2 became candidate at term 34912
raft2021/03/02 11:10:26 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34912
raft2021/03/02 11:10:26 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34912
raft2021/03/02 11:10:28 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34912
raft2021/03/02 11:10:28 INFO: 9e0a44ecc696ddc2 became candidate at term 34913
raft2021/03/02 11:10:28 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34913
raft2021/03/02 11:10:28 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34913
raft2021/03/02 11:10:29 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34913
raft2021/03/02 11:10:29 INFO: 9e0a44ecc696ddc2 became candidate at term 34914
raft2021/03/02 11:10:29 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34914
raft2021/03/02 11:10:29 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34914
2021-03-02 11:10:30.115382 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:30.115969 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:31.075663 E | etcdserver: publish error: etcdserver: request timed out
raft2021/03/02 11:10:31 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34914
raft2021/03/02 11:10:31 INFO: 9e0a44ecc696ddc2 became candidate at term 34915
raft2021/03/02 11:10:31 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34915
raft2021/03/02 11:10:31 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34915
raft2021/03/02 11:10:32 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34915
raft2021/03/02 11:10:32 INFO: 9e0a44ecc696ddc2 became candidate at term 34916
raft2021/03/02 11:10:32 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34916
raft2021/03/02 11:10:32 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34916
raft2021/03/02 11:10:34 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34916
raft2021/03/02 11:10:34 INFO: 9e0a44ecc696ddc2 became candidate at term 34917
raft2021/03/02 11:10:34 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34917
raft2021/03/02 11:10:34 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34917
2021-03-02 11:10:35.115614 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:35.116426 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
raft2021/03/02 11:10:35 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34917
raft2021/03/02 11:10:35 INFO: 9e0a44ecc696ddc2 became candidate at term 34918
raft2021/03/02 11:10:35 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34918
raft2021/03/02 11:10:35 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34918
raft2021/03/02 11:10:37 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34918
raft2021/03/02 11:10:37 INFO: 9e0a44ecc696ddc2 became candidate at term 34919
raft2021/03/02 11:10:37 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34919
raft2021/03/02 11:10:37 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34919
2021-03-02 11:10:38.075874 E | etcdserver: publish error: etcdserver: request timed out
raft2021/03/02 11:10:39 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34919
raft2021/03/02 11:10:39 INFO: 9e0a44ecc696ddc2 became candidate at term 34920
raft2021/03/02 11:10:39 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34920
raft2021/03/02 11:10:39 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34920
2021-03-02 11:10:40.115803 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:40.116573 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
raft2021/03/02 11:10:40 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34920
raft2021/03/02 11:10:40 INFO: 9e0a44ecc696ddc2 became candidate at term 34921
raft2021/03/02 11:10:40 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34921
raft2021/03/02 11:10:40 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34921
raft2021/03/02 11:10:42 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34921
raft2021/03/02 11:10:42 INFO: 9e0a44ecc696ddc2 became candidate at term 34922
raft2021/03/02 11:10:42 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34922
raft2021/03/02 11:10:42 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34922
raft2021/03/02 11:10:44 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34922
raft2021/03/02 11:10:44 INFO: 9e0a44ecc696ddc2 became candidate at term 34923
raft2021/03/02 11:10:44 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34923
raft2021/03/02 11:10:44 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34923
2021-03-02 11:10:45.076089 E | etcdserver: publish error: etcdserver: request timed out
2021-03-02 11:10:45.116043 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
2021-03-02 11:10:45.116783 W | rafthttp: health check for peer 33ceb90d3223e693 could not connect: dial tcp 192.168.100.115:2380: connect: connection refused
raft2021/03/02 11:10:45 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34923
raft2021/03/02 11:10:45 INFO: 9e0a44ecc696ddc2 became candidate at term 34924
raft2021/03/02 11:10:45 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34924
raft2021/03/02 11:10:45 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34924
raft2021/03/02 11:10:47 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34924
raft2021/03/02 11:10:47 INFO: 9e0a44ecc696ddc2 became candidate at term 34925
raft2021/03/02 11:10:47 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34925
raft2021/03/02 11:10:47 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34925
raft2021/03/02 11:10:48 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34925
raft2021/03/02 11:10:48 INFO: 9e0a44ecc696ddc2 became candidate at term 34926
raft2021/03/02 11:10:48 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34926
raft2021/03/02 11:10:48 INFO: 9e0a44ecc696ddc2 [logterm: 17, index: 69621430] sent MsgVote request to 33ceb90d3223e693 at term 34926
raft2021/03/02 11:10:49 INFO: 9e0a44ecc696ddc2 is starting a new election at term 34926
raft2021/03/02 11:10:49 INFO: 9e0a44ecc696ddc2 became candidate at term 34927
raft2021/03/02 11:10:49 INFO: 9e0a44ecc696ddc2 received MsgVoteResp from 9e0a44ecc696ddc2 at term 34927
raft2021/03/02 11:10:49 INFO: 9e0a44ecc696
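
The endless election loop above is what Raft does without a quorum. Before the join, the configuration had a single voter, which is why etcd fast-forwarded as "single-node". After "added member 33ceb90d3223e693", the configuration lists two voters. Raft prints these IDs in decimal: 3733124607463646867 is 33ceb90d3223e693 in hex, and 11387990391494467010 is 9e0a44ecc696ddc2. With two voters, a leader needs 2 votes, but 33ceb90d3223e693 at 192.168.100.115:2380 refuses connections, so no candidate can ever win.

The following minimal sketch captures that arithmetic. The helper names are hypothetical, and the log line is copied from above.

```python
import re


def voter_ids(raft_line):
    """Return the member IDs from a raft 'switched to configuration voters=(...)' line as hex strings.

    raft logs the IDs in decimal; etcd's other log lines print the same IDs in hex.
    """
    m = re.search(r"voters=\(([\d ]*)\)", raft_line)
    return [format(int(v), "x") for v in m.group(1).split()] if m else []


def quorum(voters):
    """Votes a Raft leader needs (and acks a commit needs): a strict majority, n // 2 + 1."""
    return voters // 2 + 1


def can_make_progress(voters, reachable):
    """reachable counts the local member plus every voter it can currently talk to."""
    return reachable >= quorum(voters)


if __name__ == "__main__":
    line = ("raft2021/03/02 11:10:10 INFO: 9e0a44ecc696ddc2 switched to configuration "
            "voters=(3733124607463646867 11387990391494467010)")
    ids = voter_ids(line)
    print(ids)  # ['33ceb90d3223e693', '9e0a44ecc696ddc2']
    # Only the local member answers; 33ceb90d3223e693 refuses connections.
    print(quorum(len(ids)), can_make_progress(len(ids), 1))  # 2 False
    # Before the join there was a single voter, so the lone member could lead.
    print(quorum(1), can_make_progress(1, 1))  # 1 True
```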

kubelet logs on the main master node:


Mar 02 15:22:46 master kubelet[7920]: I0302 15:22:46.559298    7920 trace.go:116] Trace[2045619077]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/kubelet.go:449 (started: 2021-03-02 15:22:36.012105343 +0330 +0330 m=+0.221503379) (total time: 10.547129012s):
Mar 02 15:22:46 master kubelet[7920]: Trace[2045619077]: [10.547129012s] [10.547129012s] END
Mar 02 15:22:46 master kubelet[7920]: E0302 15:22:46.560152    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:46 master kubelet[7920]: I0302 15:22:46.559352    7920 trace.go:116] Trace[1343536260]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/kubelet.go:458 (started: 2021-03-02 15:22:36.013408611 +0330 +0330 m=+0.222806740) (total time: 10.545881015s):
Mar 02 15:22:46 master kubelet[7920]: Trace[1343536260]: [10.545881015s] [10.545881015s] END
Mar 02 15:22:46 master kubelet[7920]: E0302 15:22:46.561110    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:47 master kubelet[7920]: E0302 15:22:47.561785    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:47 master kubelet[7920]: E0302 15:22:47.562437    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:48 master kubelet[7920]: E0302 15:22:48.562818    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:48 master kubelet[7920]: E0302 15:22:48.563654    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:49 master kubelet[7920]: E0302 15:22:49.563743    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:49 master kubelet[7920]: E0302 15:22:49.564959    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:50 master kubelet[7920]: E0302 15:22:50.564676    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:50 master kubelet[7920]: E0302 15:22:50.565635    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:51 master kubelet[7920]: I0302 15:22:51.046482    7920 trace.go:116] Trace[53168885]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46 (started: 2021-03-02 15:22:36.011001502 +0330 +0330 m=+0.220399688) (total time: 15.035426129s):
Mar 02 15:22:51 master kubelet[7920]: Trace[53168885]: [15.035426129s] [15.035426129s] END
Mar 02 15:22:51 master kubelet[7920]: E0302 15:22:51.046534    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://192.168.10.37:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:51 master kubelet[7920]: E0302 15:22:51.565835    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:51 master kubelet[7920]: E0302 15:22:51.566710    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:52 master kubelet[7920]: E0302 15:22:52.047555    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://192.168.10.37:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:52 master kubelet[7920]: E0302 15:22:52.566800    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:52 master kubelet[7920]: E0302 15:22:52.568112    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:53 master kubelet[7920]: E0302 15:22:53.048665    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://192.168.10.37:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:53 master kubelet[7920]: E0302 15:22:53.567869    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:53 master kubelet[7920]: E0302 15:22:53.569075    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:54 master kubelet[7920]: E0302 15:22:54.049754    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://192.168.10.37:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:54 master kubelet[7920]: E0302 15:22:54.569003    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:54 master kubelet[7920]: E0302 15:22:54.570072    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:55 master kubelet[7920]: E0302 15:22:55.050837    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://192.168.10.37:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:55 master kubelet[7920]: E0302 15:22:55.570221    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:55 master kubelet[7920]: E0302 15:22:55.570987    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.051883    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://192.168.10.37:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.458338    7920 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
Mar 02 15:22:56 master kubelet[7920]:         For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.461552    7920 kuberuntime_manager.go:211] Container runtime docker initialized, version: 17.03.2-ce, apiVersion: 1.27.0
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.464027    7920 server.go:1113] Started kubelet
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.464128    7920 kubelet.go:1302] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.464593    7920 server.go:144] Starting to listen on 0.0.0.0:10250
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.464755    7920 event.go:272] Unable to write event: 'Post https://192.168.10.37:6443/api/v1/namespaces/default/events: dial tcp 192.168.10.37:6443: connect: connection refused' (may retry after sleeping)
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.466513    7920 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.467040    7920 server.go:384] Adding debug handlers to kubelet server.
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.468796    7920 volume_manager.go:265] Starting Kubelet Volume Manager
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.470525    7920 desired_state_of_world_populator.go:138] Desired state populator starts to run
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.470831    7920 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.CSIDriver: Get https://192.168.10.37:6443/apis/storage.k8s.io/v1beta1/csidrivers?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.473117    7920 controller.go:135] failed to ensure node lease exists, will retry in 200ms, error: Get https://192.168.10.37:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master?timeout=10s: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.516012    7920 status_manager.go:157] Starting to sync pod status with apiserver
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.516070    7920 kubelet.go:1820] Starting kubelet main sync loop.
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.516158    7920 kubelet.go:1844] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.518065    7920 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Get https://192.168.10.37:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.569089    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.569384    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.573820    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Failed to list *v1.Service: Get https://192.168.10.37:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.574590    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://192.168.10.37:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&limit=500&resourceVersion=0: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.575308    7920 kubelet_node_status.go:70] Attempting to register node master
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.575972    7920 kubelet_node_status.go:92] Unable to register node "master" with API server: Post https://192.168.10.37:6443/api/v1/nodes: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.616361    7920 kubelet.go:1844] skipping pod synchronization - container runtime status check may not have completed yet
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.669589    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.673954    7920 controller.go:135] failed to ensure node lease exists, will retry in 400ms, error: Get https://192.168.10.37:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master?timeout=10s: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.691822    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.696256    7920 cpu_manager.go:173] [cpumanager] starting with none policy
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.696296    7920 cpu_manager.go:174] [cpumanager] reconciling every 10s
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.696362    7920 policy_none.go:43] [cpumanager] none policy: Start
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.699502    7920 plugin_manager.go:114] Starting Kubelet Plugin Manager
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.700685    7920 eviction_manager.go:246] eviction manager: failed to get summary stats: failed to get node info: node "master" not found
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.769843    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.776231    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.779629    7920 kubelet_node_status.go:70] Attempting to register node master
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.780297    7920 kubelet_node_status.go:92] Unable to register node "master" with API server: Post https://192.168.10.37:6443/api/v1/nodes: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.798547    7920 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: Get https://192.168.10.37:6443/apis/storage.k8s.io/v1/csinodes/master: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.816745    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.822070    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.822303    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.826671    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.827323    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.830275    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.830397    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.834350    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.834476    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.837906    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: W0302 15:22:56.840461    7920 pod_container_deletor.go:75] Container "863c38aadf63b831d939ef14a9b67958dd0c4986f9a8c55de6bdb3ea448d1a0a" not found in pod's containers
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.840565    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.842903    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.845530    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.848399    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:56 master kubelet[7920]: W0302 15:22:56.850822    7920 pod_container_deletor.go:75] Container "d8bae65ef32a14675b97c886f82eab074b7e0f25d0a1244b2ed29671786ac543" not found in pod's containers
Mar 02 15:22:56 master kubelet[7920]: W0302 15:22:56.863290    7920 status_manager.go:530] Failed to get status for pod "etcd-master_kube-system(bb7dfd61500cacc435c6beaced7c529f)": Get https://192.168.10.37:6443/api/v1/namespaces/kube-system/pods/etcd-master: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.870103    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.875687    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-certs" (UniqueName: "kubernetes.io/host-path/bb7dfd61500cacc435c6beaced7c529f-etcd-certs") pod "etcd-master" (UID: "bb7dfd61500cacc435c6beaced7c529f")
Mar 02 15:22:56 master kubelet[7920]: E0302 15:22:56.970352    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.975996    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "usr-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/4339dbca8cd79e0afb7adc282b26eed0-usr-share-ca-certificates") pod "kube-controller-manager-master" (UID: "4339dbca8cd79e0afb7adc282b26eed0")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976085    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "ca-certs" (UniqueName: "kubernetes.io/host-path/d17a6f5b21c99832f8e1a5fbff5d241b-ca-certs") pod "kube-apiserver-master" (UID: "d17a6f5b21c99832f8e1a5fbff5d241b")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976219    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-certs" (UniqueName: "kubernetes.io/host-path/d17a6f5b21c99832f8e1a5fbff5d241b-k8s-certs") pod "kube-apiserver-master" (UID: "d17a6f5b21c99832f8e1a5fbff5d241b")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976326    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kubeconfig" (UniqueName: "kubernetes.io/host-path/4339dbca8cd79e0afb7adc282b26eed0-kubeconfig") pod "kube-controller-manager-master" (UID: "4339dbca8cd79e0afb7adc282b26eed0")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976403    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "usr-local-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/4339dbca8cd79e0afb7adc282b26eed0-usr-local-share-ca-certificates") pod "kube-controller-manager-master" (UID: "4339dbca8cd79e0afb7adc282b26eed0")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976472    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-certs" (UniqueName: "kubernetes.io/host-path/4339dbca8cd79e0afb7adc282b26eed0-k8s-certs") pod "kube-controller-manager-master" (UID: "4339dbca8cd79e0afb7adc282b26eed0")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976713    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "usr-local-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/d17a6f5b21c99832f8e1a5fbff5d241b-usr-local-share-ca-certificates") pod "kube-apiserver-master" (UID: "d17a6f5b21c99832f8e1a5fbff5d241b")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976806    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "etc-ca-certificates" (UniqueName: "kubernetes.io/host-path/4339dbca8cd79e0afb7adc282b26eed0-etc-ca-certificates") pod "kube-controller-manager-master" (UID: "4339dbca8cd79e0afb7adc282b26eed0")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976877    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "flexvolume-dir" (UniqueName: "kubernetes.io/host-path/4339dbca8cd79e0afb7adc282b26eed0-flexvolume-dir") pod "kube-controller-manager-master" (UID: "4339dbca8cd79e0afb7adc282b26eed0")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.976952    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-data" (UniqueName: "kubernetes.io/host-path/bb7dfd61500cacc435c6beaced7c529f-etcd-data") pod "etcd-master" (UID: "bb7dfd61500cacc435c6beaced7c529f")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.977024    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "usr-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/d17a6f5b21c99832f8e1a5fbff5d241b-usr-share-ca-certificates") pod "kube-apiserver-master" (UID: "d17a6f5b21c99832f8e1a5fbff5d241b")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.977092    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "etc-ca-certificates" (UniqueName: "kubernetes.io/host-path/d17a6f5b21c99832f8e1a5fbff5d241b-etc-ca-certificates") pod "kube-apiserver-master" (UID: "d17a6f5b21c99832f8e1a5fbff5d241b")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.977155    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "ca-certs" (UniqueName: "kubernetes.io/host-path/4339dbca8cd79e0afb7adc282b26eed0-ca-certs") pod "kube-controller-manager-master" (UID: "4339dbca8cd79e0afb7adc282b26eed0")
Mar 02 15:22:56 master kubelet[7920]: I0302 15:22:56.977220    7920 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kubeconfig" (UniqueName: "kubernetes.io/host-path/e3025acd90e7465e66fa19c71b916366-kubeconfig") pod "kube-scheduler-master" (UID: "e3025acd90e7465e66fa19c71b916366")
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.070582    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.075107    7920 controller.go:135] failed to ensure node lease exists, will retry in 800ms, error: Get https://192.168.10.37:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master?timeout=10s: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.170741    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: I0302 15:22:57.181249    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:57 master kubelet[7920]: I0302 15:22:57.184196    7920 kubelet_node_status.go:70] Attempting to register node master
Mar 02 15:22:57 master kubelet[7920]: W0302 15:22:57.263272    7920 status_manager.go:530] Failed to get status for pod "kube-apiserver-master_kube-system(d17a6f5b21c99832f8e1a5fbff5d241b)": Get https://192.168.10.37:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-master: dial tcp 192.168.10.37:6443: connect: connection refused
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.270988    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.371222    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.471423    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.571653    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: I0302 15:22:57.602357    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:57 master kubelet[7920]: I0302 15:22:57.603033    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:57 master kubelet[7920]: I0302 15:22:57.603527    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:57 master kubelet[7920]: I0302 15:22:57.605920    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.671872    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.772162    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.872337    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:57 master kubelet[7920]: E0302 15:22:57.973114    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.073393    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.174166    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.274865    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.375443    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.476027    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.576700    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: I0302 15:22:58.609000    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.676979    7920 kubelet.go:2263] node "master" not found
Mar 02 15:22:58 master kubelet[7920]: E0302 15:22:58.777282    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.004720    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.105459    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.205689    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.306497    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.406820    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.507054    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.607271    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.707488    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.808553    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:05 master kubelet[7920]: E0302 15:23:05.908810    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.009088    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: W0302 15:23:06.054033    7920 prober.go:108] No ref for container "docker://9d462b022f5a4c3cb827ed56cecebb75ad7cecbd1cb05d2f9f7c110b7a2d2c1c" (etcd-master_kube-system(bb7dfd61500cacc435c6beaced7c529f):etcd)
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.109286    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.209498    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.309747    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.409991    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.510596    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.611235    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.701009    7920 eviction_manager.go:246] eviction manager: failed to get summary stats: failed to get node info: node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.711461    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.811680    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:06 master kubelet[7920]: E0302 15:23:06.912446    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.013880    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.114658    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.215288    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.316196    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.416412    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: I0302 15:23:07.463700    7920 trace.go:116] Trace[544033399]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46 (started: 2021-03-02 15:22:57.052138394 +0330 +0330 m=+21.261536563) (total time: 10.411507459s):
Mar 02 15:23:07 master kubelet[7920]: Trace[544033399]: [10.411507459s] [10.411507459s] END
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.463744    7920 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://192.168.10.37:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.516602    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.616849    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.717140    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.817449    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.863593    7920 kubelet_node_status.go:92] Unable to register node "master" with API server: Post https://192.168.10.37:6443/api/v1/nodes: net/http: TLS handshake timeout
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.875790    7920 controller.go:135] failed to ensure node lease exists, will retry in 1.6s, error: Get https://192.168.10.37:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Mar 02 15:23:07 master kubelet[7920]: E0302 15:23:07.917770    7920 kubelet.go:2263] node "master" not found

Mar 02 15:23:08 master kubelet[7920]: E0302 15:23:08.519484    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:08 master kubelet[7920]: E0302 15:23:08.619824    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:08 master kubelet[7920]: I0302 15:23:08.664475    7920 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Mar 02 15:23:08 master kubelet[7920]: I0302 15:23:08.668485    7920 kubelet_node_status.go:70] Attempting to register node master
Mar 02 15:23:08 master kubelet[7920]: E0302 15:23:08.720582    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:08 master kubelet[7920]: E0302 15:23:08.820860    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:08 master kubelet[7920]: E0302 15:23:08.921085    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:09 master kubelet[7920]: E0302 15:23:09.021305    7920 kubelet.go:2263] node "master" not found
Mar 02 15:23:09 master kubelet[7920]: W0302 15:23:09.062769    7920 status_manager.go:530] Failed to get status for pod "kube-controller-manager-master_kube-system(4339dbca8cd79e0afb7adc282b26eed0)": Get https://192.168.10.37:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-master: net/http: TLS handshake timeout
Mar 02 15:23:09 master kubelet[7920]: E0302 15:23:09.122240    7920 kubelet.go:2263] node "master" not found
jjsty1e commented 3 years ago

@debMan Hello, have you solved this? I may be running into the same problem as you.

debMan commented 3 years ago

@Jaggle Unfortunately no, I've redeployed the entire cluster 😔

jjsty1e commented 3 years ago

@debMan Don't be sad, man, I've reset my cluster too XD

AddoSolutions commented 1 year ago

For those who find this later: I was down to a single failed etcd node that wouldn't come back up because it missed its friends too much, i.e. it could no longer reach a quorum of peers (nothing in the logs).

I went ahead and restarted it with --force-new-cluster and BAM, she was back! I was able to take a backup and may be able to restore from there.
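A minimal sketch of that recovery sequence, assuming the surviving member is node0 from the original report (172.16.6.44) and that it uses etcd's default data directory, <name>.etcd, since the original commands set no --data-dir. The helper name print_recovery_commands and the backup file name are hypothetical. The helper only prints the two commands so they can be reviewed before running them by hand: first restart the member as a one-member cluster with --force-new-cluster, then save a v3 snapshot with etcdctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commands for recovering the last surviving
# member with --force-new-cluster and then taking a v3 snapshot backup.
# Usage: print_recovery_commands <name> <ip> <data-dir> <backup-file>
print_recovery_commands() {
  local name=$1 ip=$2 data_dir=$3 backup=$4

  # Step 1 (stop the old etcd process first): restart the member as a
  # one-member cluster, reusing its existing data directory.
  echo "etcd --name $name --data-dir $data_dir" \
       "--listen-peer-urls http://$ip:2380" \
       "--initial-advertise-peer-urls http://$ip:2380" \
       "--listen-client-urls http://$ip:2379,http://127.0.0.1:2379" \
       "--advertise-client-urls http://$ip:2379" \
       "--force-new-cluster"

  # Step 2 (in another terminal, once step 1 is serving clients):
  # write a snapshot through the v3 API.
  echo "ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 snapshot save $backup"
}
```

For example, print_recovery_commands node0 172.16.6.44 node0.etcd backup.db. Drop --force-new-cluster on later restarts: it is meant as a one-off recovery step, and etcd's disaster-recovery documentation describes restoring from a snapshot as the usual route.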

Jiggy-Jag commented 1 year ago

For those who find this in the future: in my case this was caused by etcd sending a large snapshot (600 MB) and the connection being cut off. I fixed it by increasing the etcd.service timeout to 900 seconds and just letting it run. It failed to retrieve the snapshot within 900 seconds the first time, but the second time it worked just fine.

I would suggest increasing your etcd.service timeout by adding the line TimeoutSec=900 under [Service] in etcd.service. More on how to change a systemd service timeout here: https://unix.stackexchange.com/questions/227017/how-to-change-systemd-service-timeout-value
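One way to apply that without editing the packaged unit file is a systemd drop-in. The sketch below assumes the unit is named etcd.service (the snap package from a later comment uses snap.etcd.etcd.service instead); the helper name write_timeout_dropin and the file name timeout.conf are hypothetical. TimeoutSec= sets both TimeoutStartSec= and TimeoutStopSec=, which is what lets a slow first start, such as one that waits on a large snapshot, finish instead of being killed by systemd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that raises the start/stop timeout
# for a systemd unit, leaving the original unit file untouched.
# Usage: write_timeout_dropin <unit-dir> <unit-name> <seconds>
write_timeout_dropin() {
  local unit_dir=$1 unit=$2 seconds=$3
  local dropin_dir="$unit_dir/$unit.d"

  mkdir -p "$dropin_dir"
  # TimeoutSec= is shorthand for TimeoutStartSec= plus TimeoutStopSec=.
  printf '[Service]\nTimeoutSec=%s\n' "$seconds" > "$dropin_dir/timeout.conf"

  # Print the path that was written so the caller can inspect it.
  echo "$dropin_dir/timeout.conf"
}
```

As root, write_timeout_dropin /etc/systemd/system etcd.service 900 followed by systemctl daemon-reload and systemctl restart etcd should pick it up. sudo systemctl edit etcd creates an equivalent override file interactively.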

darkopetrovic commented 4 months ago

The etcd server seems to fail to start because of poor hard drive performance.

I was seeing etcd[85445]: publish error: etcdserver: request timed out when running sudo journalctl -u snap.etcd.etcd.

I was trying to install etcd on a machine with poor hard disk performance. Running iostat on it does show a bottleneck: a high value in the %util column.

$ iostat -x 1 10 sda
Linux 6.5.0-35-generic (os-compute04)   06/02/2024  _x86_64_    (24 CPU)

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              4.73    211.94     1.32  21.77   19.71    44.79   33.88    655.13    35.99  51.51   52.97    19.34    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.89  96.30

Testing on another machine:

$ iostat -x 1 10 sda
Linux 5.15.0-107-generic (os-compute09)     06/02/2024  _x86_64_    (36 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.18    0.00    0.08    0.19    0.00   99.55

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              1.10     52.90     0.24  17.80    8.10    48.01   14.38    356.53    11.43  44.30   25.75    24.80    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.38   5.36

This machine shows a low value in the %util column.

This time the server started properly and the :2379 endpoint was reachable.
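The comparison above can be scripted. This is a minimal sketch, assuming sysstat's iostat -x layout shown above, where %util is the last column; the helper names last_util and disk_looks_saturated are hypothetical, and the 80% cut-off is an arbitrary illustration, not an etcd recommendation. It takes the value from the last report for the device, because the first report of iostat with an interval averages since boot. %util is only a rough proxy: etcd's own guidance focuses on WAL fsync latency (the etcd_disk_wal_fsync_duration_seconds metric).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `iostat -x` output on stdin and print the %util
# value from the last report for one device (assumes %util is the last column).
last_util() {
  local dev=$1
  awk -v dev="$dev" '$1 == dev { util = $NF } END { if (util != "") print util }'
}

# Hypothetical check: return 0 if the device's last %util is at or above the
# threshold (default 80, an arbitrary example), 1 if below, 2 if no data.
disk_looks_saturated() {
  local dev=$1 threshold=${2:-80}
  local util
  util=$(last_util "$dev")
  [ -n "$util" ] || return 2
  awk -v u="$util" -v t="$threshold" 'BEGIN { exit !(u + 0 >= t + 0) }'
}
```

For example, iostat -x 1 10 sda | disk_looks_saturated sda && echo "sda looks saturated". With the two sda lines above, the first machine (96.30) would be flagged and the second (5.36) would not.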