coreos / etcd-operator

etcd operator creates/configures/manages etcd clusters atop Kubernetes
https://coreos.com/blog/introducing-the-etcd-operator.html
Apache License 2.0

failure to start member in existing cluster, readiness probe fails #2164

Closed: eliaoggian closed this issue 4 years ago

eliaoggian commented 4 years ago

Issue

I deleted one node, and with it the etcd-cluster pod that was running on it. The pod is now being recreated on another node, but its readinessProbe fails, so the pod is killed once the timeout is reached and then recreated, over and over.

The member is added to and removed from the cluster correctly, though.

The cluster is unable to reach the desired state of 5 ready nodes.

Versions

Values.yaml used to install the chart

etcdCluster:
  name: etcd-cluster
  size: 5
  version: 3.4.3
  repository: "private-registry.example.com:5000/coreos/etcd"
  pod:
    busyboxImage: "private-registry.example.com:5000/busybox:1.28.0-glibc"
customResources:
  createEtcdClusterCRD: true
etcdOperator:
  image:
    repository: "private-registry.example.com:5000/coreos/etcd-operator"
backupOperator:
  image:
    repository: "private-registry.example.com:5000/coreos/etcd-operator"
restoreOperator:
  image:
    repository: "private-registry.example.com:5000/coreos/etcd-operator"

Logs

2020-02-28 12:53:09.522133 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_SERVICE_HOST=10.43.54.38
2020-02-28 12:53:09.522193 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_SERVICE_PORT=19999
2020-02-28 12:53:09.522198 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_SERVICE_PORT_HTTP_ETCD_RESTORE_PORT=19999
2020-02-28 12:53:09.522201 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_PORT_2379_TCP_PROTO=tcp
2020-02-28 12:53:09.522205 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_PORT=tcp://10.43.54.38:19999
2020-02-28 12:53:09.522207 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_PORT_19999_TCP_PORT=19999
2020-02-28 12:53:09.522210 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_PORT_2379_TCP=tcp://10.43.111.132:2379
2020-02-28 12:53:09.522213 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_PORT_2379_TCP_ADDR=10.43.111.132
2020-02-28 12:53:09.522221 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_PORT_19999_TCP_PROTO=tcp
2020-02-28 12:53:09.522224 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_PORT_19999_TCP_ADDR=10.43.54.38
2020-02-28 12:53:09.522227 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_SERVICE_HOST=10.43.111.132
2020-02-28 12:53:09.522230 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_SERVICE_PORT_CLIENT=2379
2020-02-28 12:53:09.522232 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_PORT_2379_TCP_PORT=2379
2020-02-28 12:53:09.522235 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_PORT=tcp://10.43.111.132:2379
2020-02-28 12:53:09.522239 W | pkg/flags: unrecognized environment variable ETCD_RESTORE_OPERATOR_PORT_19999_TCP=tcp://10.43.54.38:19999
2020-02-28 12:53:09.522243 W | pkg/flags: unrecognized environment variable ETCD_CLUSTER_CLIENT_SERVICE_PORT=2379
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2020-02-28 12:53:09.522285 I | etcdmain: etcd Version: 3.4.3
2020-02-28 12:53:09.522288 I | etcdmain: Git SHA: 3cf2f69b5
2020-02-28 12:53:09.522291 I | etcdmain: Go Version: go1.12.12
2020-02-28 12:53:09.522293 I | etcdmain: Go OS/Arch: linux/amd64
2020-02-28 12:53:09.522296 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2020-02-28 12:53:09.522707 I | embed: name = etcd-cluster-fc8t7n8wgs
2020-02-28 12:53:09.522726 I | embed: data dir = /var/etcd/data
2020-02-28 12:53:09.522732 I | embed: member dir = /var/etcd/data/member
2020-02-28 12:53:09.522736 I | embed: heartbeat = 100ms
2020-02-28 12:53:09.522740 I | embed: election = 1000ms
2020-02-28 12:53:09.522744 I | embed: snapshot count = 100000
2020-02-28 12:53:09.522759 I | embed: advertise client URLs = http://etcd-cluster-fc8t7n8wgs.etcd-cluster.etcd-operator.svc:2379
{"level":"info","ts":1582894389.532038,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-6b7k9l9m6j.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-6b7k9l9m6j.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.30.4:2380"}
{"level":"info","ts":1582894389.5332756,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-6b7k9l9m6j.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-6b7k9l9m6j.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.30.4:2380"}
{"level":"info","ts":1582894389.5355897,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-cldp4nj5hs.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-cldp4nj5hs.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.30.5:2380"}
{"level":"info","ts":1582894389.6218789,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-cldp4nj5hs.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-cldp4nj5hs.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.30.5:2380"}
{"level":"info","ts":1582894389.6242368,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-fc8t7n8wgs.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-fc8t7n8wgs.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.8.253:2380"}
{"level":"info","ts":1582894389.626507,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-fc8t7n8wgs.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-fc8t7n8wgs.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.8.253:2380"}
{"level":"info","ts":1582894389.6290252,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-r68gw95ndw.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-r68gw95ndw.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.33.3:2380"}
{"level":"info","ts":1582894389.6312308,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-r68gw95ndw.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-r68gw95ndw.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.33.3:2380"}
{"level":"info","ts":1582894389.6339386,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-rrd8g7s8jv.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-rrd8g7s8jv.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.33.42:2380"}
{"level":"info","ts":1582894389.6360765,"caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"http://etcd-cluster-rrd8g7s8jv.etcd-cluster.etcd-operator.svc:2380","host":"etcd-cluster-rrd8g7s8jv.etcd-cluster.etcd-operator.svc:2380","resolved-addr":"10.42.33.42:2380"}
2020-02-28 12:53:09.752389 I | etcdserver: starting member dba192e4056985b8 in cluster 4b80bf1db9287b2a
raft2020/02/28 12:53:09 INFO: dba192e4056985b8 switched to configuration voters=()
raft2020/02/28 12:53:09 INFO: dba192e4056985b8 became follower at term 0
raft2020/02/28 12:53:09 INFO: newRaft dba192e4056985b8 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2020-02-28 12:53:09.759153 W | auth: simple token is not cryptographically signed
2020-02-28 12:53:09.763671 I | rafthttp: started HTTP pipelining with peer 13bcea49d338936b
2020-02-28 12:53:09.763697 I | rafthttp: started HTTP pipelining with peer 2e914f7df77cc5b7
2020-02-28 12:53:09.763715 I | rafthttp: started HTTP pipelining with peer 3455e4fecaf8cb1b
2020-02-28 12:53:09.763730 I | rafthttp: started HTTP pipelining with peer 6f4f64bf8b1a4711
2020-02-28 12:53:09.763747 I | rafthttp: starting peer 13bcea49d338936b...
2020-02-28 12:53:09.763760 I | rafthttp: started HTTP pipelining with peer 13bcea49d338936b
2020-02-28 12:53:09.822227 I | rafthttp: started streaming with peer 13bcea49d338936b (writer)
2020-02-28 12:53:09.822306 I | rafthttp: started streaming with peer 13bcea49d338936b (writer)
2020-02-28 12:53:09.822603 I | rafthttp: started peer 13bcea49d338936b
2020-02-28 12:53:09.822635 I | rafthttp: added peer 13bcea49d338936b
2020-02-28 12:53:09.822648 I | rafthttp: started streaming with peer 13bcea49d338936b (stream MsgApp v2 reader)
2020-02-28 12:53:09.822657 I | rafthttp: starting peer 2e914f7df77cc5b7...
2020-02-28 12:53:09.822679 I | rafthttp: started HTTP pipelining with peer 2e914f7df77cc5b7
2020-02-28 12:53:09.823127 I | rafthttp: started streaming with peer 13bcea49d338936b (stream Message reader)
2020-02-28 12:53:09.823592 I | rafthttp: started streaming with peer 2e914f7df77cc5b7 (writer)
2020-02-28 12:53:09.920030 I | rafthttp: started streaming with peer 2e914f7df77cc5b7 (writer)
2020-02-28 12:53:09.923721 I | rafthttp: started peer 2e914f7df77cc5b7
2020-02-28 12:53:09.923749 I | rafthttp: added peer 2e914f7df77cc5b7
2020-02-28 12:53:09.923770 I | rafthttp: starting peer 3455e4fecaf8cb1b...
2020-02-28 12:53:09.923795 I | rafthttp: started HTTP pipelining with peer 3455e4fecaf8cb1b
2020-02-28 12:53:09.924057 I | rafthttp: started streaming with peer 2e914f7df77cc5b7 (stream MsgApp v2 reader)
2020-02-28 12:53:09.924346 I | rafthttp: started streaming with peer 2e914f7df77cc5b7 (stream Message reader)
2020-02-28 12:53:09.924863 I | rafthttp: started streaming with peer 3455e4fecaf8cb1b (writer)
2020-02-28 12:53:09.925190 I | rafthttp: started streaming with peer 3455e4fecaf8cb1b (writer)
2020-02-28 12:53:10.021366 I | rafthttp: peer 13bcea49d338936b became active
2020-02-28 12:53:10.021404 I | rafthttp: established a TCP streaming connection with peer 13bcea49d338936b (stream Message reader)
2020-02-28 12:53:10.021706 I | rafthttp: established a TCP streaming connection with peer 13bcea49d338936b (stream MsgApp v2 reader)
2020-02-28 12:53:10.022506 I | rafthttp: started peer 3455e4fecaf8cb1b
2020-02-28 12:53:10.022549 I | rafthttp: added peer 3455e4fecaf8cb1b
2020-02-28 12:53:10.022573 I | rafthttp: starting peer 6f4f64bf8b1a4711...
2020-02-28 12:53:10.022639 I | rafthttp: started HTTP pipelining with peer 6f4f64bf8b1a4711
2020-02-28 12:53:10.024606 I | rafthttp: started streaming with peer 3455e4fecaf8cb1b (stream MsgApp v2 reader)
2020-02-28 12:53:10.028358 I | rafthttp: started peer 6f4f64bf8b1a4711
2020-02-28 12:53:10.028393 I | rafthttp: added peer 6f4f64bf8b1a4711
2020-02-28 12:53:10.028425 I | etcdserver: starting server... [version: 3.4.3, cluster version: to_be_decided]
2020-02-28 12:53:10.123447 I | rafthttp: started streaming with peer 3455e4fecaf8cb1b (stream Message reader)
2020-02-28 12:53:10.123886 I | rafthttp: peer 2e914f7df77cc5b7 became active
2020-02-28 12:53:10.123905 I | rafthttp: established a TCP streaming connection with peer 2e914f7df77cc5b7 (stream Message reader)
2020-02-28 12:53:10.123948 I | rafthttp: established a TCP streaming connection with peer 2e914f7df77cc5b7 (stream MsgApp v2 reader)
2020-02-28 12:53:10.219999 I | rafthttp: started streaming with peer 6f4f64bf8b1a4711 (writer)
2020-02-28 12:53:10.220041 I | rafthttp: started streaming with peer 6f4f64bf8b1a4711 (writer)
2020-02-28 12:53:10.220080 I | rafthttp: started streaming with peer 6f4f64bf8b1a4711 (stream MsgApp v2 reader)
2020-02-28 12:53:10.220418 I | rafthttp: started streaming with peer 6f4f64bf8b1a4711 (stream Message reader)
2020-02-28 12:53:10.222128 I | embed: listening for peers on [::]:2380
raft2020/02/28 12:53:10 INFO: dba192e4056985b8 [term: 0] received a MsgHeartbeat message with higher term from 6f4f64bf8b1a4711 [term: 92]
raft2020/02/28 12:53:10 INFO: dba192e4056985b8 became follower at term 92
raft2020/02/28 12:53:10 INFO: raft.node: dba192e4056985b8 elected leader 6f4f64bf8b1a4711 at term 92
2020-02-28 12:53:10.222778 I | rafthttp: established a TCP streaming connection with peer 2e914f7df77cc5b7 (stream Message writer)
2020-02-28 12:53:10.223604 I | rafthttp: established a TCP streaming connection with peer 2e914f7df77cc5b7 (stream MsgApp v2 writer)
2020-02-28 12:53:10.224398 I | rafthttp: established a TCP streaming connection with peer 13bcea49d338936b (stream Message writer)
2020-02-28 12:53:10.224689 I | rafthttp: peer 6f4f64bf8b1a4711 became active
2020-02-28 12:53:10.224710 I | rafthttp: established a TCP streaming connection with peer 6f4f64bf8b1a4711 (stream MsgApp v2 writer)
2020-02-28 12:53:10.224833 I | rafthttp: established a TCP streaming connection with peer 6f4f64bf8b1a4711 (stream Message writer)
2020-02-28 12:53:10.320335 I | rafthttp: peer 3455e4fecaf8cb1b became active
2020-02-28 12:53:10.320364 I | rafthttp: established a TCP streaming connection with peer 3455e4fecaf8cb1b (stream Message reader)
2020-02-28 12:53:10.320389 I | rafthttp: established a TCP streaming connection with peer 13bcea49d338936b (stream MsgApp v2 writer)
2020-02-28 12:53:10.320969 I | rafthttp: established a TCP streaming connection with peer 3455e4fecaf8cb1b (stream MsgApp v2 reader)
2020-02-28 12:53:10.321020 I | rafthttp: established a TCP streaming connection with peer 3455e4fecaf8cb1b (stream MsgApp v2 writer)
2020-02-28 12:53:10.321135 I | rafthttp: established a TCP streaming connection with peer 3455e4fecaf8cb1b (stream Message writer)
2020-02-28 12:53:10.321716 I | etcdserver: dba192e4056985b8 initialized peer connection; fast-forwarding 8 ticks (election ticks 10) with 4 active peer(s)
2020-02-28 12:53:10.323605 I | rafthttp: established a TCP streaming connection with peer 6f4f64bf8b1a4711 (stream MsgApp v2 reader)
2020-02-28 12:53:10.324427 I | rafthttp: established a TCP streaming connection with peer 6f4f64bf8b1a4711 (stream Message reader)
2020-02-28 12:53:10.464690 I | rafthttp: receiving database snapshot [index:10952459, from 6f4f64bf8b1a4711] ...
2020-02-28 12:53:17.220772 E | etcdserver: publish error: etcdserver: request timed out, possibly due to connection lost
2020-02-28 12:53:24.220909 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:53:31.221049 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:53:38.221202 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:53:45.221324 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:53:52.221469 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:53:59.221615 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:54:06.222368 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:54:13.222503 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:54:20.222655 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:54:26.920723 I | snap: saved database snapshot to disk [total bytes: 1512157184]
2020-02-28 12:54:26.923005 I | rafthttp: received and saved database snapshot [index: 10952459, from: 6f4f64bf8b1a4711] successfully
raft2020/02/28 12:54:26 INFO: log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 10952459, term: 92]
raft2020/02/28 12:54:26 INFO: dba192e4056985b8 switched to configuration voters=(1422269185139446635 3355550599809385911 3771172045970787099 8020740235205429009 15826092073597633976)
raft2020/02/28 12:54:26 INFO: dba192e4056985b8 [commit: 10952459, lastindex: 10952459, lastterm: 92] restored snapshot [index: 10952459, term: 92]
raft2020/02/28 12:54:26 INFO: dba192e4056985b8 [commit: 10952459] restored snapshot [index: 10952459, term: 92]
2020-02-28 12:54:26.924094 I | etcdserver: applying snapshot at index 0...
2020-02-28 12:54:26.929445 I | etcdserver: raft applied incoming snapshot at index 10952459
2020-02-28 12:54:27.222774 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:54:39.019900 W | etcdserver: another etcd process is using "/var/etcd/data/member/snap/db" and holds the file lock, or loading backend file is taking >10 seconds
2020-02-28 12:54:39.019928 W | etcdserver: waiting for it to exit before starting...
2020-02-28 12:54:39.219950 W | rafthttp: closed an existing TCP streaming connection with peer 2e914f7df77cc5b7 (stream Message writer)
2020-02-28 12:54:39.219979 I | rafthttp: established a TCP streaming connection with peer 2e914f7df77cc5b7 (stream Message writer)
2020-02-28 12:54:39.222944 W | rafthttp: closed an existing TCP streaming connection with peer 6f4f64bf8b1a4711 (stream Message writer)
2020-02-28 12:54:39.222976 I | rafthttp: established a TCP streaming connection with peer 6f4f64bf8b1a4711 (stream Message writer)
2020-02-28 12:54:39.320667 W | rafthttp: closed an existing TCP streaming connection with peer 6f4f64bf8b1a4711 (stream MsgApp v2 writer)
2020-02-28 12:54:39.320689 I | rafthttp: established a TCP streaming connection with peer 6f4f64bf8b1a4711 (stream MsgApp v2 writer)
2020-02-28 12:54:39.321476 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:54:39.325532 W | rafthttp: closed an existing TCP streaming connection with peer 3455e4fecaf8cb1b (stream MsgApp v2 writer)
2020-02-28 12:54:39.325553 I | rafthttp: established a TCP streaming connection with peer 3455e4fecaf8cb1b (stream MsgApp v2 writer)
2020-02-28 12:54:39.421741 W | rafthttp: closed an existing TCP streaming connection with peer 3455e4fecaf8cb1b (stream Message writer)
2020-02-28 12:54:39.421757 I | rafthttp: established a TCP streaming connection with peer 3455e4fecaf8cb1b (stream Message writer)
2020-02-28 12:54:39.822186 W | rafthttp: closed an existing TCP streaming connection with peer 13bcea49d338936b (stream MsgApp v2 writer)
2020-02-28 12:54:39.822246 I | rafthttp: established a TCP streaming connection with peer 13bcea49d338936b (stream MsgApp v2 writer)
2020-02-28 12:54:39.822450 W | rafthttp: closed an existing TCP streaming connection with peer 13bcea49d338936b (stream Message writer)
2020-02-28 12:54:39.822488 I | rafthttp: established a TCP streaming connection with peer 13bcea49d338936b (stream Message writer)
2020-02-28 12:54:40.222601 W | rafthttp: closed an existing TCP streaming connection with peer 2e914f7df77cc5b7 (stream MsgApp v2 writer)
2020-02-28 12:54:40.222637 I | rafthttp: established a TCP streaming connection with peer 2e914f7df77cc5b7 (stream MsgApp v2 writer)
2020-02-28 12:54:46.321586 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:54:53.321780 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:00.321944 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:07.322108 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:14.322266 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:21.322422 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:28.322532 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:35.322699 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:42.322846 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:49.323036 E | etcdserver: publish error: etcdserver: request timed out
2020-02-28 12:55:56.323231 E | etcdserver: publish error: etcdserver: request timed out

$ kubectl get pods -n etcd-operator -l app=etcd
NAME                      READY   STATUS    RESTARTS   AGE
etcd-cluster-6b7k9l9m6j   1/1     Running   0          24d
etcd-cluster-8gl7lcgb76   0/1     Running   0          53s
etcd-cluster-cldp4nj5hs   1/1     Running   0          24d
etcd-cluster-r68gw95ndw   1/1     Running   0          24d
etcd-cluster-rrd8g7s8jv   1/1     Running   0          15d

$ kubectl describe pod -n etcd-operator etcd-cluster-8gl7lcgb76
Events:
  Type     Reason     Age        From                          Message
  ----     ------     ----       ----                          -------
  Normal   Created    65s        kubelet, rancher-dev-worker1  Created container etcd
  Normal   Started    65s        kubelet, rancher-dev-worker1  Started container etcd
  Warning  Unhealthy  59s        kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T12:59:35.720Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  53s  kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T12:59:41.023Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  48s  kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T12:59:46.322Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  45s  kubelet, rancher-dev-worker1  Liveness probe failed: {"level":"warn","ts":"2020-02-28T12:59:49.429Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  43s  kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T12:59:51.521Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  38s  kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T12:59:56.824Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  32s  kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T13:00:02.027Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  27s  kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T13:00:07.426Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  22s  kubelet, rancher-dev-worker1  Readiness probe failed: {"level":"warn","ts":"2020-02-28T13:00:12.528Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
  Warning  Unhealthy  1s (x4 over 17s)  kubelet, rancher-dev-worker1  (combined from similar events): Readiness probe failed: {"level":"warn","ts":"2020-02-28T13:00:33.223Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
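
The etcd log above carries enough timestamps to estimate how long the new member spent receiving the database snapshot from the leader. The following is a minimal Python sketch that uses only values copied from the "receiving database snapshot" and "saved database snapshot to disk" log lines; the function and constant names are illustrative, and the probe thresholds are not shown in this issue, so no comparison against them is attempted.

```python
from datetime import datetime

# Values copied from the etcd log lines above.
RECEIVE_START = "2020-02-28 12:53:10.464690"  # "receiving database snapshot ..."
SAVED_TO_DISK = "2020-02-28 12:54:26.920723"  # "saved database snapshot to disk"
SNAPSHOT_BYTES = 1512157184                   # "total bytes" on the same line

FMT = "%Y-%m-%d %H:%M:%S.%f"


def snapshot_transfer_stats(start, end, size_bytes):
    """Return (elapsed seconds, throughput in MiB/s) for one snapshot transfer."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    elapsed = delta.total_seconds()
    return elapsed, size_bytes / elapsed / 2**20


elapsed, mib_per_s = snapshot_transfer_stats(RECEIVE_START, SAVED_TO_DISK, SNAPSHOT_BYTES)
print(f"snapshot size: {SNAPSHOT_BYTES / 2**30:.2f} GiB")
print(f"transfer time: {elapsed:.1f} s")
print(f"throughput:    {mib_per_s:.1f} MiB/s")
```

With these inputs the transfer alone takes a little over 76 seconds, and the member logs "publish error: etcdserver: request timed out" throughout that window.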

I thought this could be related to #2109.

Any help is appreciated.

Thanks

eliaoggian commented 4 years ago

I found that the problem was the huge size of the DB (the snapshot in the log above is about 1.5 GB), caused by never compacting and defragmenting it. Following the space quota section of the etcd maintenance guide resolved the issue: https://github.com/etcd-io/etcd/blob/a621d807f061e1dd635033a8d6bc261461429e27/Documentation/op-guide/maintenance.md#space-quota
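
For readers who want to script the cleanup, here is a rough Python sketch of the sequence described in the linked maintenance guide: read the current revision from `etcdctl endpoint status --write-out=json`, compact up to that revision, defragment, and disarm any alarm. It only builds the command lines and does not run them. The endpoint URL, the helper names, and the sample JSON (including its revision number) are placeholders for illustration, not values from this cluster.

```python
import re


def current_revision(endpoint_status_json):
    """Pull the first "revision" value out of `etcdctl endpoint status --write-out=json` output."""
    match = re.search(r'"revision"\s*:\s*(\d+)', endpoint_status_json)
    if match is None:
        raise ValueError("no revision field found in endpoint status output")
    return int(match.group(1))


def maintenance_commands(revision, endpoint="http://127.0.0.1:2379"):
    """Build (but do not execute) the compact, defrag, and alarm-disarm commands."""
    base = ["etcdctl", f"--endpoints={endpoint}"]
    return [
        base + ["compact", str(revision)],  # discard key history older than this revision
        base + ["defrag"],                  # release the freed space back to the file system
        base + ["alarm", "disarm"],         # only needed if a NOSPACE alarm was raised
    ]


# Placeholder input: a trimmed, made-up status document, not output from this cluster.
SAMPLE_STATUS = '[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":12345}}}]'

for cmd in maintenance_commands(current_revision(SAMPLE_STATUS)):
    print(" ".join(cmd))
```

Defragmentation works on the member behind the given endpoint, so in a multi-member cluster like this one the defrag step would need to be repeated for each member.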