Closed: OneCodeMonkey closed this issue 3 years ago.
Please provide more info if you are still having this issue.
On version 3.4, running `etcdctl endpoint health` fails with: "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "msg": "retrying of unary invoker failed"
I think I am hitting a similar issue:
Kubernetes version: 1.18.8
Etcd cluster with 3 members
etcdctl version: 3.4.13 API version: 3.4
Etcd logs show the following errors:
{"level":"debug","ts":"2020-11-05T20:09:11.191Z","caller":"v3rpc/watch.go:193","msg":"failed to receive watch request from gRPC stream","error":"rpc error: code = Canceled desc = context canceled"}
{"level":"debug","ts":"2020-11-05T20:17:12.193Z","caller":"v3rpc/watch.go:193","msg":"failed to receive watch request from gRPC stream","error":"rpc error: code = Canceled desc = body closed by handler"}
etcdctl reports all endpoints are healthy:
etcdctl --endpoints=https://
https://
However, when I run etcdctl endpoint health on each cluster member I get:
etcdctl endpoint health
{"level":"warn","ts":"2020-11-05T20:45:06.862Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-c8b0da71-f0fe-49c9-8ac5-e4d33077926c/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection closed"}
127.0.0.1:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
command terminated with exit code 1
Running etcdctl alarm list on each member also shows the same:
etcdctl alarm list
{"level":"warn","ts":"2020-11-05T20:47:50.885Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-ef24a712-6b02-47ac-8b81-dc6136142d2e/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection closed"}
Error: context deadline exceeded
command terminated with exit code 1
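As a first diagnostic step, it can help to rule out a merely slow (rather than unreachable) endpoint by raising etcdctl's default 5s timeouts. A minimal sketch (the endpoint and the use of the real `--dial-timeout`/`--command-timeout` flags are shown; the TLS flags from later comments may also be needed):

```shell
# Raise the dial and per-command timeouts above their defaults so a
# slow-but-healthy cluster is not misreported as DeadlineExceeded.
etcdctl --endpoints=https://127.0.0.1:2379 \
        --dial-timeout=10s \
        --command-timeout=30s \
        endpoint health
```

If the command still times out with generous limits, the problem is connectivity or TLS rather than latency.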
Facing the same issue.
I am trying to set up an etcd cluster on 2 nodes.
The services on both nodes are running properly, but etcdctl fails with "Error: context deadline exceeded".
Checking endpoint health reports the same error.
vagrant@master-1:~$ etcdctl endpoint health
{"level":"warn","ts":"2020-11-26T21:49:03.204Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-08349238-638e-434d-aff9-37777482a226/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
127.0.0.1:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
This is my systemd unit file for etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/etcd-io/etcd
[Service]
ExecStart=/usr/local/bin/etcd \
--name master-1 \
--cert-file=/etc/etcd/etcd-server.crt \
--key-file=/etc/etcd/etcd-server.key \
--peer-cert-file=/etc/etcd/etcd-server.crt \
--peer-key-file=/etc/etcd/etcd-server.key \
--trusted-ca-file=/etc/etcd/ca.crt \
--peer-trusted-ca-file=/etc/etcd/ca.crt \
--peer-client-cert-auth \
--client-cert-auth \
--initial-advertise-peer-urls https://192.168.5.11:2380 \
--listen-peer-urls https://192.168.5.11:2380 \
--listen-client-urls https://192.168.5.11:2379,https://127.0.0.1:2379 \
--advertise-client-urls https://192.168.5.11:2379 \
--initial-cluster-token etcd-cluster-0 \
--initial-cluster master-1=https://192.168.5.11:2380,master-2=https://192.168.5.12:2380 \
--initial-cluster-state new \
--data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
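Note that the unit file above enables --client-cert-auth, so a bare `etcdctl` call with no TLS material will be rejected and time out. A minimal sketch of a health check that passes the certificates explicitly, assuming the same paths as in the unit file:

```shell
# Hypothetical invocation; the cert/key paths are taken from the
# systemd unit above. --client-cert-auth requires the client to
# present a certificate signed by the trusted CA.
etcdctl --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/etcd/ca.crt \
        --cert=/etc/etcd/etcd-server.crt \
        --key=/etc/etcd/etcd-server.key \
        endpoint health
```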
I think I am hitting a similar issue, with the same environment as above (Kubernetes 1.18.8, 3-member etcd cluster, etcdctl 3.4.13 / API 3.4). Etcd logs show the following errors:
{"level":"warn","ts":"2020-12-10T13:24:39.343+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-9bccea73-19e9-47b2-b0b5-a2080c2ee773/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
@pnoker I fixed this issue in the past by upping the CPU on my worker nodes.
I am getting the same issue
Message: unable to persist tenant-agent cluster config: unable to create secret "oke-tkm-oke-csgkyztgq4d-ta-cluster": Internal error occurred: rpc error: code = DeadlineExceeded desc = context deadline exceeded
systemctl stop firewalld; systemctl disable firewalld helped me :D
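Rather than disabling firewalld entirely, it is usually enough to open only etcd's ports. A sketch, assuming the default client port 2379 and peer port 2380:

```shell
# Open etcd's client (2379) and peer (2380) TCP ports permanently,
# then reload firewalld to apply the change.
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --reload
```

This keeps the firewall active for everything else while letting cluster members and clients reach etcd.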
+1
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Maybe you need to pass the crt and key? DeadlineExceeded in fact means a timeout ...
in[1]:
etcdctl --endpoints 10.20.3.4:2379 member list
out[1]:
{"level":"warn","ts":"2021-09-26T15:29:09.847+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00045aa80/#initially=[10.20.3.4:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded
in[2]:
etcdctl --endpoints 10.20.3.4:2379 endpoint health
out[2]:
{"level":"warn","ts":1632643435.8314652,"logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000166a80/#initially=[10.20.3.4:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
10.20.3.4:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
just add some flags ...
in[3]:
# at a k8s master node:
cd /etc/kubernetes/pki/etcd &&
etcdctl --endpoints 10.20.3.4:2379 --cacert=ca.crt --cert=server.crt --key=server.key endpoint health
out[3]:
10.20.3.4:2379 is healthy: successfully committed proposal: took = 11.521376ms
Successful. You can also use the client crt & key:
in[4]:
# at a k8s master node:
cd /etc/kubernetes/pki &&
etcdctl --endpoints 10.20.3.4:2379 --cacert=etcd/ca.crt --cert=apiserver-etcd-client.crt --key=apiserver-etcd-client.key endpoint health
out[4]:
10.20.3.4:2379 is healthy: successfully committed proposal: took = 12.203806ms
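Instead of repeating the flags on every invocation, etcdctl also reads them from ETCDCTL_-prefixed environment variables. A sketch using the same kubeadm certificate paths assumed above:

```shell
# etcdctl picks up flag values from ETCDCTL_* environment variables,
# so the TLS material only needs to be set once per shell session.
export ETCDCTL_ENDPOINTS=10.20.3.4:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/apiserver-etcd-client.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/apiserver-etcd-client.key

etcdctl endpoint health
etcdctl member list
```

Note the variables must actually be exported; see the discussion of /etc/etcd.env below for a pitfall with sourced files.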
Is there any solution to this problem?
@hmrg-grmh you are right, thank you~ In my case:
Check the etcd service file:
$ systemctl cat etcd
## output
# /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target
[Service]
Type=notify
User=root
EnvironmentFile=/etc/etcd.env
ExecStart=/usr/local/bin/etcd
NotifyAccess=all
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
Get etcd command line environment:
$ cat /etc/etcd.env | grep ETCDCTL
## output
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-etcd-01-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-etcd-01.pem
Run etcdctl commands:
# On etcd host
$ etcdctl --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/admin-etcd-01.pem --key=/etc/ssl/etcd/ssl/admin-etcd-01-key.pem endpoint health
## output
127.0.0.1:2379 is healthy: successfully committed proposal: took = 9.243631ms
This is a bit weird: even after you source the /etc/etcd.env file, so that ETCDCTL_CACERT, ETCDCTL_KEY and ETCDCTL_CERT are available as environment variables for the current user, etcdctl is still not able to read them.
If I manually specify these variables (again as environment vars) in front of etcdctl, it then works:
root@k8sm2:~# etcdctl version
etcdctl version: 3.5.3
API version: 3.5
root@k8sm2:~# echo -e "$ETCDCTL_KEY \n $ETCDCTL_CACERT \n $ETCDCTL_CERT"
root@k8sm2:~# source /etc/etcd.env
root@k8sm2:~# echo -e "$ETCDCTL_KEY\n$ETCDCTL_CACERT\n$ETCDCTL_CERT"
/etc/ssl/etcd/ssl/admin-k8sm2.cloudalbania.com-key.pem
/etc/ssl/etcd/ssl/ca.pem
/etc/ssl/etcd/ssl/admin-k8sm2.cloudalbania.com.pem
# variables are declared now, it will still not work
root@k8sm2:~# etcdctl member list
{"level":"warn","ts":"2022-07-13T05:26:22.028Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003d2a80/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded
root@k8sm2:~#
# When manually specifying them, it will now work:
root@k8sm2:~# ETCDCTL_KEY=$ETCDCTL_KEY ETCDCTL_CACERT=$ETCDCTL_CACERT ETCDCTL_CERT=$ETCDCTL_CERT etcdctl member list
310169cfcd6ada7, started, etcd3, https://192.168.88.83:2380, https://192.168.88.83:2379, false
1823d38b4632fc3c, started, etcd2, https://192.168.88.82:2380, https://192.168.88.82:2379, false
f3ec59bcde14e760, started, etcd1, https://192.168.88.81:2380, https://192.168.88.81:2379, false
root@k8sm2:~#
Not sure if this is expected behavior, but it doesn't look right.
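A likely explanation (an assumption, not confirmed in this thread): if /etc/etcd.env contains plain KEY=value lines without `export`, sourcing it creates shell variables that are never passed to child processes such as etcdctl, which is why prefixing the variables onto the command line works. Enabling allexport (`set -a`) while sourcing marks every assignment for export. A minimal demonstration with a throwaway env file standing in for /etc/etcd.env:

```shell
# Plain sourcing sets shell-local variables that children never see;
# "set -a" marks every assignment in the sourced file for export.
unset ETCDCTL_CACERT
envfile=$(mktemp)
echo 'ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem' > "$envfile"

. "$envfile"
sh -c 'echo "plain source: [$ETCDCTL_CACERT]"'   # prints empty brackets

set -a
. "$envfile"
set +a
sh -c 'echo "set -a:       [$ETCDCTL_CACERT]"'   # prints the path

rm -f "$envfile"
```

With `set -a; source /etc/etcd.env; set +a`, etcdctl should then see the variables without the manual prefix.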
Sometimes this may be caused by the proxy environment; please unset the related environment variables, then retry.
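Concretely, if proxy variables are set, client connections to etcd can be routed through the proxy and time out. A sketch of the two usual fixes (the endpoint address is an assumption from earlier comments):

```shell
# Option 1: clear the proxy variables for this shell session.
unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY

# Option 2: keep the proxy but exempt local and etcd addresses.
export NO_PROXY=127.0.0.1,localhost,10.20.3.4
export no_proxy="$NO_PROXY"
```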
I got the same error and still don't know why, but when I restarted another node, it worked...
Got this error for every operation done on the system (delete, cleanup, ListContainers, ExecSync, container runtime sanity check):
err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Ran apt update && apt upgrade -y, rebooted the node, and it worked!
https://github.com/etcd-io/etcd/issues/12234#issuecomment-753382725
@pnoker I fixed this issue in the past by upping the CPU on my worker nodes.
I'm fine; I fixed it the same way.
Starting an etcd cluster with 3 nodes failed. The first two nodes connected, but the third reports: