etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

rpc error: code = DeadlineExceeded desc = context deadline exceeded #12234

Closed OneCodeMonkey closed 3 years ago

OneCodeMonkey commented 4 years ago

Starting an etcd cluster with 3 nodes failed. The first two nodes connected, but the third reports:

rpc error: code = DeadlineExceeded desc = context deadline exceeded

spzala commented 4 years ago

Please provide more info if you are still having this issue.

smartvolshell commented 4 years ago

With version 3.4, running "etcdctl endpoint health" returns: "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "msg": "retrying of unary invoker failed",

tsvetko-droid commented 3 years ago

I think I am hitting a similar issue:

Kubernetes version: 1.18.8

Etcd cluster with 3 members

etcdctl version: 3.4.13 API version: 3.4

Etcd logs show the following errors:

{"level":"debug","ts":"2020-11-05T20:09:11.191Z","caller":"v3rpc/watch.go:193","msg":"failed to receive watch request from gRPC stream","error":"rpc error: code = Canceled desc = context canceled"}
{"level":"debug","ts":"2020-11-05T20:17:12.193Z","caller":"v3rpc/watch.go:193","msg":"failed to receive watch request from gRPC stream","error":"rpc error: code = Canceled desc = body closed by handler"}

etcdctl reports all endpoints are healthy:

etcdctl --endpoints=https://:2379,https://:2379,https://:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint health

https://:2379 is healthy: successfully committed proposal: took = 22.183879ms
https://:2379 is healthy: successfully committed proposal: took = 22.319905ms
https://:2379 is healthy: successfully committed proposal: took = 24.610958ms

However, when I run etcdctl endpoint health on each cluster member I get:

etcdctl endpoint health
{"level":"warn","ts":"2020-11-05T20:45:06.862Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-c8b0da71-f0fe-49c9-8ac5-e4d33077926c/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection closed"}
127.0.0.1:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
command terminated with exit code 1

Running etcdctl alarm list on each member also shows the same:

etcdctl alarm list
{"level":"warn","ts":"2020-11-05T20:47:50.885Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-ef24a712-6b02-47ac-8b81-dc6136142d2e/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection closed"}
Error: context deadline exceeded
command terminated with exit code 1
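
A possible reading of the two results above (an assumption, not something confirmed at this point in the thread): without flags, etcdctl talks to its default endpoint 127.0.0.1:2379 and presents no certificates, so a member that only accepts TLS clients closes the connection and the call runs into its deadline. A minimal sketch that reuses the certificate flags from the working command, wrapped in a hypothetical helper `etcd_local_health`; the `https://127.0.0.1:2379` endpoint assumes the member listens on loopback:

```shell
# Hypothetical helper: health check against the local member with TLS flags.
# Certificate paths are the ones from the working command above; the
# loopback endpoint is an assumption about the member's client listen URLs.
etcd_local_health() {
  etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/peer.crt \
    --key=/etc/kubernetes/pki/etcd/peer.key \
    endpoint health
}
```

If `etcd_local_health` reports the member as healthy while the bare `etcdctl endpoint health` does not, the missing client certificates are the likely cause rather than the cluster itself.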

sumit-anantwar commented 3 years ago

Facing the same issue. I am trying to set up an etcd cluster on 2 nodes. The services on both nodes are running properly, but etcdctl fails with "Error: context deadline exceeded". Checking endpoint health reports the same error.

vagrant@master-1:~$ etcdctl endpoint health                                                                           
{"level":"warn","ts":"2020-11-26T21:49:03.204Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-08349238-638e-434d-aff9-37777482a226/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}                                                  
127.0.0.1:2379 is unhealthy: failed to commit proposal: context deadline exceeded                                     
Error: unhealthy cluster                                                                                              

This is my systemd unit file for etcd.service

[Unit]                                                                                              
Description=etcd                                                                                    
Documentation=https://github.com/etcd-io/etcd                                                         

[Service]                                                                                           
ExecStart=/usr/local/bin/etcd \                                                                     
  --name master-1 \                                                                                 
  --cert-file=/etc/etcd/etcd-server.crt \                                                           
  --key-file=/etc/etcd/etcd-server.key \                                                            
  --peer-cert-file=/etc/etcd/etcd-server.crt \                                                      
  --peer-key-file=/etc/etcd/etcd-server.key \                                                       
  --trusted-ca-file=/etc/etcd/ca.crt \                                                              
  --peer-trusted-ca-file=/etc/etcd/ca.crt \                                                         
  --peer-client-cert-auth \                                                                         
  --client-cert-auth \                                                                              
  --initial-advertise-peer-urls https://192.168.5.11:2380 \                                         
  --listen-peer-urls https://192.168.5.11:2380 \                                                    
  --listen-client-urls https://192.168.5.11:2379,https://127.0.0.1:2379 \                           
  --advertise-client-urls https://192.168.5.11:2379 \                                               
  --initial-cluster-token etcd-cluster-0 \                                                          
  --initial-cluster master-1=https://192.168.5.11:2380,master-2=https://192.168.5.12:2380 \         
  --initial-cluster-state new \                                                                     
  --data-dir=/var/lib/etcd                                                                          
Restart=on-failure                                                                                  
RestartSec=5  

[Install]
WantedBy=multi-user.target

pnoker commented 3 years ago

I think I am hitting a similar issue:

Kubernetes version: 1.18.8

Etcd cluster with 3 members

etcdctl version: 3.4.13 API version: 3.4

Etcd logs show the following errors:

{"level":"warn","ts":"2020-12-10T13:24:39.343+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-9bccea73-19e9-47b2-b0b5-a2080c2ee773/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

githubsands commented 3 years ago

@pnoker I fixed this issue in the past by upping the CPU on my worker nodes.

AshishTDBA commented 3 years ago

I am getting the same issue

Message: unable to persist tenant-agent cluster config: unable to create secret "oke-tkm-oke-csgkyztgq4d-ta-cluster": Internal error occurred: rpc error: code = DeadlineExceeded desc = context deadline exceeded

hvulin commented 3 years ago

systemctl stop firewalld; systemctl disable firewalld helped me :D

FrommyMind commented 3 years ago

systemctl stop firewalld; systemctl disable firewalld helped me :D

+1

hmrg-grmh commented 3 years ago

Maybe the crt and key are needed?

The DeadlineExceeded here may in fact just mean a timeout ...


in[1]:

etcdctl --endpoints 10.20.3.4:2379 member list

out[1]:

{"level":"warn","ts":"2021-09-26T15:29:09.847+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00045aa80/#initially=[10.20.3.4:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded

in[2]:

etcdctl --endpoints 10.20.3.4:2379 endpoint health

out[2]:

{"level":"warn","ts":1632643435.8314652,"logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000166a80/#initially=[10.20.3.4:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
10.20.3.4:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster

Just add some flags ...

in[3]:

# at a k8s master node:
cd /etc/kubernetes/pki/etcd &&
etcdctl --endpoints 10.20.3.4:2379 --cacert=ca.crt --cert=server.crt --key=server.key endpoint health

out[3]:

10.20.3.4:2379 is healthy: successfully committed proposal: took = 11.521376ms

Successful.

The client crt & key also work:

in[4]:

# at a k8s master node:
cd /etc/kubernetes/pki &&
etcdctl --endpoints 10.20.3.4:2379 --cacert=etcd/ca.crt --cert=apiserver-etcd-client.crt --key=apiserver-etcd-client.key endpoint health

out[4]:

10.20.3.4:2379 is healthy: successfully committed proposal: took = 12.203806ms

skyzzk commented 2 years ago

Is there any solution to this problem?

mtt0 commented 2 years ago

@hmrg-grmh you are right, thank you. In my case:

Check etcd service file:

$ systemctl cat etcd
## output
# /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target

[Service]
Type=notify
User=root
EnvironmentFile=/etc/etcd.env
ExecStart=/usr/local/bin/etcd
NotifyAccess=all
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

Get etcd command line environment:

$ cat /etc/etcd.env  | grep ETCDCTL
## output
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-etcd-01-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-etcd-01.pem

Run etcdctl commands:

# On etcd host
$ etcdctl --cacert=/etc/ssl/etcd/ssl/ca.pem  --cert=/etc/ssl/etcd/ssl/admin-etcd-01.pem  --key=/etc/ssl/etcd/ssl/admin-etcd-01-key.pem endpoint health
## output
127.0.0.1:2379 is healthy: successfully committed proposal: took = 9.243631ms

besmirzanaj commented 2 years ago

This is a bit weird: even after sourcing the /etc/etcd.env file, so that ETCDCTL_CACERT, ETCDCTL_KEY and ETCDCTL_CERT are set in the current shell, etcdctl is still not able to read them. If I specify these variables manually (again as environment variables) in front of etcdctl, it works.

root@k8sm2:~# etcdctl version
etcdctl version: 3.5.3
API version: 3.5

root@k8sm2:~# echo -e "$ETCDCTL_KEY \n $ETCDCTL_CACERT \n $ETCDCTL_CERT"

root@k8sm2:~# source /etc/etcd.env
root@k8sm2:~# echo -e "$ETCDCTL_KEY\n$ETCDCTL_CACERT\n$ETCDCTL_CERT"
/etc/ssl/etcd/ssl/admin-k8sm2.cloudalbania.com-key.pem
/etc/ssl/etcd/ssl/ca.pem
/etc/ssl/etcd/ssl/admin-k8sm2.cloudalbania.com.pem

 # variables are declared now, it will still not work
root@k8sm2:~# etcdctl member list
{"level":"warn","ts":"2022-07-13T05:26:22.028Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003d2a80/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded
root@k8sm2:~#

# When manually specifying them, it will now work:
root@k8sm2:~# ETCDCTL_KEY=$ETCDCTL_KEY ETCDCTL_CACERT=$ETCDCTL_CACERT ETCDCTL_CERT=$ETCDCTL_CERT etcdctl member list
310169cfcd6ada7, started, etcd3, https://192.168.88.83:2380, https://192.168.88.83:2379, false
1823d38b4632fc3c, started, etcd2, https://192.168.88.82:2380, https://192.168.88.82:2379, false
f3ec59bcde14e760, started, etcd1, https://192.168.88.81:2380, https://192.168.88.81:2379, false
root@k8sm2:~#

Not sure if this is expected behavior, but it does not look right.
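
A likely explanation, assuming /etc/etcd.env contains plain `KEY=value` lines without `export` (as the grep output earlier in this thread suggests): `source` only sets shell variables, which `echo` can see but child processes such as etcdctl cannot. The `VAR=$VAR etcdctl ...` form works because it places the variables into that command's environment. A minimal sketch with a hypothetical helper `load_env_exported` that exports everything the file defines:

```shell
# Hypothetical helper: source a KEY=value file and export every variable it sets.
load_env_exported() {
  set -a        # auto-export all variables assigned from here on
  . "$1"        # read the env file into the current shell
  set +a        # stop auto-exporting
}

# Usage (path taken from this thread):
#   load_env_exported /etc/etcd.env
#   etcdctl member list
```

Running `export ETCDCTL_KEY ETCDCTL_CACERT ETCDCTL_CERT` after the plain `source` has the same effect for those three variables.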

cui3093 commented 1 year ago

Sometimes this may be caused by proxy settings in the environment. Please unset the related environment variables, then retry.
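
One way to try this without changing the current shell, as a sketch: run the command with the usual proxy variables removed from its environment. The helper name `without_proxy` is hypothetical, and the variable list is an assumption about which proxy settings are present:

```shell
# Hypothetical helper: run a command with common proxy variables unset.
without_proxy() {
  env -u http_proxy -u https_proxy -u all_proxy \
      -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY \
      "$@"
}

# Usage:
#   without_proxy etcdctl endpoint health
```

If the command succeeds this way but fails normally, a proxy setting is the likely culprit.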

jaclon-m commented 1 year ago

I got the same error and still don't know why, but when I restarted another node, it worked...

tikutest commented 1 year ago

Got this error for every operation done on the system: delete, cleanup, ListContainers, ExecSync, container runtime sanity check

err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

Ran apt update && apt upgrade -y and rebooted the node, and it worked!!

anton21m commented 1 year ago

https://github.com/etcd-io/etcd/issues/12234#issuecomment-753382725

@pnoker I fixed this issue in the past by upping the CPU on my worker nodes.

Same for me, I fixed it the same way:

  1. I increased the CPU cores from 1 to 2
  2. I disabled firewalld (systemctl disable firewalld)
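
If firewalld is the cause, a narrower alternative to disabling it is to open only the etcd ports. A sketch, assuming the client port 2379 and peer port 2380 (the ports used in the unit file earlier in this thread) and firewalld's default zone; the helper name `open_etcd_ports` is hypothetical and it needs to run as root:

```shell
# Hypothetical helper: allow etcd client (2379) and peer (2380) traffic in firewalld.
open_etcd_ports() {
  firewall-cmd --permanent --add-port=2379/tcp &&
  firewall-cmd --permanent --add-port=2380/tcp &&
  firewall-cmd --reload   # apply the permanent rules to the running firewall
}
```

This has to be applied on every cluster member, since each one must accept peer traffic from the others.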