etcd-server start failure

hotbaby commented 3 years ago

apisix-etcd-0 log

2021-09-15 12:10:14.648882 I | embed: advertise client URLs = http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379
2021-09-15 12:10:14.664499 W | etcdserver: could not get cluster response from http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380: Get http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380/members: dial tcp 10.244.12.25:2380: connect: connection refused
2021-09-15 12:10:14.665989 W | etcdserver: could not get cluster response from http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380: Get http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380/members: dial tcp 10.244.14.120:2380: connect: connection refused
2021-09-15 12:10:14.667893 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs

[root@k8s-master-02 yanfa]# k get pod -ningress-apisix
NAME                                         READY   STATUS             RESTARTS   AGE
apisix-7cf795987f-6rgsh                      0/1     Init:0/1           0          4h23m
apisix-etcd-0                                0/1     CrashLoopBackOff   42         3h32m
apisix-etcd-1                                0/1     CrashLoopBackOff   43         3h32m
apisix-etcd-2                                0/1     CrashLoopBackOff   42         3h32m
apisix-ingress-controller-698c48568f-dngk6   1/1     Running            1          4h23m

[root@k8s-master-02 yanfa]# k get svc  -ningress-apisix
NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
apisix-admin                ClusterIP   10.108.207.180   <none>        9180/TCP            4h24m
apisix-etcd                 ClusterIP   10.102.140.38    <none>        2379/TCP,2380/TCP   4h24m
apisix-etcd-headless        ClusterIP   None             <none>        2379/TCP,2380/TCP   4h24m
apisix-gateway              NodePort    10.106.254.50    <none>        80:32690/TCP        4h24m
apisix-ingress-controller   ClusterIP   10.105.36.179    <none>        80/TCP              4h24m

tokers commented 3 years ago

The etcd logs show that communication between these instances is failing. Make sure the Kubernetes pod networking is healthy (check that the CNI is running properly).
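A rough sketch of what such a check could look like from the command line. The namespace and StatefulSet name are copied from the kubectl output above; that the CNI plugin runs as pods in `kube-system` is an assumption that depends on the plugin and how it was installed. `etcd_peer_host` and `check_etcd_network` are illustrative helper names, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch only. NS and STS are copied from the kubectl output in this thread.
NS="ingress-apisix"
STS="apisix-etcd"

# Peer DNS name of replica $1, following the pattern seen in the etcd logs.
etcd_peer_host() {
  printf '%s-%s.%s-headless.%s.svc.cluster.local\n' "$STS" "$1" "$STS" "$NS"
}

# Read-only checks; nothing here changes cluster state.
check_etcd_network() {
  # Assumption: the CNI plugin runs as pods in kube-system.
  kubectl get pods -n kube-system -o wide
  # Pod IPs and nodes of the etcd members.
  kubectl get pods -n "$NS" -o wide
  # Addresses behind the headless service used for peer discovery.
  kubectl get endpoints -n "$NS" "${STS}-headless"
  # Peer names each member will try to reach on port 2380.
  for i in 0 1 2; do
    etcd_peer_host "$i"
  done
}
```

Calling `check_etcd_network` prints the pod and endpoint state in one pass. Keep in mind that "connection refused" generally means the peer IP answered but nothing was listening on the port, which is also what a member sees while its peers are themselves crash-looping, so compare the logs of all three pods.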

etcdWu commented 2 years ago

I am seeing the same type of error, where the health check fails. The error is as follows:

Readiness probe failed: {"level":"warn","ts":"2022-08-09T07:35:56.993Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001d0380/apisix-test-etcd-0.apisix-test-etcd-headless.deploy-test01.svc.monix.nonprod:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.240.91.47:2379: connect: connection refused\""}
apisix-test-etcd-0.apisix-test-etcd-headless.deploy-test01.svc.monix.nonprod:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
etcd 07:35:56.99 ERROR ==> Unhealthy endpoint!

What should I do? Please advise.

tokers commented 2 years ago

The etcd logs show that communication between these instances is failing. Make sure the Kubernetes pod networking is healthy (check that the CNI is running properly).

See this. Check the networking of your Kubernetes cluster.

jishaashokan commented 1 year ago

etcd crashes frequently.

mk logs -f apisix-etcd-1
etcd 04:28:36.13
etcd 04:28:36.13 Welcome to the Bitnami etcd container
etcd 04:28:36.14 Subscribe to project updates by watching https://github.com/bitnami/containers
etcd 04:28:36.14 Submit issues and feature requests at https://github.com/bitnami/containers/issues
etcd 04:28:36.14
etcd 04:28:36.14 INFO ==> ** Starting etcd setup **
etcd 04:28:36.16 INFO ==> Validating settings in ETCD_* env vars..
etcd 04:28:36.16 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 04:28:36.16 INFO ==> Initializing etcd
etcd 04:28:36.16 INFO ==> Generating etcd config file using env variables
etcd 04:28:36.18 INFO ==> Detected data from previous deployments
etcd 04:28:37.69 INFO ==> Updating member in existing cluster
{"level":"warn","ts":"2023-01-05T04:28:37.852Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003b4000/apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}
Error: etcdserver: member not found

mk exec -it apisix-etcd-1 -- /bin/bash
error: unable to upgrade connection: container not found ("etcd")
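
For anyone hitting the same `member not found` error, here is a hedged sketch for inspecting cluster membership through a member that is still running. The namespace and pod names come from the logs above. It assumes `etcdctl` is on the PATH inside the container and that authentication is disabled, as `ALLOW_NONE_AUTHENTICATION=yes` suggests. The helper names `run`, `etcd_members` and `etcd_health` are made up for illustration. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only. NS is copied from the logs in this thread.
NS="ingress-apisix"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List the members the cluster currently knows about, asked via pod $1.
etcd_members() {
  run kubectl exec -n "$NS" "$1" -- etcdctl member list -w table
}

# Report the health of the endpoint served by pod $1.
etcd_health() {
  run kubectl exec -n "$NS" "$1" -- etcdctl endpoint health
}
```

For example, `etcd_members apisix-etcd-0` followed by `etcd_health apisix-etcd-0`. Comparing the member list against the pod names shows whether the crashing pod is still registered with the cluster.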

shadowoftheknight commented 1 year ago

@jishaashokan, is there any resolution to this yet? What workaround did you try?

Naist4869 commented 3 weeks ago

etcd crashes frequently.

mk logs -f apisix-etcd-1
etcd 04:28:36.13
etcd 04:28:36.13 Welcome to the Bitnami etcd container
etcd 04:28:36.14 Subscribe to project updates by watching https://github.com/bitnami/containers
etcd 04:28:36.14 Submit issues and feature requests at https://github.com/bitnami/containers/issues
etcd 04:28:36.14
etcd 04:28:36.14 INFO ==> ** Starting etcd setup **
etcd 04:28:36.16 INFO ==> Validating settings in ETCD_* env vars..
etcd 04:28:36.16 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 04:28:36.16 INFO ==> Initializing etcd
etcd 04:28:36.16 INFO ==> Generating etcd config file using env variables
etcd 04:28:36.18 INFO ==> Detected data from previous deployments
etcd 04:28:37.69 INFO ==> Updating member in existing cluster
{"level":"warn","ts":"2023-01-05T04:28:37.852Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003b4000/apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}
Error: etcdserver: member not found

mk exec -it apisix-etcd-1 -- /bin/bash
error: unable to upgrade connection: container not found ("etcd")

+1

apache / apisix-helm-chart

etcd-server start failure #144