cilium / cilium-etcd-operator

Operator to manage Cilium's etcd cluster
Apache License 2.0

New installation fails to connect peers #37

Closed kekoav closed 5 years ago

kekoav commented 5 years ago

Hello, I'm getting the following error from the first node, when the second node attempts to join the etcd cluster:

2019-02-05 14:03:36.771423 I | embed: rejected connection from "10.2.88.215:43408" (error "tls: \"10.2.88.215\" does not match any of DNSNames [\"*.cilium-etcd.kube-system.svc\" \"*.cilium-etcd.kube-system.svc.cluster.local\"]", ServerName "cilium-etcd-85xwf7zw6h.cilium-etcd.kube-system.svc", IPAddresses [], DNSNames ["*.cilium-etcd.kube-system.svc" "*.cilium-etcd.kube-system.svc.cluster.local"])

There are many other messages in the logs, but I think this might be the cause of my cluster not coming up. I haven't configured anything special with the operator; I'm just trying to run version v2.0.5.

Any idea what the problem is? Did I miss a setting for the TLS certificate generation?

This is the log from peer 2 when it tries to connect:


```
2019-02-05 14:14:06.389856 I | etcdmain: etcd Version: 3.3.11
2019-02-05 14:14:06.389903 I | etcdmain: Git SHA: 2cf9e51d2
2019-02-05 14:14:06.389909 I | etcdmain: Go Version: go1.10.7
2019-02-05 14:14:06.389912 I | etcdmain: Go OS/Arch: linux/amd64
2019-02-05 14:14:06.389917 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2019-02-05 14:14:06.389948 I | embed: peerTLS: cert = /etc/etcdtls/member/peer-tls/peer.crt, key = /etc/etcdtls/member/peer-tls/peer.key, ca = , trusted-ca = /etc/etcdtls/member/peer-tls/peer-ca.crt, client-cert-auth = true, crl-file = 
2019-02-05 14:14:06.390652 I | embed: listening for peers on https://0.0.0.0:2380
2019-02-05 14:14:06.390688 I | embed: listening for client requests on 0.0.0.0:2379
2019-02-05 14:14:06.405693 W | etcdserver: could not get cluster response from https://cilium-etcd-fwgf5md5bc.cilium-etcd.kube-system.svc:2380: Get https://cilium-etcd-fwgf5md5bc.cilium-etcd.kube-system.svc:2380/members: EOF
2019-02-05 14:14:06.406319 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given urls
```

kekoav commented 5 years ago

I found the answer: my cluster's CoreDNS configuration hadn't been updated to support reverse lookups, as described here:

https://github.com/cilium/cilium/blob/aa22314462750c67a18a18ba6b5d4a16f3a99c4d/Documentation/gettingstarted/k8s-install-etcd-operator-steps.rst
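For readers wondering why reverse lookup matters here: as far as I understand etcd's documented behaviour, when a peer certificate carries only DNS names (no IP SANs), the server reverse-resolves the connecting IP and accepts the peer only if one of the returned names matches a DNS name in the certificate. The Python sketch below illustrates that check under simplifying assumptions. It is not etcd's code: the function names `name_matches`, `peer_allowed`, and `reverse_lookup` are hypothetical, and the one-label wildcard rule is an assumption.

```python
import socket


def name_matches(name, pattern):
    """Return True if `name` matches `pattern`.

    Assumption: a leading "*." in the pattern stands for exactly one DNS label.
    """
    name = name.rstrip(".").lower()
    pattern = pattern.rstrip(".").lower()
    if pattern.startswith("*."):
        head, sep, tail = name.partition(".")
        return bool(head) and sep == "." and tail == pattern[2:]
    return name == pattern


def peer_allowed(reverse_names, dns_names):
    """Accept the peer if any reverse-resolved name matches any certificate DNS name."""
    return any(name_matches(n, p) for n in reverse_names for p in dns_names)


def reverse_lookup(ip):
    """Return the names a PTR lookup yields for `ip`, or an empty list on failure."""
    try:
        host, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record (or the resolver is unreachable): nothing to match against.
        return []
    return [host, *aliases]
```

With the certificate from the log above, `peer_allowed(reverse_lookup("10.2.88.215"), [...])` can only succeed if cluster DNS returns a name under `cilium-etcd.kube-system.svc` for the pod IP. If the reverse lookup returns nothing, there is nothing to match, which is consistent with the "does not match any of DNSNames" rejection.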

tgraf commented 5 years ago

@kekoav Ah! We'll add it to the README to make it clear. I'll use this issue to track the addition of that.