coreos / etcd-operator

etcd operator creates/configures/manages etcd clusters atop Kubernetes
https://coreos.com/blog/introducing-the-etcd-operator.html
Apache License 2.0

Services can't be used for DNS discovery #1381

Open cbandy opened 7 years ago

cbandy commented 7 years ago

I'm using quay.io/coreos/etcd-operator:v0.5.1 to create the following cluster:

apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: etcd
spec:
  size: 3
  version: "3.2.7"
  pod:
    antiAffinity: true

I see two services created with ports named client and peer:

$ kubectl describe services
Name:              etcd
Namespace:         default
Labels:            app=etcd
                   etcd_cluster=etcd
Annotations:       service.alpha.kubernetes.io/tolerate-unready-endpoints=true
Selector:          app=etcd,etcd_cluster=etcd
Type:              ClusterIP
IP:                None
Port:              client  2379/TCP
Endpoints:         10.8.0.15:2379,10.8.1.9:2379,10.8.2.10:2379
Port:              peer    2380/TCP
Endpoints:         10.8.0.15:2380,10.8.1.9:2380,10.8.2.10:2380
Session Affinity:  None
Events:            <none>

Name:              etcd-client
Namespace:         default
Labels:            app=etcd
                   etcd_cluster=etcd
Annotations:       service.alpha.kubernetes.io/tolerate-unready-endpoints=true
Selector:          app=etcd,etcd_cluster=etcd
Type:              ClusterIP
IP:                10.11.253.157
Port:              client  2379/TCP
Endpoints:         10.8.0.15:2379,10.8.1.9:2379,10.8.2.10:2379
Session Affinity:  None
Events:            <none>

I can use the etcd-client service as the endpoint and see that the cluster is functional:

# etcdctl -v
etcdctl version 2.2.5

# ETCDCTL_ENDPOINT="http://etcd-client:2379" etcdctl cluster-health
member 186504f50442094c is healthy: got healthy result from http://etcd-0001.etcd.default.svc:2379
member 6c4d5a576dad55c5 is healthy: got healthy result from http://etcd-0002.etcd.default.svc:2379
member b885aaad8c46a728 is healthy: got healthy result from http://etcd-0000.etcd.default.svc:2379

However, I'm unable to use SRV discovery to connect to those services:

# for n in etcd etcd.default etcd.default.svc.cluster.local etcd-client etcd-client.default etcd-client.default.svc.cluster.local ; do ETCDCTL_DISCOVERY_SRV="$n" etcdctl cluster-health ; done
dns lookup errors: lookup _etcd-server-ssl._tcp.etcd on 10.11.240.10:53: no such host and lookup _etcd-server._tcp.etcd on 10.11.240.10:53: no such host
dns lookup errors: lookup _etcd-server-ssl._tcp.etcd.default on 10.11.240.10:53: no such host and lookup _etcd-server._tcp.etcd.default on 10.11.240.10:53: no such host
dns lookup errors: lookup _etcd-server-ssl._tcp.etcd.default.svc.cluster.local on 10.11.240.10:53: no such host and lookup _etcd-server._tcp.etcd.default.svc.cluster.local on 10.11.240.10:53: no such host
dns lookup errors: lookup _etcd-server-ssl._tcp.etcd-client on 10.11.240.10:53: no such host and lookup _etcd-server._tcp.etcd-client on 10.11.240.10:53: no such host
dns lookup errors: lookup _etcd-server-ssl._tcp.etcd-client.default on 10.11.240.10:53: no such host and lookup _etcd-server._tcp.etcd-client.default on 10.11.240.10:53: no such host
dns lookup errors: lookup _etcd-server-ssl._tcp.etcd-client.default.svc.cluster.local on 10.11.240.10:53: no such host and lookup _etcd-server._tcp.etcd-client.default.svc.cluster.local on 10.11.240.10:53: no such host

Is that intentional?

If I understand correctly, naming the ports etcd-server and etcd-client will generate the correct SRV records.

xiang90 commented 7 years ago

@cbandy I am not sure whether k8s generates SRV records in the format etcdctl expects. We have never tried to make it work.

cbandy commented 7 years ago

Here's what the SRV records look like for the services above:

# dig +nocmd +nostats \
       _client._tcp.etcd.default.svc.cluster.local srv \
         _peer._tcp.etcd.default.svc.cluster.local srv \
_client._tcp.etcd-client.default.svc.cluster.local srv
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8163
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;_client._tcp.etcd.default.svc.cluster.local.   IN SRV

;; ANSWER SECTION:
_client._tcp.etcd.default.svc.cluster.local.    30 IN SRV 10 33 2379 etcd-0002.etcd.default.svc.cluster.local.
_client._tcp.etcd.default.svc.cluster.local.    30 IN SRV 10 33 2379 etcd-0000.etcd.default.svc.cluster.local.
_client._tcp.etcd.default.svc.cluster.local.    30 IN SRV 10 33 2379 etcd-0001.etcd.default.svc.cluster.local.

;; ADDITIONAL SECTION:
etcd-0002.etcd.default.svc.cluster.local. 30    IN A 10.8.0.15
etcd-0000.etcd.default.svc.cluster.local. 30    IN A 10.8.1.9
etcd-0001.etcd.default.svc.cluster.local. 30    IN A 10.8.2.10
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1093
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;_peer._tcp.etcd.default.svc.cluster.local. IN SRV

;; ANSWER SECTION:
_peer._tcp.etcd.default.svc.cluster.local. 30 IN SRV    10 33 2380 etcd-0002.etcd.default.svc.cluster.local.
_peer._tcp.etcd.default.svc.cluster.local. 30 IN SRV    10 33 2380 etcd-0000.etcd.default.svc.cluster.local.
_peer._tcp.etcd.default.svc.cluster.local. 30 IN SRV    10 33 2380 etcd-0001.etcd.default.svc.cluster.local.

;; ADDITIONAL SECTION:
etcd-0002.etcd.default.svc.cluster.local. 30    IN A 10.8.0.15
etcd-0000.etcd.default.svc.cluster.local. 30    IN A 10.8.1.9
etcd-0001.etcd.default.svc.cluster.local. 30    IN A 10.8.2.10
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51017
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;_client._tcp.etcd-client.default.svc.cluster.local. IN SRV

;; ANSWER SECTION:
_client._tcp.etcd-client.default.svc.cluster.local. 30 IN SRV 10 100 2379 etcd-client.default.svc.cluster.local.

;; ADDITIONAL SECTION:
etcd-client.default.svc.cluster.local. 30 IN    A 10.11.253.157

xiang90 commented 7 years ago

k8s format: _client._tcp.etcd.default.svc.cluster.local

is different from etcd-srv-discovery format: _etcd-server._tcp.example.com

One has subdomain _client; the other expects _etcd-server or _etcd-client.
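
To make the mismatch concrete, here is a minimal Go sketch that builds both kinds of name and compares them. It assumes the Kubernetes pattern `_<port-name>._<protocol>.<service>.<namespace>.svc.<zone>` visible in the dig output above, and it takes the etcd names from the lookup errors quoted earlier in this thread. The helper names `k8sSRVName`, `etcdSRVNames`, and `matches` are made up for illustration and are not part of etcd or the operator.

```go
package main

import "fmt"

// k8sSRVName builds the SRV owner name that Kubernetes DNS publishes for a
// named Service port: _<port-name>._<protocol>.<service>.<namespace>.svc.<zone>.
func k8sSRVName(portName, proto, service, namespace, zone string) string {
	return fmt.Sprintf("_%s._%s.%s.%s.svc.%s", portName, proto, service, namespace, zone)
}

// etcdSRVNames lists the names etcdctl queried in the errors quoted earlier
// in this thread, for a given discovery domain.
func etcdSRVNames(domain string) []string {
	return []string{
		"_etcd-server-ssl._tcp." + domain,
		"_etcd-server._tcp." + domain,
	}
}

// matches reports whether any queried name equals the published name.
func matches(published string, queried []string) bool {
	for _, q := range queried {
		if q == published {
			return true
		}
	}
	return false
}

func main() {
	// Discovery domain as passed via ETCDCTL_DISCOVERY_SRV in the report.
	const domain = "etcd.default.svc.cluster.local"
	queried := etcdSRVNames(domain)

	// "peer" is the current port name; "etcd-server" is the proposed one.
	for _, port := range []string{"peer", "etcd-server"} {
		published := k8sSRVName(port, "tcp", "etcd", "default", "cluster.local")
		fmt.Printf("port %q publishes %s, queried by etcdctl: %t\n",
			port, published, matches(published, queried))
	}
}
```

Only the leftmost label differs between the two formats, so under these assumptions a port named etcd-server would line up with what etcdctl asks for.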

cbandy commented 7 years ago

> k8s format ... is different from etcd-srv-discovery format

Precisely.

> one has subdomain _client

This subdomain comes from the Name attribute of the port in the service resource:

https://github.com/coreos/etcd-operator/blob/cf7d8d5568737992add2fb1bd1f7c7840dd49901/pkg/util/k8sutil/k8sutil.go#L135-L145
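
As a rough illustration of the rename suggested earlier in this thread, the sketch below derives the SRV prefix from each port name. `servicePort` is a hypothetical, trimmed-down stand-in for the Kubernetes ServicePort type, and `currentPorts`, `proposedPorts`, and `srvPrefix` are illustrative helpers. None of this is the operator's actual code, and nobody in the thread has tested the rename.

```go
package main

import (
	"fmt"
	"strings"
)

// servicePort is a hypothetical stand-in for the Kubernetes ServicePort type;
// only the fields that influence the SRV record name are kept.
type servicePort struct {
	Name     string
	Port     int32
	Protocol string
}

// currentPorts mirrors the port names shown by `kubectl describe services`.
func currentPorts() []servicePort {
	return []servicePort{
		{Name: "client", Port: 2379, Protocol: "TCP"},
		{Name: "peer", Port: 2380, Protocol: "TCP"},
	}
}

// proposedPorts uses the names suggested in this thread (an untested assumption).
func proposedPorts() []servicePort {
	return []servicePort{
		{Name: "etcd-client", Port: 2379, Protocol: "TCP"},
		{Name: "etcd-server", Port: 2380, Protocol: "TCP"},
	}
}

// srvPrefix returns the "_<name>._<protocol>" labels that Kubernetes DNS
// prepends to the Service's DNS name for a named port.
func srvPrefix(p servicePort) string {
	return "_" + p.Name + "._" + strings.ToLower(p.Protocol)
}

func main() {
	for _, p := range currentPorts() {
		fmt.Printf("current:  %-12s -> %s\n", p.Name, srvPrefix(p))
	}
	for _, p := range proposedPorts() {
		fmt.Printf("proposed: %-12s -> %s\n", p.Name, srvPrefix(p))
	}
}
```

Anything that looks these ports up by their current names client and peer would need the same rename, which this sketch does not account for.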

hongchaodeng commented 7 years ago

Submit a PR? (my fault)

bmcustodio commented 7 years ago

This is affecting me too. @cbandy will you submit a PR or would you prefer me to do it?

cbandy commented 7 years ago

I'm not set up to test this, so I cannot submit anything for some time.

bmcustodio commented 7 years ago

I understand. That being said, I'll take it on, if you don't mind.

On Wed, 13 Sep 2017, 13:49 Chris Bandy notifications@github.com wrote:

> I'm not set up to test this, so I cannot submit anything for some time.

cbandy commented 7 years ago

Thanks @brunomcustodio!

hongchaodeng commented 7 years ago

@cbandy Can you tell us your use case first? Specifically, why do you want to use SRV discovery instead of a k8s service?