etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

DNS SRV etcd: error setting up initial cluster: cannot find local etcd member "etcd1" in SRV records #11335

Closed BlackSunday001 closed 5 years ago

BlackSunday001 commented 5 years ago

Hi,

hostname        role    IP            system      kernel
node01.k8s.com  node01  192.168.1.91  CentOS 7.7  5.1.4-1.el7.elrepo.x86_64
node02.k8s.com  node02  192.168.1.92  CentOS 7.7  5.1.4-1.el7.elrepo.x86_64
node03.k8s.com  node03  192.168.1.93  CentOS 7.7  5.1.4-1.el7.elrepo.x86_64

I have a private internal network. I configured the SRV records with BIND, but it does not work.

cat k8s.com.zone 
$TTL 1D
@   IN SOA  @ k8s.com. (
                    0   ; serial
                    1D  ; refresh
                    1H  ; retry
                    1W  ; expire
                    3H )    ; minimum
    IN  NS  @
    IN  A   127.0.0.1
    IN  AAAA    ::1
etcd1   IN  A   192.168.1.91
etcd2   IN  A   192.168.1.92
etcd3   IN  A   192.168.1.93

_etcd-server._tcp.k8s.com. 1H      IN      SRV     2380    0       100     etcd1
_etcd-server._tcp.k8s.com. 1H      IN      SRV     2380    0       100     etcd2
_etcd-server._tcp.k8s.com. 1H      IN      SRV     2380    0       100     etcd3
_etcd-client._tcp.k8s.com. 1H      IN      SRV     2380    0       100     etcd1
_etcd-client._tcp.k8s.com. 1H      IN      SRV     2380    0       100     etcd2
_etcd-client._tcp.k8s.com. 1H      IN      SRV     2380    0       100     etcd3

DNS test:

dig +noall +answer SRV _etcd-server-ssl._tcp.k8s.com
_etcd-server-ssl._tcp.k8s.com. 3600 IN  SRV 2380 0 100 etcd3.k8s.com.
_etcd-server-ssl._tcp.k8s.com. 3600 IN  SRV 2380 0 100 etcd1.k8s.com.
_etcd-server-ssl._tcp.k8s.com. 3600 IN  SRV 2380 0 100 etcd2.k8s.com.
dig @192.168.1.122 +noall +answer etcd1.k8s.com etcd2.k8s.com etcd3.k8s.com
etcd1.k8s.com.      86400   IN  A   192.168.1.91
etcd2.k8s.com.      86400   IN  A   192.168.1.92
etcd3.k8s.com.      86400   IN  A   192.168.1.93

etcd config:

[root@node01 etcd]# vim etcd.conf
ETCD_DATA_DIR="/data/k8s/etcd/data"
ETCD_WAL_DIR="/data/k8s/etcd/wal"
ETCD_LISTEN_PEER_URLS="http://192.168.1.91:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.91:2379"
ETCD_MAX_SNAPSHOTS="5"
ETCD_MAX_WALS="5"
ETCD_NAME="etcd1"
ETCD_SNAPSHOT_COUNT="100000"
ETCD_HEARTBEAT_INTERVAL="100"
ETCD_ELECTION_TIMEOUT="1000"

ETCD_INITIAL_ADVERTISE_PEER_URLS="http://etcd1.k8s.com:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://etcd1.k8s.com:2379"
ETCD_DISCOVERY_SRV="k8s.com"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"

ETCD_CERT_FILE="/etc/etcd/cert/etcd.pem"
ETCD_KEY_FILE="/etc/etcd/cert/etcd-key.pem"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE="/etc/kubernetes/cert/ca.pem"
ETCD_AUTO_TLS="true"
ETCD_PEER_CERT_FILE="/etc/etcd/cert/etcd.pem"
ETCD_PEER_KEY_FILE="/etc/etcd/cert/etcd-key.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/etc/kubernetes/cert/ca.pem"
ETCD_PEER_AUTO_TLS="true"

It does not work.

/var/log/messages:

Nov  7 13:26:54 node01 systemd: Starting Etcd Server...
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=https://etcd1.k8s.com:2379,https://etcd1.k8s.com:4001
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_AUTO_TLS=true
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_CERT_FILE=/etc/etcd/cert/etcd.pem
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_CLIENT_CERT_AUTH=true
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_DISCOVERY_SRV=k8s.com
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_ELECTION_TIMEOUT=1000
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_HEARTBEAT_INTERVAL=100
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=https://etcd1.k8s.com:2380
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=new
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_KEY_FILE=/etc/etcd/cert/etcd-key.pem
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_LISTEN_PEER_URLS=https://192.168.1.91:2380
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_MAX_SNAPSHOTS=5
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_MAX_WALS=5
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_PEER_AUTO_TLS=true
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_PEER_CERT_FILE=/etc/etcd/cert/etcd.pem
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_PEER_CLIENT_CERT_AUTH=true
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_PEER_KEY_FILE=/etc/etcd/cert/etcd-key.pem
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_PEER_TRUSTED_CA_FILE=/etc/kubernetes/cert/ca.pem
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_SNAPSHOT_COUNT=100000
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/etc/kubernetes/cert/ca.pem
Nov  7 13:26:54 node01 etcd: recognized and used environment variable ETCD_WAL_DIR=/data/k8s/etcd/wal
Nov  7 13:26:54 node01 etcd: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
Nov  7 13:26:54 node01 etcd: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
Nov  7 13:26:54 node01 etcd: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
Nov  7 13:26:54 node01 etcd: etcd Version: 3.3.11
Nov  7 13:26:54 node01 etcd: Git SHA: 2cf9e51
Nov  7 13:26:54 node01 etcd: Go Version: go1.10.3
Nov  7 13:26:54 node01 etcd: Go OS/Arch: linux/amd64
Nov  7 13:26:54 node01 etcd: setting maximum number of CPUs to 1, total number of available CPUs is 1
Nov  7 13:26:54 node01 etcd: ignoring peer auto TLS since certs given
Nov  7 13:26:54 node01 etcd: peerTLS: cert = /etc/etcd/cert/etcd.pem, key = /etc/etcd/cert/etcd-key.pem, ca = , trusted-ca = /etc/kubernetes/cert/ca.pem, client-cert-auth = true, crl-file = 
Nov  7 13:26:54 node01 etcd: listening for peers on https://192.168.1.91:2380
Nov  7 13:26:54 node01 etcd: ignoring client auto TLS since certs given
Nov  7 13:26:54 node01 etcd: listening for client requests on 192.168.1.91:2379
Nov  7 13:26:54 node01 etcd: got bootstrap from DNS for etcd-server at 0=https://etcd1.k8s.com:100
Nov  7 13:26:54 node01 etcd: got bootstrap from DNS for etcd-server at 1=https://etcd2.k8s.com:100
Nov  7 13:26:54 node01 etcd: got bootstrap from DNS for etcd-server at 2=https://etcd3.k8s.com:100
Nov  7 13:26:54 node01 etcd: error setting up initial cluster: cannot find local etcd member "etcd1" in SRV records
Nov  7 13:26:54 node01 systemd: etcd.service: main process exited, code=exited, status=1/FAILURE
Nov  7 13:26:54 node01 systemd: Failed to start Etcd Server.
Nov  7 13:26:54 node01 systemd: Unit etcd.service entered failed state.
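
The log above shows every SRV target being read with port 100 and named 0, 1, 2 rather than etcd1. The following is a minimal sketch of that naming step, not etcd's actual code: it assumes a member only receives the local name when an SRV target matches one of the advertised peer URLs, and it compares host names directly where etcd resolves addresses. The function name `name_members` is hypothetical.

```python
from urllib.parse import urlsplit


def name_members(srv_targets, local_name, advertised_peer_urls):
    """Name each SRV target; only a target matching an advertised
    peer URL (host and port) receives local_name, others get an index."""
    advertised = set()
    for url in advertised_peer_urls:
        parts = urlsplit(url)
        advertised.add((parts.hostname, parts.port))

    members = {}
    for index, (host, port) in enumerate(srv_targets):
        host = host.rstrip(".")  # drop the trailing dot of an FQDN
        name = local_name if (host, port) in advertised else str(index)
        members[name] = f"https://{host}:{port}"
    return members


# SRV answers as they appear in the log: every target carries port 100.
from_log = [("etcd1.k8s.com.", 100), ("etcd2.k8s.com.", 100), ("etcd3.k8s.com.", 100)]
members = name_members(from_log, "etcd1", ["https://etcd1.k8s.com:2380"])

print(members)
print("etcd1" in members)  # False: port 100 never matches the advertised 2380
```

Under these assumptions no entry is named etcd1, which is consistent with the "cannot find local etcd member" error.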

But when I configure the SRV records with dnsmasq, it works.

/etc/dnsmasq_hosts

192.168.1.91 etcd1 etcd1.k8s.com
192.168.1.92 etcd2 etcd2.k8s.com
192.168.1.93 etcd3 etcd3.k8s.com

/etc/dnsmasq.conf

srv-host=_etcd-server-ssl._tcp.k8s.com,etcd1.k8s.com,2380,0,100
srv-host=_etcd-server-ssl._tcp.k8s.com,etcd2.k8s.com,2380,0,100
srv-host=_etcd-server-ssl._tcp.k8s.com,etcd3.k8s.com,2380,0,100
srv-host=_etcd-client-ssl._tcp.k8s.com,etcd1.k8s.com,2380,0,100
srv-host=_etcd-client-ssl._tcp.k8s.com,etcd2.k8s.com,2380,0,100
srv-host=_etcd-client-ssl._tcp.k8s.com,etcd3.k8s.com,2380,0,100
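
dnsmasq's srv-host option lists its fields as service name, target, port, priority, weight, which is a different order from a zone-file SRV record (priority, weight, port, target). The sketch below converts one srv-host line into the zone-file form to make the difference visible. The function name `srv_host_to_zone_record` is hypothetical, and the sketch assumes all five fields are present.

```python
def srv_host_to_zone_record(line: str) -> str:
    """Convert a dnsmasq srv-host line into a zone-file SRV record."""
    key, _, value = line.partition("=")
    if key.strip() != "srv-host":
        raise ValueError(f"not a srv-host line: {line!r}")

    fields = [field.strip() for field in value.split(",")]
    if len(fields) != 5:
        raise ValueError("this sketch expects all five srv-host fields")

    # dnsmasq order: service, target, port, priority, weight
    service, target, port, priority, weight = fields

    # Zone-file order: priority, weight, port, target
    return f"{service}. IN SRV {int(priority)} {int(weight)} {int(port)} {target}."


record = srv_host_to_zone_record(
    "srv-host=_etcd-server-ssl._tcp.k8s.com,etcd1.k8s.com,2380,0,100"
)
print(record)
# _etcd-server-ssl._tcp.k8s.com. IN SRV 0 100 2380 etcd1.k8s.com.
```

With this ordering, 2380 ends up in the port field, which may explain why the dnsmasq setup works with the same values.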

The etcd config file is the same as the one used with BIND above.

BlackSunday001 commented 5 years ago

I am running etcd (version 3.3.11)

rpm -qa etcd
etcd-3.3.11-2.el7.centos.x86_64

BlackSunday001 commented 5 years ago

Everything is OK now.

The problem was in my configuration.

The correct BIND SRV records are:

_etcd-server._tcp.k8s.com.  IN      SRV     10    10       2380     etcd1
_etcd-server._tcp.k8s.com.  IN      SRV     10    10       2380     etcd2
_etcd-server._tcp.k8s.com.  IN      SRV     10    10       2380     etcd3
_etcd-client._tcp.k8s.com.  IN      SRV     10    10       2379     etcd1
_etcd-client._tcp.k8s.com.  IN      SRV     10    10       2379     etcd2
_etcd-client._tcp.k8s.com.  IN      SRV     10    10       2379     etcd3

2380 is the etcd server (peer) port; 2379 is the port that clients connect to.
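
For reference, SRV record data is ordered priority, weight, port, target (RFC 2782). The short sketch below parses the record data from the first zone file and from the corrected one to show which number lands in the port field. The names `SrvRecord` and `parse_srv_rdata` are hypothetical helpers for illustration.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_rdata(rdata: str) -> SrvRecord:
    """Split SRV record data in RFC 2782 order: priority, weight, port, target."""
    priority, weight, port, target = rdata.split()
    return SrvRecord(int(priority), int(weight), int(port), target)


# Record data from the first zone file: 2380 sits in the priority field.
original = parse_srv_rdata("2380 0 100 etcd1")
# Record data from the corrected zone file.
corrected = parse_srv_rdata("10 10 2380 etcd1")

print(original.port)   # 100
print(corrected.port)  # 2380
```

The port 100 from the first record matches the `https://etcd1.k8s.com:100` entries in the etcd log.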

Thanks

BlackSunday001 commented 5 years ago

Thanks