etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

ETCD: 2.3.x - netutil: could not resolve host #6952

Closed akamalov closed 7 years ago

akamalov commented 7 years ago

Greetings,

Environment

OS: RHEL 7.3
ETCD: etcd-3.0.15

Problem

Deploying ETCD using Marathon/Mesos. Used to work fine under 2.0.10, but under 3.0.x it is exhibiting problems with host resolution. This is my JSON file for Marathon:

{
        "id": "/etcdhost",
        "groups": [{

                "id": "/etcdhost/sdncluster1",
                "apps": [{
                        "id": "etcd1",
                        "cpus": 0.5,
                        "mem": 128,
                        "instances": 1,
                        "constraints": [
                                ["hostname", "CLUSTER", "mslave1"]
                        ],
                        "container": {
                                "type": "DOCKER",
                                "volumes": [],
                                "docker": {
                                        "image": "akamalov/docker-etcd:3.0.15",
                                        "network": "HOST",
                                        "portmappings": [{
                                                "containerPort": 2380,
                                                "hostPort": 33380,
                                                "protocol": "tcp"
                                        }, {
                                                "containerPort": 2379,
                                                "hostPort": 33379,
                                                "protocol": "tcp"
                                        }],
                                        "privileged": false,
                                        "parameters": [{
                                                "key": "hostname",
                                                "value": "etcd1.sdncluster1.etcdhost.marathon.mesos"
                                        }],
                                        "forcePullImage": false
                                }
                        },
                        "args": [
                                "--name etcd1.sdncluster1.etcdhost.marathon.mesos",
                                "--initial-cluster etcd1.sdncluster1.etcdhost.marathon.mesos=http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380,etcd2.sdncluster1.etcdhost.marathon.mesos=http://etcd2.sdncluster1.etcdhost.marathon.mesos:2380,etcd3.sdncluster1.etcdhost.marathon.mesos=http://etcd3.sdncluster1.etcdhost.marathon.mesos:2380","-initial-cluster-token mesos-cluster",
                                "--initial-advertise-peer-urls http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380",
                                "--advertise-client-urls http://etcd1.sdncluster1.etcdhost.marathon.mesos:2379",
                                "--initial-cluster-state new",
                                "--heartbeat-interval 500",
                                "--election-timeout 2500"
                        ]
                }, {
                        "id": "etcd2",
                        "cpus": 0.5,
                        "mem": 128,
                        "instances": 1,
                        "constraints": [
                                ["hostname", "CLUSTER", "mslave2"]
                        ],
                        "container": {
                                "type": "DOCKER",
                                "volumes": [],
                                "docker": {
                                        "image": "akamalov/docker-etcd:3.0.15",
                                        "network": "HOST",
                                        "portmappings": [{
                                                "containerPort": 2380,
                                                "hostPort": 33380,
                                                "protocol": "tcp"
                                        }, {
                                                "containerPort": 2379,
                                                "hostPort": 33379,
                                                "protocol": "tcp"
                                        }],
                                        "privileged": false,
                                        "parameters": [{
                                                "key": "hostname",
                                                "value": "etcd2.sdncluster1.etcdhost.marathon.mesos"
                                        }],
                                        "forcePullImage": false
                                }
                        },
                        "args": [
                                "--name etcd2.sdncluster1.etcdhost.marathon.mesos",
                                "--initial-cluster etcd1.sdncluster1.etcdhost.marathon.mesos=http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380,etcd2.sdncluster1.etcdhost.marathon.mesos=http://etcd2.sdncluster1.etcdhost.marathon.mesos:2380,etcd3.sdncluster1.etcdhost.marathon.mesos=http://etcd3.sdncluster1.etcdhost.marathon.mesos:2380","-initial-cluster-token mesos-cluster",
                                "--initial-advertise-peer-urls http://etcd2.sdncluster1.etcdhost.marathon.mesos:2380",
                                "--advertise-client-urls http://etcd2.sdncluster1.etcdhost.marathon.mesos:2379",
                                "--initial-cluster-state new",
                                "--heartbeat-interval 500",
                                "--election-timeout 2500"
                        ]
                }, {
                        "id": "etcd3",
                        "cpus": 0.5,
                        "mem": 128,
                        "instances": 1,
                        "constraints": [
                                ["hostname", "CLUSTER", "mslave3"]
                        ],
                        "container": {
                                "type": "DOCKER",
                                "volumes": [],
                                "docker": {
                                        "image": "akamalov/docker-etcd:3.0.15",
                                        "network": "HOST",
                                        "portmappings": [{
                                                "containerPort": 2380,
                                                "hostPort": 33380,
                                                "protocol": "tcp"
                                        }, {
                                                "containerPort": 2379,
                                                "hostPort": 33379,
                                                "protocol": "tcp"
                                        }],
                                        "privileged": false,
                                        "parameters": [{
                                                "key": "hostname",
                                                "value": "etcd3.sdncluster1.etcdhost.marathon.mesos"
                                        }],
                                        "forcePullImage": false
                                }
                        },
                        "args": [
                                "--name etcd3.sdncluster1.etcdhost.marathon.mesos",
                                "--initial-cluster etcd1.sdncluster1.etcdhost.marathon.mesos=http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380,etcd2.sdncluster1.etcdhost.marathon.mesos=http://etcd2.sdncluster1.etcdhost.marathon.mesos:2380,etcd3.sdncluster1.etcdhost.marathon.mesos=http://etcd3.sdncluster1.etcdhost.marathon.mesos:2380","-initial-cluster-token mesos-cluster",
                                "--initial-advertise-peer-urls http://etcd3.sdncluster1.etcdhost.marathon.mesos:2380",
                                "--advertise-client-urls http://etcd3.sdncluster1.etcdhost.marathon.mesos:2379",
                                "--initial-cluster-state new",
                                "--heartbeat-interval 500",
                                "--election-timeout 2500"
                        ]
                }]
        }]
}

etcd errors out in the container, complaining about host resolution:

[root@mslave1 executors]# docker logs 8f
Using default CLIENT_URLS (http://0.0.0.0:4001,http://0.0.0.0:2379)
Using default PEER_URLS (http://0.0.0.0:7001,http://0.0.0.0:2380)
Running '/bin/etcd -data-dir=/data -listen-peer-urls=http://0.0.0.0:7001,http://0.0.0.0:2380 -listen-client-urls=http://0.0.0.0:4001,http://0.0.0.0:2379 --name etcd1.sdncluster1.etcdhost.marathon.mesos --initial-cluster etcd1.sdncluster1.etcdhost.marathon.mesos=http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380,etcd2.sdncluster1.etcdhost.marathon.mesos=http://etcd2.sdncluster1.etcdhost.marathon.mesos:2380,etcd3.sdncluster1.etcdhost.marathon.mesos=http://etcd3.sdncluster1.etcdhost.marathon.mesos:2380 -initial-cluster-token mesos-cluster --initial-advertise-peer-urls http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380 --advertise-client-urls http://etcd1.sdncluster1.etcdhost.marathon.mesos:2379 --initial-cluster-state new --heartbeat-interval 500 --election-timeout 2500'
BEGIN ETCD OUTPUT

2016-12-06 15:37:33.994580 I | etcdmain: etcd Version: 3.0.15
2016-12-06 15:37:33.994776 I | etcdmain: Git SHA: fc00305
2016-12-06 15:37:33.994793 I | etcdmain: Go Version: go1.6.3
2016-12-06 15:37:33.994806 I | etcdmain: Go OS/Arch: linux/amd64
2016-12-06 15:37:33.994825 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
2016-12-06 15:37:33.995191 I | etcdmain: listening for peers on http://0.0.0.0:2380
2016-12-06 15:37:33.995268 I | etcdmain: listening for peers on http://0.0.0.0:7001
2016-12-06 15:37:33.995372 I | etcdmain: listening for client requests on 0.0.0.0:2379
2016-12-06 15:37:33.995426 I | etcdmain: listening for client requests on 0.0.0.0:4001
2016-12-06 15:37:34.225432 E | netutil: could not resolve host etcd1.sdncluster1.etcdhost.marathon.mesos:2380
2016-12-06 15:37:34.239653 I | etcdmain: stopping listening for client requests on 0.0.0.0:4001
2016-12-06 15:37:34.239738 I | etcdmain: stopping listening for client requests on 0.0.0.0:4001
2016-12-06 15:37:34.239771 I | etcdmain: stopping listening for peers on http://0.0.0.0:7001
2016-12-06 15:37:34.239799 I | etcdmain: stopping listening for peers on http://0.0.0.0:2380
2016-12-06 15:37:34.239830 I | etcdmain: --initial-cluster must include etcd1.sdncluster1.etcdhost.marathon.mesos=http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380 given --initial-advertise-peer-urls=http://etcd1.sdncluster1.etcdhost.marathon.mesos:2380
[root@mslave1 executors]# 

For some reason, `netutil` is unable to resolve the host provided:

2016-12-06 15:37:34.225432 E | netutil: could not resolve host etcd1.sdncluster1.etcdhost.marathon.mesos:2380

DNS is provided by Mesos-DNS, and a container coming online registers with Mesos-DNS, which makes the container resolvable. For example, ZooKeeper was deployed in the same fashion:

[root@mslave1 default]# ping -c 3 zk2.zk.kafka.marathon.mesos
PING zk2.zk.kafka.marathon.mesos (192.168.124.134) 56(84) bytes of data.
64 bytes from 192.168.124.134: icmp_seq=1 ttl=60 time=1.26 ms
64 bytes from 192.168.124.134: icmp_seq=2 ttl=60 time=0.869 ms
64 bytes from 192.168.124.134: icmp_seq=3 ttl=60 time=0.434 ms

--- zk2.zk.kafka.marathon.mesos ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.434/0.854/1.260/0.338 ms
[root@mslave1 default]# 

Again, the 2.0.10 deployment works fine; it's 3.0.x that I am having a problem with.

Any pointers?

Thanks,

Alex

gyuho commented 7 years ago

Maybe related to https://github.com/coreos/etcd/pull/6365 (backported in v3.0.8)

/cc @heyitsanthony ?

heyitsanthony commented 7 years ago

@gyuho unlikely; that was turned into a warning for 3.0.9.

heyitsanthony commented 7 years ago

DNS is provided by Mesos-DNS, and a container coming online registers with Mesos-DNS, which makes the container resolvable.

So the address isn't expected to resolve until after etcd boots? If so, that sounds similar to #6262

akamalov commented 7 years ago

Thanks @heyitsanthony. I can see that a user disabled the DNS healthcheck. Was that done through a parameter to etcd? Do you know, by chance, where it was disabled?

akamalov commented 7 years ago

It looks like the symptom is exactly the same as in #6262. The container has to be fully up, and only then does Mesos-DNS register it. Is there some kind of delay or retry that can be used with etcd so that it re-tries the operation after a period of time (just like ZooKeeper does)? It worked like a champ on 2.x releases, but 3.x no longer works with Marathon/Mesos and dynamic IP addresses. Also, we cannot use IP addresses and then do a cluster update, because the IP addresses are dynamically provided by the SDN.

Thanks again,

Alex

armstrongli commented 7 years ago

I got this problem in v3.0.15, too.

akamalov commented 7 years ago

Did the merge make it into 3.0.16?

heyitsanthony commented 7 years ago

@akamalov it wasn't backported to 3.0.x since it's a change in behavior which isn't exactly a bug fix.

akamalov commented 7 years ago

Thanks @heyitsanthony. I tried to test https://github.com/coreos/etcd/issues/7008 (via Marathon), but the deployment exits, complaining about the flags provided.

Here is the Marathon JSON file I used to deploy:

{
        "id": "/etcd-host",
        "groups": [{

                "id": "/etcd-host/sdncluster1",
                "apps": [{
                        "id": "etcd1",
                        "cpus": 0.5,
                        "mem": 128,
                        "instances": 1,
                        "constraints": [
                                ["hostname", "CLUSTER", "mslave1"]
                        ],
                        "container": {
                                "type": "DOCKER",
                                "volumes": [],
                                "docker": {
                                        "image": "akamalov/etcd-docker:3.1.7008",
                                        "network": "HOST",
                                        "portmappings": [{
                                                "containerPort": 2380,
                                                "hostPort": 22380,
                                                "protocol": "tcp"
                                        }, {
                                                "containerPort": 2379,
                                                "hostPort": 22379,
                                                "protocol": "tcp"
                                        }],
                                        "privileged": false,
                                        "parameters": [{
                                                "key": "hostname",
                                                "value": "etcd1.sdncluster1.etcd-host.marathon.mesos"
                                        }, {
                                                "key": "userns",
                                                "value": "host"
                                        }],
                                        "forcePullImage": false
                                }
                        },
                        "args": [
                                "--advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379",
                                "--initial-cluster-state new",
                                "--initial-cluster-token etcd-cluster1",
                                "--initial-cluster etcd1.sdncluster1.etcd-host.marathon.mesos=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380,etcd2.sdncluster1.etcd-host.marathon.mesos=http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380,etcd3.sdncluster1.etcd-host.marathon.mesos=http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380",
                                "--initial-advertise-peer-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380"
                        ]
                }, {
                        "id": "etcd2",
                        "cpus": 0.5,
                        "mem": 128,
                        "instances": 1,
                        "constraints": [
                                ["hostname", "CLUSTER", "mslave2"]
                        ],
                        "container": {
                                "type": "DOCKER",
                                "volumes": [],
                                "docker": {
                                        "image": "akamalov/etcd-docker:3.1.7008",
                                        "network": "HOST",
                                        "portmappings": [{
                                                "containerPort": 2380,
                                                "hostPort": 22380,
                                                "protocol": "tcp"
                                        }, {
                                                "containerPort": 2379,
                                                "hostPort": 22379,
                                                "protocol": "tcp"
                                        }],
                                        "privileged": false,
                                        "parameters": [{
                                                "key": "hostname",
                                                "value": "etcd2.sdncluster1.etcd-host.marathon.mesos"
                                        }, {
                                                "key": "userns",
                                                "value": "host"
                                        }],
                                        "forcePullImage": false
                                }
                        },
                        "args": [
                                "--advertise-client-urls http://etcd2.sdncluster1.etcd-host.marathon.mesos:2379",
                                "--initial-cluster-token etcd-cluster1",
                                "--initial-cluster-state new",
                                "--initial-cluster etcd1.sdncluster1.etcd-host.marathon.mesos=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380,etcd2.sdncluster1.etcd-host.marathon.mesos=http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380,etcd3.sdncluster1.etcd-host.marathon.mesos=http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380",
                                "--initial-advertise-peer-urls http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380"
                        ]
                }, {
                        "id": "etcd3",
                        "cpus": 0.5,
                        "mem": 128,
                        "instances": 1,
                        "constraints": [
                                ["hostname", "CLUSTER", "mslave3"]
                        ],
                        "container": {
                                "type": "DOCKER",
                                "volumes": [],
                                "docker": {
                                        "image": "akamalov/etcd-docker:3.1.7008",
                                        "network": "HOST",
                                        "portmappings": [{
                                                "containerPort": 2380,
                                                "hostPort": 22380,
                                                "protocol": "tcp"
                                        }, {
                                                "containerPort": 2379,
                                                "hostPort": 22379,
                                                "protocol": "tcp"
                                        }],
                                        "privileged": false,
                                        "parameters": [{
                                                "key": "hostname",
                                                "value": "etcd3.sdncluster1.etcd-host.marathon.mesos"
                                        }, {
                                                "key": "userns",
                                                "value": "host"
                                        }],
                                        "forcePullImage": false
                                }
                        },
                        "args": [
                                "--advertise-client-urls http://etcd3.sdncluster1.etcd-host.marathon.mesos:2379",
                                "--initial-cluster-token etcd-cluster1",
                                "--initial-cluster-state new",
                                "--initial-cluster etcd1.sdncluster1.etcd-host.marathon.mesos=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380,etcd2.sdncluster1.etcd-host.marathon.mesos=http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380,etcd3.sdncluster1.etcd-host.marathon.mesos=http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380",
                                "--initial-advertise-peer-urls http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380"
                        ]
                }]
        }]
}

Here is the error:

[root@mslave1 executors]# docker ps -a
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS                      PORTS               NAMES
760788b1bdc4        akamalov/etcd-docker:3.1.7008   "etcd '--advertise-cl"   5 seconds ago       Exited (2) 3 seconds ago                        mesos-d4a8dbc6-4b76-46d4-9fe9-8c93014e2008-S1.16e1f6ed-b897-40fb-b9c2-fb80883a38fd
fdcc2ce4f69a        akamalov/etcd-docker:3.1.7008   "etcd '--advertise-cl"   10 seconds ago      Exited (2) 7 seconds ago                        mesos-d4a8dbc6-4b76-46d4-9fe9-8c93014e2008-S1.df019344-ef85-479a-b8bd-5a676d443d77

[root@mslave1 executors]# docker logs 76
flag provided but not defined: -advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379
usage: etcd [flags]
       start an etcd server

       etcd --version
       show the version of etcd

       etcd -h | --help
       show the help information about etcd

       etcd --config-file
       path to the server configuration file

[root@mslave1 executors]#

heyitsanthony commented 7 years ago

@akamalov Will args automatically split into separate arguments on spaces? It seems like:

"args": [
    "--advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379",
    ...
]

should be:

"args": [
    "--advertise-client-urls",
    "http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379",
    ...
]

(alternatively, "--advertise-client-urls=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379" will work too if you want to save a line)

Otherwise the string will be treated as one argv string instead of two.

It's the difference between:

[anthony@etcd]$ ./bin/etcd "-advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379"
flag provided but not defined: -advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379
usage: etcd [flags]
       start an etcd server

and

[anthony@etcd]$ ./bin/etcd -advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379
2017-01-18 14:28:14.201116 I | etcdmain: etcd Version: 3.1.0-rc.1+git
2017-01-18 14:28:14.201216 I | etcdmain: Git SHA: 1613516
2017-01-18 14:28:14.201219 I | etcdmain: Go Version: go1.7.3
...
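
The same difference can be reproduced outside etcd. The "flag provided but not defined" text matches the error produced by Go's standard `flag` package, so the sketch below assumes that style of parsing. The flag set defines only `--advertise-client-urls`, and `parseAdvertise` is a hypothetical helper written for this illustration, not etcd code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseAdvertise parses argv with a flag set that defines only
// --advertise-client-urls and returns the parsed value or the error.
func parseAdvertise(argv []string) (string, error) {
	fs := flag.NewFlagSet("etcd-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text for this demo
	urls := fs.String("advertise-client-urls", "", "client URLs to advertise")
	if err := fs.Parse(argv); err != nil {
		return "", err
	}
	return *urls, nil
}

func main() {
	const url = "http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379"
	cases := [][]string{
		{"--advertise-client-urls " + url}, // one argv element: flag and value joined by a space
		{"--advertise-client-urls", url},   // two argv elements
		{"--advertise-client-urls=" + url}, // one argv element using '='
	}
	for _, argv := range cases {
		val, err := parseAdvertise(argv)
		fmt.Printf("%d argv element(s): value=%q err=%v\n", len(argv), val, err)
	}
}
```

Only the first case fails: the parser treats everything after the leading dashes as the flag name, space and URL included, and no such flag is defined. The other two forms both yield the URL as the flag's value.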