akamalov closed this issue 7 years ago
Maybe related to https://github.com/coreos/etcd/pull/6365 (backported in v3.0.8)
/cc @heyitsanthony ?
@gyuho unlikely; that was turned into a warning for 3.0.9.
DNS is provided by Mesos-DNS; a container registers with Mesos-DNS as it comes online, which makes it resolvable.
So the address isn't expected to resolve until after etcd boots? If so, that sounds similar to #6262
Thanks @heyitsanthony. I can see that a user disabled the DNS healthcheck. Was it done with a parameter to etcd? Do you know by chance where it was disabled?
It looks like the symptom is exactly the same as in #6262. The container has to be fully up, and only then does Mesos-DNS register it. Is there some kind of delay or retry that etcd could use to re-try the operation after a period of time (just like ZooKeeper does)? It worked like a champ on 2.x releases, but 3.x no longer works with Marathon/Mesos with dynamic IP addresses. Also, we cannot use IP addresses and then update the cluster, because IP addresses are dynamically provided by the SDN.
Thanks again,
Alex
I got this problem in v3.0.15, too.
Did the merge make it into 3.0.16?
@akamalov it wasn't backported to 3.0.x since it's a change in behavior which isn't exactly a bug fix.
Thanks @heyitsanthony. I tried to test https://github.com/coreos/etcd/issues/7008 (via Marathon), but the deployment exits, complaining about the flags provided.
Here is the Marathon JSON file I used to deploy:
{
"id": "/etcd-host",
"groups": [{
"id": "/etcd-host/sdncluster1",
"apps": [{
"id": "etcd1",
"cpus": 0.5,
"mem": 128,
"instances": 1,
"constraints": [
["hostname", "CLUSTER", "mslave1"]
],
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "akamalov/etcd-docker:3.1.7008",
"network": "HOST",
"portmappings": [{
"containerPort": 2380,
"hostPort": 22380,
"protocol": "tcp"
}, {
"containerPort": 2379,
"hostPort": 22379,
"protocol": "tcp"
}],
"privileged": false,
"parameters": [{
"key": "hostname",
"value": "etcd1.sdncluster1.etcd-host.marathon.mesos"
}, {
"key": "userns",
"value": "host"
}],
"forcePullImage": false
}
},
"args": [
"--advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379",
"--initial-cluster-state new",
"--initial-cluster-token etcd-cluster1",
"--initial-cluster etcd1.sdncluster1.etcd-host.marathon.mesos=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380,etcd2.sdncluster1.etcd-host.marathon.mesos=http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380,etcd3.sdncluster1.etcd-host.marathon.mesos=http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380",
"--initial-advertise-peer-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380"
]
}, {
"id": "etcd2",
"cpus": 0.5,
"mem": 128,
"instances": 1,
"constraints": [
["hostname", "CLUSTER", "mslave2"]
],
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "akamalov/etcd-docker:3.1.7008",
"network": "HOST",
"portmappings": [{
"containerPort": 2380,
"hostPort": 22380,
"protocol": "tcp"
}, {
"containerPort": 2379,
"hostPort": 22379,
"protocol": "tcp"
}],
"privileged": false,
"parameters": [{
"key": "hostname",
"value": "etcd2.sdncluster1.etcd-host.marathon.mesos"
}, {
"key": "userns",
"value": "host"
}],
"forcePullImage": false
}
},
"args": [
"--advertise-client-urls http://etcd2.sdncluster1.etcd-host.marathon.mesos:2379",
"--initial-cluster-token etcd-cluster1",
"--initial-cluster-state new",
"--initial-cluster etcd1.sdncluster1.etcd-host.marathon.mesos=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380,etcd2.sdncluster1.etcd-host.marathon.mesos=http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380,etcd3.sdncluster1.etcd-host.marathon.mesos=http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380",
"--initial-advertise-peer-urls http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380"
]
}, {
"id": "etcd3",
"cpus": 0.5,
"mem": 128,
"instances": 1,
"constraints": [
["hostname", "CLUSTER", "mslave3"]
],
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "akamalov/etcd-docker:3.1.7008",
"network": "HOST",
"portmappings": [{
"containerPort": 2380,
"hostPort": 22380,
"protocol": "tcp"
}, {
"containerPort": 2379,
"hostPort": 22379,
"protocol": "tcp"
}],
"privileged": false,
"parameters": [{
"key": "hostname",
"value": "etcd3.sdncluster1.etcd-host.marathon.mesos"
}, {
"key": "userns",
"value": "host"
}],
"forcePullImage": false
}
},
"args": [
"--advertise-client-urls http://etcd3.sdncluster1.etcd-host.marathon.mesos:2379",
"--initial-cluster-token etcd-cluster1",
"--initial-cluster-state new",
"--initial-cluster etcd1.sdncluster1.etcd-host.marathon.mesos=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2380,etcd2.sdncluster1.etcd-host.marathon.mesos=http://etcd2.sdncluster1.etcd-host.marathon.mesos:2380,etcd3.sdncluster1.etcd-host.marathon.mesos=http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380",
"--initial-advertise-peer-urls http://etcd3.sdncluster1.etcd-host.marathon.mesos:2380"
]
}]
}]
}
Here is the error:
[root@mslave1 executors]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
760788b1bdc4 akamalov/etcd-docker:3.1.7008 "etcd '--advertise-cl" 5 seconds ago Exited (2) 3 seconds ago mesos-d4a8dbc6-4b76-46d4-9fe9-8c93014e2008-S1.16e1f6ed-b897-40fb-b9c2-fb80883a38fd
fdcc2ce4f69a akamalov/etcd-docker:3.1.7008 "etcd '--advertise-cl" 10 seconds ago Exited (2) 7 seconds ago mesos-d4a8dbc6-4b76-46d4-9fe9-8c93014e2008-S1.df019344-ef85-479a-b8bd-5a676d443d77
[root@mslave1 executors]# docker logs 76
flag provided but not defined: -advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379
usage: etcd [flags]
start an etcd server
etcd --version
show the version of etcd
etcd -h | --help
show the help information about etcd
etcd --config-file
path to the server configuration file
[root@mslave1 executors]#
@akamalov Will args be automatically split into separate arguments on spaces? It seems like:
"args": [
"--advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379",
...
]
should be:
"args": [
"--advertise-client-urls",
"http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379",
...
]
(alternatively, "--advertise-client-urls=http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379" will work too if you want to save a line)
Otherwise the string will be treated as one argv string instead of two.
It's the difference between:
[anthony@etcd]$ ./bin/etcd "-advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379"
flag provided but not defined: -advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379
usage: etcd [flags]
start an etcd server
and
[anthony@etcd]$ ./bin/etcd -advertise-client-urls http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379
2017-01-18 14:28:14.201116 I | etcdmain: etcd Version: 3.1.0-rc.1+git
2017-01-18 14:28:14.201216 I | etcdmain: Git SHA: 1613516
2017-01-18 14:28:14.201219 I | etcdmain: Go Version: go1.7.3
...
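The same difference can be reproduced outside etcd. The following is a minimal sketch that assumes etcd's flag handling behaves like Go's standard `flag` package, which the wording of the error message suggests; `parseArgs` is a hypothetical helper, and only one flag is registered for the demonstration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs (hypothetical helper) registers a single string flag and
// parses argv with Go's standard flag package.
func parseArgs(argv []string) (string, error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text in this demo
	urls := fs.String("advertise-client-urls", "", "client URLs to advertise")
	err := fs.Parse(argv)
	return *urls, err
}

func main() {
	const url = "http://etcd1.sdncluster1.etcd-host.marathon.mesos:2379"

	// One argv element containing a space: the whole string,
	// URL included, is looked up as a flag name and rejected.
	_, err := parseArgs([]string{"--advertise-client-urls " + url})
	fmt.Println("single element:", err)

	// Two argv elements: the flag name, then its value.
	v, err := parseArgs([]string{"--advertise-client-urls", url})
	fmt.Println("two elements:", v, err)

	// One argv element in name=value form.
	v, err = parseArgs([]string{"--advertise-client-urls=" + url})
	fmt.Println("name=value:", v, err)
}
```

Only the first call returns an error; the other two yield the URL as the flag's value.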
Greetings,
Environment
Problem
I am deploying etcd using Marathon/Mesos. It used to work fine under 2.0.10, but under 3.0.x it has problems with host resolution. This is my JSON file for Marathon:
etcd errors out in the container, complaining about host resolution:
For some reason, `netutil` is not able to resolve the host provided:
DNS is provided by Mesos-DNS; a container registers with Mesos-DNS as it comes online, which makes it resolvable. For example, ZooKeeper was deployed in the same fashion:
Again, the 2.0.10 deployment works fine. It's 3.0.x I am having a problem with.
Any pointers?
Thanks,
Alex