Closed cookandy closed 7 years ago
right.
So basically we use framework images like:
https://github.com/eBayClassifiedsGroup/PanteraS/tree/master/frameworks
that provide that functionality.
I think the best way to describe how it works is to go through this example step by step: https://github.com/eBayClassifiedsGroup/PanteraS/tree/master/examples/SmoothWebappPython
docker exec -ti 9670e6203c2c bash
bash-4.3# ps axuf
PID USER TIME COMMAND
1 root 0:00 {start.sh} /bin/bash /usr/local/bin/start.sh cd /opt/web/ && python3 -m http.server --cgi
6 root 0:00 {start.sh} /bin/bash /usr/local/bin/start.sh cd /opt/web/ && python3 -m http.server --cgi
7 root 0:00 python3 -m http.server --cgi
8 root 0:00 bash
13 root 0:00 ps axuf
Look at the script with cat /usr/local/bin/start.sh - the part you are interested in is:
maintenance(){
  # Container name will be provided by mesos:
  MESOS_CONTAINER_NAME=${MESOS_CONTAINER_NAME:-$CONTAINER_NAME}
  # Mesos provides external ports in comma-separated $PORTS
  for port in $(sed 's/,/ /g'<<<${PORTS})
  do
    # For each external port you can look up the internal one from PORT_${int}
    port_int=$(env|sed -n "s/PORT_\([0-9]*\)=$port/\1/p")
    # registrator uses a ServiceID built from variables, which are now all available
    consul_service_id="${HOST%%.*}:${MESOS_CONTAINER_NAME}:${port_int}"
    curl -X PUT "http://${HOST}:8500/v1/agent/service/maintenance/${consul_service_id}?enable=true"
    # if you use udp, also uncomment the udp line to switch it into maintenance mode
    #curl -X PUT "http://${HOST}:8500/v1/agent/service/maintenance/${consul_service_id}:udp?enable=true"
  done
}
trap 'maintenance && sleep 2 && kill -TERM $PID $PID_CUSTOM' TERM INT
6. BUT do not run it; run this small part inside the container instead:
for port in $(sed 's/,/ /g'<<<${PORTS})
do
  port_int=$(env|sed -n "s/PORT_\([0-9]*\)=$port/\1/p")
  consul_service_id="${HOST%%.*}:${MESOS_CONTAINER_NAME}:${port_int}"
  echo curl -X PUT "http://${HOST}:8500/v1/agent/service/maintenance/${consul_service_id}?enable=true"
done
You should see something like this:
7. Run this curl a few times with `enable=true` or `enable=false` and verify whether the service goes into maintenance mode and back - it should be orange in consul, and green when back.
- If not, verify your DNS: check that `$HOST` (`your_host`) from the curl above is resolvable inside the container!
That's very important!
8. If all is fine, the trap is ready: it will catch the TERM signal and deregister the service (put it into maintenance mode) before the process is killed, so the service is taken out of load balancing first.
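Step 8 relies on a wrapper that handles the signal before the workload sees it. Here is a minimal sketch of that pattern. It is not the actual start.sh: `run_with_drain`, its hook argument and the grace period are names made up for illustration. Note that only signals such as TERM and INT can be trapped; SIGKILL cannot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in the background and, on TERM/INT,
# call a hook first (e.g. "switch to maintenance mode"), wait a grace
# period, then forward TERM to the command.
run_with_drain() {
  local hook=$1 grace=$2
  shift 2
  "$@" &                      # start the real workload in the background
  local child=$!
  # Values are expanded now, so the trap does not depend on local scope later.
  trap "$hook; sleep $grace; kill -TERM $child 2>/dev/null" TERM INT
  wait "$child"               # returns early when a trapped signal arrives
  wait "$child" 2>/dev/null   # reap the workload after the trap has run
  trap - TERM INT
}

# Usage (assumes a maintenance function like the one in start.sh):
# run_with_drain maintenance 2 python3 -m http.server --cgi
```

The `wait` builtin matters here: bash only runs a trap once the current foreground command finishes, so the workload is started with `&` and waited on instead of being run directly.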
9. You can now play with scaling up and down, and test with:
while true; do curl -H 'Host: python-smooth.service.consul' http://
It should always give you back something, without a 503.
Please try and if you have any questions just ask!
Thanks for the reply @sielaq! Very helpful as always.
I am running into a problem with the curl command. When I run the for
loop, I get back this command:
curl -X PUT http://10.134.26.172:8500/v1/agent/service/maintenance/10:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S7.d0b28555-7902-4077-8215-cb5e737bf2b7:8000?enable=true
However, when I run that curl command, I get back this error:
No service registered with ID "10:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S7.d0b28555-7902-4077-8215-cb5e737bf2b7:8000"
I can definitely resolve my ${HOST} from inside the container. Any ideas why consul would report back no registered service? Is it to do with the consul_service_id and ${HOST%%.*} only returning 10:?
I think the problem is that when I look at consul UI, I see the service listed as:
service:my-host-name-01:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S7.d0b28555-7902-4077-8215-cb5e737bf2b7:8000?enable=true
Yet, the curl command is only returning the first octet of the IP, 10::mesos-983...., instead of the hostname.
Unfortunately using the whole IP as the service name doesn't work either, I get the No service registered with ID error.
When I run the command with my-host-name, for example:
curl -X PUT http://10.134.26.172:8500/v1/agent/service/maintenance/my-host-name:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S7.d0b28555-7902-4077-8215-cb5e737bf2b7:8000?enable=true
I get back an empty response (which I think is good).
However, running enable=true and enable=false makes no difference.
Any ideas why the curl command is returning the IP address instead of hostname? I use the following command to start consul:
agent -client=10.134.26.172 -advertise=10.134.26.172 -bind=10.134.26.172 -data-dir=/opt/consul/data -ui -node=10.134.26.172 -dc=DC1 -domain consul -server -join=10.134.22.239 -join=10.134.23.87 -join=10.134.26.121
curl -X PUT http://10.134.26.172:8500/v1/agent/service/maintenance/my-host-name:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S7.d0b28555-7902-4077-8215-cb5e737bf2b7:8000?enable=true. I get back an empty response (which I think is good).
However, running enable=true and enable=false makes no difference
We are very close. Yes, you are right that the problem is $HOST - we use a hostname there, and you use an IP.
How do you know that it makes no difference?
You should get an empty response, but in the UI you should see it marked as Service Maintenance Mode and status failing (orange in UI).
Oh, my mistake - it does work. I thought enable=false would put it into Service Maintenance Mode, but it is actually enable=true.
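To make the direction of the flag explicit, here is a small dry-run sketch. `maintenance_cmd` is a hypothetical helper, not part of PanteraS: it only prints the curl call for the Consul agent maintenance endpoint used above, so it can be checked before anything is sent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the curl call that toggles
# maintenance mode for one service ID on a Consul agent.
#   enable=true  -> service enters maintenance mode (shown as failing)
#   enable=false -> service leaves maintenance mode
maintenance_cmd() {
  local agent=$1 service_id=$2 enable=$3
  case "$enable" in
    true|false) ;;
    *) echo "enable must be true or false" >&2; return 1 ;;
  esac
  echo curl -X PUT "http://${agent}:8500/v1/agent/service/maintenance/${service_id}?enable=${enable}"
}

# Example values only; the service ID must match what the consul UI shows.
maintenance_cmd 10.134.26.172 "my-host-name:mesos-example:8000" true
```

Dropping the `echo` inside the function would send the request for real.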
So now you need either your own script that uses a different variable instead of $HOST, or to start using the FQDN or hostname instead of the IP. Then
you can always verify it inside the container and compare it with the consul UI:
consul_service_id="${HOST%%.*}:${MESOS_CONTAINER_NAME}:${port_int}"
echo $consul_service_id
I wonder why you have an IP in $HOST ?
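The behaviour comes down to what `${HOST%%.*}` does: it removes everything from the first dot onward. A minimal sketch with example values (`short_name` and the FQDN are made up for illustration):

```shell
#!/usr/bin/env bash
# ${var%%.*} removes the longest suffix matching ".*", i.e. everything
# from the first dot to the end of the value.
short_name() {
  local host=$1
  printf '%s\n' "${host%%.*}"
}

short_name "paas-slave-01.example.com"   # FQDN: keeps the short hostname
short_name "10.134.26.172"               # IP: keeps only the first octet
```

So the expansion only yields something useful when $HOST holds a name. With an IP it is cut down to the first octet, which matches the 10: prefix seen in the failing curl.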
So I need to figure out how to register my services in consul with the IP address, instead of the hostname, or somehow get the hostname of the machine from the IP. Where does $HOST come from?
In my PanteraS env file I have:
CONSUL_IP=10.134.26.172
HOST_IP=10.134.26.172
Where does $HOST come from?
It is injected into the container by marathon, or by mesos, let me check...
I think maybe it comes from the Mesos slave. BTW, I don't have any name resolution between my masters and slaves, which is why I've used IP addresses in all of my environment files. Even my MESOS_SLAVE_APP_PARAMS is configured to use --hostname=10.134.26.172.
Is there a way to register the service in Consul with the IP address, instead of the hostname? Then I could just modify the ${HOST%%.*} part of the variable...
Exactly, this comes from mesos-slave --hostname. I have just checked that.
if you use a hostname (add into /etc/hosts so this IP will match hostname) then all should be fine.
Unfortunately registrator uses the hostname. You can check its options; maybe it is possible to change that.
if you use a hostname (add into /etc/hosts so this IP will match hostname) then all should be fine.
But then I'd need to map /etc/hosts to my container via marathon, correct?
If it is registrator, I would think using the -ip flag would work since the documentation says Force IP address used for registering services.
By default, when registering a service, Registrator will assign the service address by attempting to resolve the current hostname. If you would like to force the service address to be a specific address, you can specify the -ip argument.
http://gliderlabs.com/registrator/latest/user/run/#registrator-options
nope - we already use ip flag :(
Can you confirm that you're seeing the hostname in the ServiceID, instead of the IP address?
Yes, I confirm, as I said:
when mesos-slave --hostname=paasslave001
I see
"ServiceID": "paasslave001:mesos-361ed1a9-6409-41f0-8e39-f846582ec1a4-S8.4d287337-1ff0-4573-9ded-dc7eedefc0b8:8080",
when mesos-slave --hostname=10.0.0.100
I see the IP, like:
"ServiceID": "10.0.0.100:mesos-361ed1a9-6409-41f0-8e39-f846582ec1a4-S0.02eb068a-f66a-42b7-a950-e98baa848452:8080",
Really? That's strange... when I use:
MESOS_SLAVE_APP_PARAMS=--master=zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --hostname=10.134.26.172 --ip=10.134.26.172 --docker_stop_timeout=5secs --gc_delay=1days --docker_socket=/tmp/docker.sock --no-systemd_enable_support --work_dir=/tmp/mesos
I still see this in consul:
"ServiceID": "paas-slave-01:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S3.146058e1-f928-49a2-86c9-43df476530f5:8000"
Is it because you are using --hostname 10.0.0.100 instead of --hostname=10.0.0.100 (missing =)?
Actually, here is my entire env file used by the PanteraS container:
CONSUL_IP=10.134.26.172
HOST_IP=10.134.26.172
LISTEN_IP=10.134.26.172
FQDN=paas-slave-01
GOMAXPROCS=4
TYPE=slave
MASTER_COUNT=3
START_CONSUL=true
START_CONSUL_TEMPLATE=true
START_DNSMASQ=true
START_MESOS_MASTER=false
START_MARATHON=false
START_MESOS_AGENT=true
START_REGISTRATOR=true
START_ZOOKEEPER=false
START_CHRONOS=false
START_FABIO=false
START_NETDATA=false
HAPROXY_SSL=false
CONSUL_APP_PARAMS=agent -client=10.134.26.172 -advertise=10.134.26.172 -bind=10.134.26.172 -data-dir=/opt/consul/data -ui -node=10.134.26.172 -dc=DC1 -domain consul -server -join=10.134.22.239 -join=10.134.23.87 -join=10.134.26.121
CONSUL_DOMAIN=consul
CONSUL_TEMPLATE_APP_PARAMS=-consul=10.134.26.172:8500 -template haproxy.cfg.ctmpl:/etc/haproxy/haproxy.cfg:/opt/consul-template/haproxy_reload.sh -max-stale=0
DNSMASQ_APP_PARAMS=-d -u dnsmasq -r /etc/resolv.conf.orig -7 /etc/dnsmasq.d --server=/consul/10.134.26.172#8600 --host-record=sf1-paas-slave-01,10.134.26.172 --address=/consul/10.134.26.172
HAPROXY_ADD_DOMAIN=
HAPROXY_CERT_OPTS=
MARATHON_APP_PARAMS=--master zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/mesos --zk zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/marathon --hostname 10.134.26.172 --no-logger --http_address 10.134.26.172 --https_address 10.134.26.172
MESOS_MASTER_APP_PARAMS=--zk=zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/mesos --work_dir=/var/lib/mesos --quorum=2 --ip=10.134.26.172 --hostname=10.134.26.172 --cluster=mesoscluster
MESOS_SLAVE_APP_PARAMS=--master=zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --hostname=10.134.26.172 --ip=10.134.26.172 --docker_stop_timeout=5secs --gc_delay=1days --docker_socket=/tmp/docker.sock --no-systemd_enable_support --work_dir=/tmp/mesos
REGISTRATOR_APP_PARAMS=-cleanup -ip=10.134.26.172 consul://10.134.26.172:8500
ZOOKEEPER_APP_PARAMS=start-foreground
ZOOKEEPER_HOSTS=10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181
ZOOKEEPER_ID=1
KEEPALIVED_VIP=
CHRONOS_APP_PARAMS=--master zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/mesos --zk_hosts 10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181 --hostname 10.134.26.172 --http_address 10.134.26.172 --http_port 4400
FABIO_APP_PARAMS=-cfg ./fabio.properties -registry.consul.addr 10.134.26.172:8500
NETDATA_APP_PARAMS=-nd -ch /host
HOSTNAME=paas-slave-01
The only place I reference the hostname is in HOSTNAME and FQDN. Do you have an example of your env file?
Is it because you are using --hostname 10.0.0.100 instead of --hostname=10.0.0.100 (missing =)?
Aah, I got it now. Is it really that? Let me check. EDIT: naah, without '=' it doesn't work at all... my previous message was not copy-pasted directly... you got me wrong :)
It definitely seems to be registrator creating that entry incorrectly:
registrator stderr | 2016/11/30 20:05:50 added: 274744eaa299 paas-slave-05:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S21.c7187f91-5b2b-4ec1-b239-58065cf43616:8000
for me it looks good. I still do not understand why you can't use real name like:
MESOS_SLAVE_APP_PARAMS=--master=zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --hostname=paas-slave-01 --ip=10.134.26.172 --docker_stop_timeout=5secs --gc_delay=1days --docker_socket=/tmp/docker.sock --no-systemd_enable_support --work_dir=/tmp/mesos
Really? When you use --hostname=10.0.0.100 you see the Consul ServiceID with the IP address?
For example:
10.0.0.100:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S21.c7187f91-5b2b-4ec1-b239-58065cf43616:8000
??
I still do not understand why you can't use real name
I have no DNS between my slaves, which means I would need to add an entry in /etc/hosts to resolve the $HOST value. I would then need to map /etc/hosts to each of my containers, which often causes problems with my apps.
When I use --hostname=10.134.26.172, my $HOST value is 10.134.26.172, but the ServiceID in Consul still somehow gets the hostname, for example:
paas-slave-01:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S21.c7187f91-5b2b-4ec1-b239-58065cf43616:8000
When I use --hostname=paas-slave-01, my $HOST value is paas-slave-01, but the curl will fail because it can't resolve the name.
I'm really struggling to figure out why you see the ServiceID with the IP address, and not the hostname...
^^ when you look at the consul web UI (or call the API with curl http://<ip>:8500/v1/catalog/service/<service>?pretty=true) you see the IP listed in the ServiceID?
Even when I use --hostname=10.134.26.172 I get a ServiceID containing the hostname:
[
  {
    "Node": "10.134.26.172",
    "Address": "10.134.26.172",
    "TaggedAddresses": {
      "lan": "10.134.26.172",
      "wan": "10.134.26.172"
    },
    "ServiceID": "paas-slave-01:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S35.1387bdd3-1cae-4493-b56a-97f28209d941:8000",
    "ServiceName": "content-providers",
    "ServiceTags": [
      "content-providers",
      "haproxy",
      "haproxy_weight=100",
      "haproxy_httpchk=GET /providers/universal"
    ],
    "ServiceAddress": "10.134.26.172",
    "ServicePort": 31208,
    "ServiceEnableTagOverride": false,
    "CreateIndex": 740298,
    "ModifyIndex": 740325
  }
]
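To compare the registered ID with what the script builds, the ServiceID lines can be pulled out of that response. This is a rough sketch: `service_ids` is a hypothetical helper, and it assumes the pretty-printed output (one key per line) that ?pretty=true returns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pretty-printed catalog JSON on stdin and
# print the value of every "ServiceID" key, one per line.
service_ids() {
  sed -n 's/.*"ServiceID": *"\([^"]*\)".*/\1/p'
}

# Example with a trimmed-down response body:
service_ids <<'EOF'
[
  {
    "Node": "10.134.26.172",
    "ServiceID": "paas-slave-01:mesos-example:8000",
    "ServicePort": 31208
  }
]
EOF
```

Against a live agent this would be used as curl -s "http://<ip>:8500/v1/catalog/service/<service>?pretty=true" | service_ids, and the result compared with echo $consul_service_id from inside the container.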
Can you confirm your registrator settings look similar to mine?
REGISTRATOR_APP_PARAMS=-cleanup -ip=10.134.26.172 consul://10.134.26.172:8500
It seems like somehow registrator is still using the hostname when it registers the service:
registrator stderr | 2016/11/30 20:05:50 added: 274744eaa299 paas-slave-05:mesos-9832c5fa-0031-4189-aa82-d9381128bb01-S21.c7187f91-5b2b-4ec1-b239-58065cf43616:8000
For what it's worth, I also tried with -ip 10.134.26.172 (no =) for REGISTRATOR_APP_PARAMS, but the service still got registered with the hostname.
When you use --hostname=10.0.0.100 you see the Consul ServiceID with the IP address
That part I understand, and we have discussed that already. I just don't understand why you can't set up DNS to resolve your slaves!
that part I understand and we have discussed that already
The part I don't understand is why you're seeing the ServiceID with the IP address, and I am seeing it with the hostname. This documentation makes it sound like the ID can only contain the hostname, so I'm confused.
I just don't understand why you can't set up DNS to resolve your slaves!
I didn't want to have another service to manage. If this can be done via the included DNSMasq, maybe that is a better option. But I think DNSMasq is only forwarding to Consul, so that wouldn't work - right?
The part I don't understand
This is by registrator design; either you deal with it or it requires a fix.
Just to make sure: I see and I use the hostname in ServiceID -> mesos-slave --hostname=paasslave001.
I only saw the IP when I tested mesos-slave --hostname=10.0.0.100
I didn't want to have another service to manage.
Just try the IP and hostname in /etc/hosts.
Moreover, we specifically point that out here: https://github.com/eBayClassifiedsGroup/PanteraS/blob/master/generate_yml.sh#L20
This is by registrator design; either you deal with it or it requires a fix.
If you are seeing the IP in ServiceID when using --hostname=10.0.0.100, then I shouldn't need a fix - it sounds like it works fine for you. But I have never seen the IP in the ServiceID, even when using mesos-agent --hostname=<IP>.
Are you using an older version of mesos? I am using a newer version where mesos-slave has been renamed to mesos-agent
root@paas-slave-02:~# ps ax | grep mesos-agent
16262 ? Sl 4:08 mesos-agent --master=zk://10.134.22.239:2181,10.134.23.87:2181,10.134.26.121:2181/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --hostname=10.134.26.178 --ip=10.134.26.178 --docker_stop_timeout=5secs --gc_delay=1days --docker_socket=/tmp/docker.sock --no-systemd_enable_support --work_dir=/tmp/mesos
Just try the IP and hostname in /etc/hosts.
This doesn't work. I can resolve the hostname just fine outside of the container, but once I get in the container, I cannot. Not unless I map /etc/hosts as an external volume. But that sometimes causes problems as I mentioned, because it overwrites the internal /etc/hosts file in the container, which contains the docker-specific host entry - for example:
172.17.0.2 503bbde45761
I think one solution might be to configure each of my slaves to use itself for DNS, instead of the masters. Because DNSMasq is using --host-record=paas-slave-02,10.134.26.178, it will create a record in the local DNSMasq instance. Therefore, if I add my own slave to /etc/resolv.conf I should be able to resolve my own hostname from inside the container...
when mesos-slave --hostname=10.0.0.100 I see IP then like:
"ServiceID": "10.0.0.100:mesos-361ed1a9-6409-41f0-8e39-f846582ec1a4-S0.02eb068a-f66a-42b7-a950-e98baa848452:8080",
I'd really like to figure out why you're seeing the IP address in ServiceID, and I am not.
Can you log in to the running container like at the beginning
docker exec -ti 9670e6203c2c bash
and check with env whether any env variable has the same value as the hostname on the native host (or contains the FQDN)?
Maybe the new mesos creates different variables now?
Unfortunately no such variable exists.
I kinda came up with a dirty hack by mapping /etc/hostname to /etc/hostname.orig in my marathon deploy, and then I modified your script to use a new ${HOSTNAME} variable:
maintenance(){
  # Container name will be provided by mesos:
  MESOS_CONTAINER_NAME=${MESOS_CONTAINER_NAME:-$CONTAINER_NAME}
  # Hostname of the agent, read from the host's /etc/hostname mapped into the container
  HOSTNAME=$(cat /etc/hostname.orig)
  # Mesos provides external ports in comma-separated $PORTS
  for port in $(sed 's/,/ /g'<<<${PORTS})
  do
    # For each external port you can look up the internal one from PORT_${int}
    port_int=$(env|sed -n "s/PORT_\([0-9]*\)=$port/\1/p")
    # registrator uses a ServiceID built from variables, which are now all available
    consul_service_id="${HOSTNAME}:${MESOS_CONTAINER_NAME}:${port_int}"
    curl -X PUT "http://${HOST}:8500/v1/agent/service/maintenance/${consul_service_id}?enable=true"
    # if you use udp, also uncomment the udp line to switch it into maintenance mode
    #curl -X PUT "http://${HOST}:8500/v1/agent/service/maintenance/${consul_service_id}:udp?enable=true"
  done
}
Thanks for the help! Things would be much easier with DNS!
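For anyone reusing this hack, here is a sketch of how the host part could be chosen so the same script works both with and without the mapped file. `service_host` is a hypothetical helper, and the fallback to `${HOST%%.*}` is an assumption layered on top of the script above, not something PanteraS ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the host part of the Consul ServiceID.
# Prefer the agent hostname from a file mapped into the container;
# otherwise fall back to the original ${HOST%%.*} behaviour.
service_host() {
  local file=${1:-/etc/hostname.orig}
  if [ -r "$file" ]; then
    head -n 1 "$file"              # first line of the mapped hostname file
  else
    printf '%s\n' "${HOST%%.*}"    # short name derived from $HOST
  fi
}

# Inside maintenance() this would replace the HOSTNAME assignment, e.g.:
# consul_service_id="$(service_host):${MESOS_CONTAINER_NAME}:${port_int}"
```

The fallback only gives a usable ID when $HOST is a name rather than an IP, as discussed earlier in the thread.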
Hello,
I have a simple node application deployed via marathon. I am trying to find a way to gracefully shut down the application so that no open connections are disconnected. I see @sielaq has posted some info here: https://github.com/mesosphere/marathon/issues/712
Is there any way you could describe in a bit more detail how you are handling this situation? It appears you are using a wrapper script and some extra mesos slave arguments. Is this correct?