Hi there,
First of all, check if you can reach the services like this:
curl -H 'Host: python.service.consul' 192.168.10.10
If this works, HAProxy is working properly and everything is fine.
The only thing you have to do then is point your DNS to 192.168.10.10 in /etc/resolv.conf.
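For example (a sketch, assuming an Ubuntu host managed by resolvconf, where edits to /etc/resolv.conf itself get overwritten):
echo "nameserver 192.168.10.10" | sudo tee -a /etc/resolvconf/resolv.conf.d/head
sudo resolvconf -u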
Hello!
Thank you very much for your fast answer. When curling:
curl -H 'Host: python.service.consul' 192.168.10.10
I get the "503 Service Unavailable No server is available to handle this request." message. But not instantly (looks like it is trying to do some things). When I try instead a non existant service :
curl -H 'Host: flask.service.consul' 192.168.10.10
I get the same 503 message but instantly.
The following command:
dig @192.168.10.10 -p8600 python.service.consul +tcp SRV
Output
; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> @192.168.10.10 -p8600 python.service.consul +tcp SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33100
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;python.service.consul. IN SRV
;; ANSWER SECTION:
python.service.consul. 0 IN SRV 1 1 31003 standalone.node.UNKNOWN.consul.
python.service.consul. 0 IN SRV 1 1 31002 standalone.node.UNKNOWN.consul.
;; ADDITIONAL SECTION:
standalone.node.UNKNOWN.consul. 0 IN A 192.168.10.10
standalone.node.UNKNOWN.consul. 0 IN A 192.168.10.10
;; Query time: 1 msec
;; SERVER: 192.168.10.10#8600(192.168.10.10)
;; WHEN: Thu May 21 15:24:57 UTC 2015
;; MSG SIZE rcvd: 273
The initial resolv.conf on my host has the following:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3
I tried nameserver 127.0.0.1, nameserver 192.168.10.10 and nameserver 8.8.8.8 without success.
Edit: I copied the Docker container's /etc folder to the host to check the content of the container's resolv.conf, and it is empty.
Please give me two things so I can reproduce it:
the docker-compose.yml file which was produced by generate_yml.sh,
and the output of: docker exec -ti <panteras_container_id_or_name> supervisorctl status
for example:
docker exec -ti panteras_panteras_1 supervisorctl status
so I can see which components are running.
I stumbled upon this issue about an empty resolv.conf in Docker containers: https://github.com/docker/docker/issues/9998. The following command (instead of copying to the host) returns the correct content (same as /etc/resolv.conf on the host). One less problem.
sudo docker exec vagrant_panteras_1 cat /etc/resolv.conf
The content of the docker-compose.yml:
panteras:
  dns: 192.168.10.10
  image: orchestrator
  name: panteras
  net: host
  privileged: true
  environment:
    CONSUL_IP: "192.168.10.10"
    HOST_IP: "192.168.10.10"
    GOMAXPROCS: "4"
    START_CONSUL: "true"
    START_CONSUL_TEMPLATE: "true"
    START_DNSMASQ: "true"
    START_HAPROXY: "true"
    START_MESOS_MASTER: "true"
    START_MARATHON: "true"
    START_MESOS_SLAVE: "true"
    START_REGISTRATOR: "true"
    START_ZOOKEEPER: "true"
    CONSUL_APP_PARAMS: "agent -client=0.0.0.0 -data-dir=/opt/consul/ -ui-dir=/opt/consul/dist/ -advertise=192.168.10.10 -node=standalone -dc=UNKNOWN -server -bootstrap-expect 1 "
    CONSUL_TEMPLATE_APP_PARAMS: "-consul=192.168.10.10:8500 -template template.conf:/etc/haproxy/haproxy.cfg:/opt/consul-template/haproxy_reload.sh"
    DNSMASQ_APP_PARAMS: "-d -u dnsmasq -r /etc/resolv.conf -7 /etc/dnsmasq.d --server=/consul/192.168.10.10#8600 --address=/consul/192.168.10.10 --host-record=standalone,192.168.10.10 "
    HAPROXY_RELOAD_COMMAND: "/usr/sbin/haproxy -p /tmp/haproxy.pid -f /etc/haproxy/haproxy.cfg -sf $(pidof /usr/sbin/haproxy) || true"
    MARATHON_APP_PARAMS: "--master zk://standalone:2181/mesos --zk zk://standalone:2181/marathon --hostname standalone "
    MESOS_MASTER_APP_PARAMS: "--zk=zk://standalone:2181/mesos --work_dir=/var/lib/mesos --quorum=1 --ip=0.0.0.0 --hostname=standalone --cluster=mesoscluster "
    MESOS_SLAVE_APP_PARAMS: "--master=zk://standalone:2181/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --hostname=standalone --docker_stop_timeout=5secs "
    REGISTRATOR_APP_PARAMS: "-ip=192.168.10.10 consul://192.168.10.10:8500 "
    ZOOKEEPER_APP_PARAMS: "start-foreground"
    ZOOKEEPER_HOSTS: "standalone:2181"
    ZOOKEEPER_ID: "0"
  volumes:
    - "/etc/resolv.conf:/etc/resolv.conf"
    - "/var/spool/marathon/artifacts/store:/var/spool/store"
    - "/var/run/docker.sock:/tmp/docker.sock"
    - "/var/lib/docker:/var/lib/docker"
    - "/sys:/sys"
    - "/tmp/mesos:/tmp/mesos"
I built the image locally (renamed to orchestrator), but I have the same issue with the panteras image.
The /etc/resolv.conf content, and the result of the supervisorctl status:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.10.10
vagrant@standalone:/var2/etc$ sudo docker exec -ti vagrant_panteras_1 supervisorctl status
consul RUNNING pid 11, uptime 0:12:11
consul-template_haproxy RUNNING pid 14, uptime 0:12:11
dnsmasq RUNNING pid 10, uptime 0:12:11
haproxy_watcher RUNNING pid 399, uptime 0:12:04
marathon RUNNING pid 22, uptime 0:12:11
mesos-master RUNNING pid 16, uptime 0:12:11
mesos-slave RUNNING pid 26, uptime 0:12:11
registrator RUNNING pid 36, uptime 0:12:11
stdout RUNNING pid 9, uptime 0:12:11
zookeeper RUNNING pid 12, uptime 0:12:11
I tried with other nameservers too (sometimes resulting in a mesos-slave "FATAL Exited too quickly" error; the process log may have details, but not always, which is weird).
The "FATAL Exited too quickly" seems to be another problem, as I did not get it before. The log says:
mesos-slave stderr | Failed to perform recovery: Collect failed: Detected duplicate pid 502 for container 68f3479b-9f7d-498b-b4e3-a5a9553cea37
To remedy this do as follows:
Step 1: rm -f /tmp/mesos/meta/slaves/latest
This ensures slave doesn't recover old live executors.
Step 2: Restart the slave. mesos-slave stderr |
2015-05-21 16:41:26,335 INFO exited: mesos-slave (exit status 1; not expected)
2015-05-21 16:41:26,428 INFO gave up: mesos-slave entered FATAL state, too many start retries too quickly
2015-05-21 16:41:26,428 INFO reaped unknown pid 494
2015-05-21 16:41:26,429 INFO reaped unknown pid 495
2015-05-21 16:41:26,440 INFO reaped unknown pid 493
2015-05-21 16:41:26,441 INFO reaped unknown pid 496
2015-05-21 16:41:26,441 INFO reaped unknown pid 498
2015-05-21 16:41:26,444 INFO reaped unknown pid 497
2015-05-21 16:41:26,445 INFO reaped unknown pid 502
Edit: I was building the image in a custom script without --rm=true, and it looks like it is mandatory in order not to have duplicates (at least for now): https://medium.com/@paulcolomiets/evaluating-mesos-4a08f85473fb
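(For reference, the local build command would look roughly like this; a sketch, assuming the Dockerfile is in the current directory and using the orchestrator image name from the compose file above. --rm=true removes intermediate containers after a successful build:)
docker build --rm=true -t orchestrator .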
Edit 2: in the end, only reloading the Vagrant VM solved it.
It looks like you have some leftover files from the last Mesos run/crash.
You can clean them up with rm -f /tmp/mesos/*
if this is not a production system and the data is not relevant.
Anyway, I suspect that you started your Vagrant box (I can tell it is a Vagrant box from the IP address) in a different network (so Vagrant has old network configuration, such as resolv.conf), and you are now working in a different network.
Try to clean up:
docker-compose stop
docker-compose rm --force
rm -f /tmp/mesos/*
then reboot the Vagrant box, generate the yml file again, and run docker-compose up -d again.
Great! Now I get load balancing when I run:
curl -H 'Host: python.service.consul' 192.168.10.10
Thank you!
But in my browser I still get the 503 error. Would you mind posting what you have in the /etc/resolv.conf of your Docker host?
On your native host (not the Vagrant box) you now need to point to the Vagrant box:
# cat /etc/resolv.conf
nameserver 192.168.10.10
But it depends on which system you have on your native host: Mac/Linux/Windows.
I could not find what the problem was these past hours. I even tried to launch PanteraS on a DigitalOcean droplet (it works just as well as locally, btw :). curl -H 'Host: python.service.consul' digital_ocean_droplet_ip load balances correctly, but browsing to digital_ocean_droplet_ip results in a 503 error too...
I checked /etc/resolv.conf; it has:
nameserver 8.8.4.4
nameserver 8.8.8.8
nameserver ip_different_from_digitalocean_droplet_ip
I am a bit lost, but I will let you know as soon as I get it working!
I edited the haproxy.cfg file for further testing:
redirect location http://www.google.com if acl_{{$service_name}}
# use_backend backend_{{$service_name}} if acl_{{$service_name}}
I am not being redirected to google.com so the python backend is not even used. I will investigate why acl_python is false.
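(For context, once the template is rendered those lines presumably expand to something along these lines for the python service; a sketch, not the exact generated config:)
acl acl_python hdr(host) -i python.service.consul
use_backend backend_python if acl_python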
Hi again,
I got it working! Being pretty new to sysadmin, I am still discovering how everything works. After reading the HAProxy docs, I found out that the HAProxy rules set in template.conf try to match the host name against the service name (which could be anything related to our website/backend/API).
So I just added the following to the /etc/hosts file on my host machine: 192.168.10.10 python.service.consul, and tadaaaaa! Load balancing is working :)
So I guess I will have to make my service names match my website URLs to balance different services on different backends?
Sorry for bothering you, and thank you again for your help! I really appreciate it.
Xavier
I think you have too many DNS entries in /etc/resolv.conf. You need to understand that those entries are not smart. While resolving, the system first checks grep hosts /etc/nsswitch.conf
which contains for example:
hosts: files dns
That means it will first check /etc/hosts,
and if it cannot find the name there, it uses DNS (which is configured in /etc/resolv.conf). But it first takes 8.8.4.4,
which is Google - and Google has no idea about your setup or this service. Sometimes it uses the secondary nameserver 8.8.8.8
- same thing, no result, since there is nothing about your service in Google's DNS. The third entry, I think, is never used by the system.
Which means: try with ONLY one entry, or stay with your workaround - a static entry in /etc/hosts. I'm closing the ticket then :)
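(To see the difference between the two lookup paths in practice, a small sketch:)
getent hosts python.service.consul   # follows nsswitch.conf: /etc/hosts first, then DNS
dig python.service.consul +short     # queries the resolv.conf nameservers directly, bypassing /etc/hosts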
Thanks for the additional info! The DNS entries I posted were the defaults when I SSHed into my DigitalOcean droplet, so I don't really know why they set them like that (especially if only the first two are used).
Concerning Consul services, you mean that I don't necessarily have to set an /etc/hosts entry linking the service name to the machine IP to make HAProxy load balance? I mean, how would HAProxy detect, when browsing 192.168.10.10:80 (and not python.service.consul, which resolves to 192.168.10.10 thanks to the /etc/hosts entry I added), that the request matches the python.service.consul service (which has 2 python instances, at 192.168.10.10:31000 and at 192.168.10.10:31001), if it is not in the URL in the first place? (Unless I did not get exactly how hdr(host) -i python.service.consul works.)
ad 1. It depends what you want to achieve. If you want a permanent setup, then you have to replace it. But keep in mind that when the container is down you will have no DNS at all.
ad 2. Yes, you don't have to. That's what Consul and DNS are for - to resolve it (you can use DNS routing) - or you can use HAProxy as a layer 7 proxy (like Apache vhosts): it detects the service from the Host header when you request it :) On every host you have (if you have multiple PanteraS slaves) HAProxy will be running on port 80, which means that DNS will point to one of the HAProxy instances, which will always balance between the service instances.
But you can also use the Consul DNS interface, so your apps can access services directly; more info: https://www.consul.io/docs/agent/dns.html
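(For example, since dnsmasq in the compose file forwards the consul domain to Consul on port 8600, a service lookup through the normal DNS port should work; a sketch:)
dig @192.168.10.10 python.service.consul SRV +short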
I struggled to understand how the service was going to be detected in HAProxy (how a website named www.mysite.com, with two A records pointing to the 2 PanteraS nodes, could map to python.service.consul for instance, since hdr(host) does not include 'python.service.consul' in it...), but I think I get it now thanks to your explanations. For instance, I wanted to link my application to a database cluster like Galera. So in my application I wanted to have a single entry point like:
host = galera.service.dc1.consul
Or even bind it to localhost, in case the DNS gets cached and a node goes down:
host = 127.0.0.1
and have in HAProxy a rule like bind 127.0.0.1:3306, with Consul automatically registering the healthy nodes. I don't know if that makes sense...
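(A rough sketch of that idea - an assumption, not PanteraS' generated config: a local HAProxy listener that consul-template would populate with the healthy Galera nodes; the backend IPs/ports here are placeholders:)
listen mysql_local
    bind 127.0.0.1:3306
    mode tcp
    balance roundrobin
    server galera1 10.0.0.11:3306 check
    server galera2 10.0.0.12:3306 check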
This is where orchestration comes in. You need to understand how Marathon works, and you have to decide how you want to deploy your application.
When you use 1. or 2., all the registering/deregistering is done for you. If you choose 3., or you want to play around first before deploying, you have to register it in Consul yourself - try it first as an external service: https://www.consul.io/docs/guides/external.html
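(Registering an external service by hand goes through Consul's catalog API, roughly like this; the node name, address and port are placeholders, and the datacenter matches the -dc=UNKNOWN from the compose file above:)
curl -X PUT -d '{"Datacenter": "UNKNOWN", "Node": "galera1", "Address": "10.0.0.11", "Service": {"Service": "galera", "Port": 3306}}' http://192.168.10.10:8500/v1/catalog/register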
The whole idea of PanteraS is that you don't have to touch Consul/HAProxy at all.
Yes, my plan is to bake my application into an image before deploying and to redeploy a new container when I change something. I'm currently in the process of creating procedures to deploy a new version of my application into production depending on the changes made (simple frontend change, backend API changes, or the hardest, database changes), so multiple scenarios without downtime. Are you already using PanteraS in production? Concerning your database clusters, are you linking your application directly to a service like redis/mongodb/elasticsearch.service.consul, or binding to localhost like I plan to do? Thanks again for your time, it is priceless!
If you already know the Marathon API, we have a second project that helps with the deployment of Marathon apps.
https://github.com/eBayClassifiedsGroup/marathon_deploy
apt-get install ruby1.9.1-dev
gem install marathon_deploy
You can use it to store your JSON/YAML templates in git, deploy apps from Jenkins or any other build tool you have, and feed the multiple Marathons you have.
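(For illustration, a minimal Marathon app definition of the kind you would keep in git; the values are made up, just a sketch of a Dockerized web app with two instances:)
{
  "id": "/python",
  "cmd": "python -m SimpleHTTPServer 8080",
  "cpus": 0.1,
  "mem": 128,
  "instances": 2,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "python:2.7",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 8080, "hostPort": 0 }]
    }
  }
}
Such a file can be POSTed to Marathon directly (curl -X POST -H 'Content-Type: application/json' -d @app.json http://192.168.10.10:8080/v2/apps) or pushed via marathon_deploy.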
Yes, we have a few microservices in production already - but we are in A/B testing mode.
I don't get the last question. We do not use links. I'm also not quite sure what you mean by 'bind to localhost' - services that require disk writes? Currently we use it for microservices, which are stateless. Any databases are separated from the PaaS and accessible via a specific address. Those addresses can be registered in Consul too, but as external services. Alternatively, you can store the configuration of your external services in Consul's K/V storage.
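(Consul's K/V store is just an HTTP API as well; storing and reading a database address could look roughly like this - the key name and value are placeholders:)
curl -X PUT -d '10.0.0.11:3306' http://192.168.10.10:8500/v1/kv/config/myapp/db_host
curl http://192.168.10.10:8500/v1/kv/config/myapp/db_host?raw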
Thanks for the marathon_deploy suggestion, I looked at it and it may be of help!
Concerning the last question, it is precisely about what you mention: how you access it and via which address. My application has a database config file where I can specify one entry point (host, user, password, database_name), and at first I did not know how to do it with a cluster of redis/mysql/elasticsearch. So I have two solutions:
'host' => 'mysql.service.consul',
'database' => 'mydatabase',
'username' => 'myusername',
'password' => 'mypassword',
But as the video explains, the DNS can get cached, and if a node is down it will still redirect to it, resulting in an error.
'host' => '127.0.0.1',
'database' => 'mydatabase',
'username' => 'myusername',
'password' => 'mypassword',
'port' => 3306
Hello,
I tried to make this PaaS work in standalone mode. I launched all the Simple/SmoothWebappPython containers (deploy0 & 1).
I get the following haproxy.cfg:
When I browse 192.168.10.10:31001 or 192.168.10.10:31002, I get the app output. However, when I browse 192.168.10.10:80 to be load balanced between the app nodes in a round-robin manner, I get a "503 Service Unavailable. No server is available to handle this request."
I tried to debug, but I was not getting the HAProxy logs. I added the following to haproxy.cfg:
Here are the logs I got, in case it helps:
Are you experiencing the same problem? Or maybe I have misconfigured some parts?