eBayClassifiedsGroup / PanteraS

PanteraS - PaaS - Platform as a Service in a box
GNU General Public License v2.0

Load Balancing Problem - HaProxy 503 Service Unavailable #49

Closed trompx closed 9 years ago

trompx commented 9 years ago

Hello,

I tried to make this PaaS work in standalone mode. I launched all the containers from Simple/SmoothWebappPython (deploy0 & 1).

I get the following haproxy.cfg:

frontend http-in
    bind *:80

    #python
    acl acl_python hdr(host) -i python.service.consul
    use_backend backend_python if acl_python

    #python-smooth
    acl acl_python-smooth hdr(host) -i python-smooth.service.consul
    use_backend backend_python-smooth if acl_python-smooth

    backend backend_python
        balance roundrobin
        option http-server-close
        server standalone_31005 192.168.10.10:31005 maxconn 32  weight 1
        server standalone_31003 192.168.10.10:31003 maxconn 32  weight 100
        server standalone_31002 192.168.10.10:31002 maxconn 32  weight 100
        server standalone_31006 192.168.10.10:31006 maxconn 32  weight 1

    backend backend_python-smooth
        balance roundrobin
        option http-server-close
        server standalone_31000 192.168.10.10:31000 maxconn 32  weight 100
        server standalone_31004 192.168.10.10:31004 maxconn 32  weight 1
        server standalone_31001 192.168.10.10:31001 maxconn 32  weight 100

When I browse 192.168.10.10:31001 or 192.168.10.10:31002 I get the app output. However, when I browse 192.168.10.10:80 to be load balanced between the app nodes in a round-robin manner, I get a "503 Service Unavailable No server is available to handle this request.".

I tried to debug, but I was not getting the HAProxy logs, so I added the following to haproxy.cfg:

global
    debug

defaults
    log global
    option tcplog

Here are the logs I got, in case they help:

panteras_1 | consul-template_haproxy stdout | 00000256:http-in.accept(0006)=0007 from [192.168.10.1:51926]
panteras_1 | consul-template_haproxy stdout | 00000257:http-in.accept(0006)=0008 from [192.168.10.1:51927]
panteras_1 | consul-template_haproxy stdout | 00000256:http-in.clireq[0007:ffffffff]: GET / HTTP/1.1
panteras_1 | 00000256:http-in.clihdr[0007:ffffffff]: Host: 192.168.10.10
panteras_1 | 00000256:http-in.clihdr[0007:ffffffff]: Connection: keep-alive
panteras_1 | 00000256:http-in.clihdr[0007:ffffffff]: Cache-Control: max-age=0
panteras_1 | 00000256:http-in.clihdr[0007:ffffffff]: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
panteras_1 | 00000256:http-in.clihdr[0007:ffffffff]: User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36
panteras_1 | 00000256:http-in.clihdr[0007:ffffffff]: Accept-Encoding: gzip, deflate, sdch
panteras_1 | 00000256:http-in.clihdr[0007:ffffffff]: Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4,es;q=0.2
panteras_1 | 00000256:http-in.clicls[0007:ffffffff]
panteras_1 | 00000256:http-in.closed[0007:ffffffff]
panteras_1 | consul-template_haproxy stdout | 00000257:http-in.clireq[0008:ffffffff]: GET /favicon.ico HTTP/1.1
panteras_1 | 00000257:http-in.clihdr[0008:ffffffff]: Host: 192.168.10.10
panteras_1 | 00000257:http-in.clihdr[0008:ffffffff]: Connection: keep-alive
panteras_1 | 00000257:http-in.clihdr[0008:ffffffff]: User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36
panteras_1 | 00000257:http-in.clihdr[0008:ffffffff]: Accept: */*
panteras_1 | 00000257:http-in.clihdr[0008:ffffffff]: Referer: http://192.168.10.10/
panteras_1 | 00000257:http-in.clihdr[0008:ffffffff]: Accept-Encoding: gzip, deflate, sdch
panteras_1 | 00000257:http-in.clihdr[0008:ffffffff]: Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4,es;q=0.2
panteras_1 | 00000257:http-in.clicls[0008:ffffffff]
panteras_1 | 00000257:http-in.closed[0008:ffffffff]
panteras_1 | consul-template_haproxy stdout | 00000258:http-in.accept(0006)=0007 from [192.168.10.1:51928]
panteras_1 | haproxy_watcher stdout | 0000023d:http-in.accept(0006)=0007 from [192.168.10.1:51929]
panteras_1 | consul-template_haproxy stdout | 00000258:http-in.clireq[0007:ffffffff]: GET / HTTP/1.1
panteras_1 | 00000258:http-in.clihdr[0007:ffffffff]: Host: 192.168.10.10
panteras_1 | 00000258:http-in.clihdr[0007:ffffffff]: Connection: keep-alive
panteras_1 | 00000258:http-in.clihdr[0007:ffffffff]: Cache-Control: max-age=0
panteras_1 | 00000258:http-in.clihdr[0007:ffffffff]: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
panteras_1 | 00000258:http-in.clihdr[0007:ffffffff]: User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36
panteras_1 | 00000258:http-in.clihdr[0007:ffffffff]: Accept-Encoding: gzip, deflate, sdch
panteras_1 | 00000258:http-in.clihdr[0007:ffffffff]: Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4,es;q=0.2
panteras_1 | 00000258:http-in.clicls[0007:ffffffff]
panteras_1 | 00000258:http-in.closed[0007:ffffffff]
panteras_1 | haproxy_watcher stdout | 0000023d:http-in.clireq[0007:ffffffff]: GET /favicon.ico HTTP/1.1
panteras_1 | 0000023d:http-in.clihdr[0007:ffffffff]: Host: 192.168.10.10
panteras_1 | 0000023d:http-in.clihdr[0007:ffffffff]: Connection: keep-alive
panteras_1 | 0000023d:http-in.clihdr[0007:ffffffff]: User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36
panteras_1 | 0000023d:http-in.clihdr[0007:ffffffff]: Accept: */*
panteras_1 | 0000023d:http-in.clihdr[0007:ffffffff]: Referer: http://192.168.10.10/
panteras_1 | 0000023d:http-in.clihdr[0007:ffffffff]: Accept-Encoding: gzip, deflate, sdch
panteras_1 | 0000023d:http-in.clihdr[0007:ffffffff]: Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4,es;q=0.2
panteras_1 | haproxy_watcher stdout | 0000023d:http-in.clicls[0007:ffffffff]
panteras_1 | 0000023d:http-in.closed[0007:ffffffff]
panteras_1 | haproxy_watcher stdout | 0000023e:stats.accept(0004)=0007 from [192.168.10.1:51930]
panteras_1 | haproxy_watcher stdout | 0000023e:stats.clireq[0007:ffffffff]: GET / HTTP/1.1
panteras_1 | 0000023e:stats.clihdr[0007:ffffffff]: Host: 192.168.10.10:81
panteras_1 | 0000023e:stats.clihdr[0007:ffffffff]: Connection: keep-alive
panteras_1 | 0000023e:stats.clihdr[0007:ffffffff]: Cache-Control: max-age=0
panteras_1 | 0000023e:stats.clihdr[0007:ffffffff]: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
panteras_1 | 0000023e:stats.clihdr[0007:ffffffff]: User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36
panteras_1 | 0000023e:stats.clihdr[0007:ffffffff]: Accept-Encoding: gzip, deflate, sdch
panteras_1 | 0000023e:stats.clihdr[0007:ffffffff]: Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4,es;q=0.2
panteras_1 | 0000023e:stats.srvrep[0007:ffffffff]: HTTP/1.1 200 OK
panteras_1 | 0000023e:stats.srvhdr[0007:ffffffff]: Cache-Control: no-cache
panteras_1 | 0000023e:stats.srvhdr[0007:ffffffff]: Connection: close
panteras_1 | 0000023e:stats.srvhdr[0007:ffffffff]: Content-Type: text/html
panteras_1 | 0000023e:stats.srvhdr[0007:ffffffff]: Transfer-Encoding: chunked
panteras_1 | consul-template_haproxy stdout | 00000259:stats.accept(0004)=0007 from [192.168.10.1:51931]
panteras_1 | haproxy_watcher stdout | 0000023f:stats.clireq[0007:ffffffff]: GET /favicon.ico HTTP/1.1
panteras_1 | 0000023f:stats.clihdr[0007:ffffffff]: Host: 192.168.10.10:81
panteras_1 | 0000023f:stats.clihdr[0007:ffffffff]: Connection: keep-alive
panteras_1 | 0000023f:stats.clihdr[0007:ffffffff]: User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36
panteras_1 | 0000023f:stats.clihdr[0007:ffffffff]: Accept: */*
panteras_1 | 0000023f:stats.clihdr[0007:ffffffff]: Referer: http://192.168.10.10:81/
panteras_1 | 0000023f:stats.clihdr[0007:ffffffff]: Accept-Encoding: gzip, deflate, sdch
panteras_1 | 0000023f:stats.clihdr[0007:ffffffff]: Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4,es;q=0.2
panteras_1 | 0000023f:stats.srvrep[0007:ffffffff]: HTTP/1.1 200 OK
panteras_1 | 0000023f:stats.srvhdr[0007:ffffffff]: Cache-Control: no-cache
panteras_1 | 0000023f:stats.srvhdr[0007:ffffffff]: Connection: close
panteras_1 | 0000023f:stats.srvhdr[0007:ffffffff]: Content-Type: text/html
panteras_1 | 0000023f:stats.srvhdr[0007:ffffffff]: Transfer-Encoding: chunked

Are you experiencing the same problem? Or maybe I have misconfigured something?

sielaq commented 9 years ago

Hi There,

First of all, check whether you can reach the services like this:

curl -H 'Host: python.service.consul' 192.168.10.10

If this works, it means HAProxy is working properly and everything is fine.

The only thing you have to do now is point your DNS to 192.168.10.10 in /etc/resolv.conf.
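
That is, something like this on the machine you are browsing from (as also shown later in this thread):

# /etc/resolv.conf
nameserver 192.168.10.10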

trompx commented 9 years ago

Hello !

Thank you very much for your fast answer. When curling:

curl -H 'Host: python.service.consul' 192.168.10.10

I get the "503 Service Unavailable No server is available to handle this request." message. But not instantly (looks like it is trying to do some things). When I try instead a non existant service :

curl -H 'Host: flask.service.consul' 192.168.10.10

I get the same 503 message but instantly.

The following command:

dig @192.168.10.10 -p8600  python.service.consul +tcp SRV

Output

; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> @192.168.10.10 -p8600 python.service.consul +tcp SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33100
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;python.service.consul.         IN      SRV

;; ANSWER SECTION:
python.service.consul.  0       IN      SRV     1 1 31003 standalone.node.UNKNOWN.consul.
python.service.consul.  0       IN      SRV     1 1 31002 standalone.node.UNKNOWN.consul.

;; ADDITIONAL SECTION:
standalone.node.UNKNOWN.consul. 0 IN    A       192.168.10.10
standalone.node.UNKNOWN.consul. 0 IN    A       192.168.10.10

;; Query time: 1 msec
;; SERVER: 192.168.10.10#8600(192.168.10.10)
;; WHEN: Thu May 21 15:24:57 UTC 2015
;; MSG SIZE  rcvd: 273

The initial resolv.conf on my host has the following:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3

I tried nameserver 127.0.0.1, nameserver 192.168.10.10 and nameserver 8.8.8.8 without success.

Edit: I copied the Docker container's /etc folder to the host to check the content of the container's resolv.conf, and it is empty.

sielaq commented 9 years ago

Kindly give me two things so I can reproduce it: the docker-compose.yml file that was produced by generate_yml.sh,

and the output of docker exec -ti <panteras_container_id_or_name> supervisorctl status, for example:

docker exec -ti  panteras_panteras_1 supervisorctl status

so I can see which components are running.

trompx commented 9 years ago

I stumbled upon this issue about an empty resolv.conf in a Docker container: https://github.com/docker/docker/issues/9998. So the following command (instead of copying to the host) returns the correct content (the same as /etc/resolv.conf on the host). One less problem.

sudo docker exec vagrant_panteras_1 cat /etc/resolv.conf

The content of the docker-compose.yml:

panteras:
  dns: 192.168.10.10
  image: orchestrator
  name: panteras
  net: host
  privileged: true

  environment:
    CONSUL_IP:               "192.168.10.10"
    HOST_IP:                 "192.168.10.10"
    GOMAXPROCS:              "4"

    START_CONSUL:            "true"
    START_CONSUL_TEMPLATE:   "true"
    START_DNSMASQ:           "true"
    START_HAPROXY:           "true"
    START_MESOS_MASTER:      "true"
    START_MARATHON:          "true"
    START_MESOS_SLAVE:       "true"
    START_REGISTRATOR:       "true"
    START_ZOOKEEPER:         "true"

    CONSUL_APP_PARAMS:          "agent  -client=0.0.0.0  -data-dir=/opt/consul/  -ui-dir=/opt/consul/dist/  -advertise=192.168.10.10  -node=standalone  -dc=UNKNOWN  -server  -bootstrap-expect 1  "
    CONSUL_TEMPLATE_APP_PARAMS: "-consul=192.168.10.10:8500  -template template.conf:/etc/haproxy/haproxy.cfg:/opt/consul-template/haproxy_reload.sh"
    DNSMASQ_APP_PARAMS:         "-d  -u dnsmasq  -r /etc/resolv.conf  -7 /etc/dnsmasq.d  --server=/consul/192.168.10.10#8600  --address=/consul/192.168.10.10  --host-record=standalone,192.168.10.10  "
    HAPROXY_RELOAD_COMMAND:     "/usr/sbin/haproxy -p /tmp/haproxy.pid -f /etc/haproxy/haproxy.cfg -sf $(pidof /usr/sbin/haproxy) || true"
    MARATHON_APP_PARAMS:        "--master zk://standalone:2181/mesos  --zk zk://standalone:2181/marathon  --hostname standalone  "
    MESOS_MASTER_APP_PARAMS:    "--zk=zk://standalone:2181/mesos  --work_dir=/var/lib/mesos  --quorum=1  --ip=0.0.0.0  --hostname=standalone  --cluster=mesoscluster  "
    MESOS_SLAVE_APP_PARAMS:     "--master=zk://standalone:2181/mesos  --containerizers=docker,mesos  --executor_registration_timeout=5mins  --hostname=standalone  --docker_stop_timeout=5secs  "
    REGISTRATOR_APP_PARAMS:     "-ip=192.168.10.10 consul://192.168.10.10:8500  "
    ZOOKEEPER_APP_PARAMS:       "start-foreground"
    ZOOKEEPER_HOSTS:            "standalone:2181"
    ZOOKEEPER_ID:               "0"

  volumes:
    - "/etc/resolv.conf:/etc/resolv.conf"
    - "/var/spool/marathon/artifacts/store:/var/spool/store" 
    - "/var/run/docker.sock:/tmp/docker.sock"
    - "/var/lib/docker:/var/lib/docker"
    - "/sys:/sys"
    - "/tmp/mesos:/tmp/mesos"

I built the image locally (renamed to orchestrator), but I have the same issue with the panteras image.

And the result of supervisorctl status:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.10.10
vagrant@standalone:/var2/etc$ sudo docker exec -ti vagrant_panteras_1 supervisorctl status
consul                           RUNNING   pid 11, uptime 0:12:11
consul-template_haproxy          RUNNING   pid 14, uptime 0:12:11
dnsmasq                          RUNNING   pid 10, uptime 0:12:11
haproxy_watcher                  RUNNING   pid 399, uptime 0:12:04
marathon                         RUNNING   pid 22, uptime 0:12:11
mesos-master                     RUNNING   pid 16, uptime 0:12:11
mesos-slave                      RUNNING   pid 26, uptime 0:12:11
registrator                      RUNNING   pid 36, uptime 0:12:11
stdout                           RUNNING   pid 9, uptime 0:12:11
zookeeper                        RUNNING   pid 12, uptime 0:12:11

I tried with other nameservers too (sometimes resulting in mesos-slave "FATAL Exited too quickly"; the process log may have details, but not always, which is weird).

trompx commented 9 years ago

The "FATAL Exited too quickly" looks like another problem, as I did not get it before. The log says:

mesos-slave stderr | Failed to perform recovery: Collect failed: Detected duplicate pid 502 for container 68f3479b-9f7d-498b-b4e3-a5a9553cea37
To remedy this do as follows:
Step 1: rm -f /tmp/mesos/meta/slaves/latest
        This ensures slave doesn't recover old live executors.
Step 2: Restart the slave. mesos-slave stderr |
2015-05-21 16:41:26,335 INFO exited: mesos-slave (exit status 1; not expected)
2015-05-21 16:41:26,428 INFO gave up: mesos-slave entered FATAL state, too many start retries too quickly
2015-05-21 16:41:26,428 INFO reaped unknown pid 494
2015-05-21 16:41:26,429 INFO reaped unknown pid 495
2015-05-21 16:41:26,440 INFO reaped unknown pid 493
2015-05-21 16:41:26,441 INFO reaped unknown pid 496
2015-05-21 16:41:26,441 INFO reaped unknown pid 498
2015-05-21 16:41:26,444 INFO reaped unknown pid 497
2015-05-21 16:41:26,445 INFO reaped unknown pid 502

Edit: I was building the image in a custom script without --rm=true, and it looks like it is mandatory to avoid duplicates (at least for now): https://medium.com/@paulcolomiets/evaluating-mesos-4a08f85473fb
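
For reference, a build with intermediate-container cleanup enabled could look like this (the image name comes from the compose file above; the build context path is just an example):

docker build --rm=true -t orchestrator .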

Edit 2: in the end, only reloading the Vagrant VM solved it.

sielaq commented 9 years ago

Looks like you have some leftover files from the last Mesos run/crash. You can clean them up with rm -f /tmp/mesos/* if this is not a production system and the data is not relevant.

Anyway, I suspect that you started your Vagrant box (I'm guessing it is a Vagrant box from the IP address) in a different network (so Vagrant has an old network configuration, e.g. resolv.conf) and now you are working in a different network.

Try to clean up:

docker-compose stop and docker-compose rm --force, then rm -f /tmp/mesos/*, reboot the Vagrant box, generate the yml file again, and run docker-compose up -d again.
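
A sketch of that sequence (assuming the yml file is regenerated with generate_yml.sh, as mentioned earlier):

docker-compose stop
docker-compose rm --force
rm -f /tmp/mesos/*
# reboot the vagrant box, then:
./generate_yml.sh
docker-compose up -d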

trompx commented 9 years ago

Great! Now I get load balancing when I run

curl -H 'Host: python.service.consul' 192.168.10.10

Thank you!

But in my browser I still get the 503 error. Would you mind posting what you have in /etc/resolv.conf on your Docker host?

sielaq commented 9 years ago

On your native host (not the Vagrant box) you now need to point to the Vagrant box:

# cat /etc/resolv.conf 
nameserver 192.168.10.10

But it depends which system you have on your native host: Mac/Linux/Windows.

trompx commented 9 years ago

I could not find the problem over the past few hours. I even tried to launch PanteraS on a DigitalOcean droplet (it works just as well as locally, btw :). curl -H 'Host: python.service.consul' digital_ocean_droplet_ip load balances correctly, but browsing to digital_ocean_droplet_ip results in a 503 error too...

I checked /etc/resolv.conf; it has:

nameserver 8.8.4.4
nameserver 8.8.8.8
nameserver ip_different_from_digitalocean_droplet_ip

I am a bit lost, but I will let you know as soon as I get it working!

trompx commented 9 years ago

I edited the haproxy.cfg file for further testing:

redirect location http://www.google.com if acl_{{$service_name}}
# use_backend backend_{{$service_name}} if acl_{{$service_name}}

I am not being redirected to google.com, so the python backend is not even reached. I will investigate why acl_python is false.

trompx commented 9 years ago

Hi again,

I got it working! Being pretty new to sysadmin, I am still discovering how everything works. After reading the HAProxy docs, I found out that the HAProxy rules generated from template.conf try to match the Host header against the service name (whereas the host name could be anything related to our website/backend/API).

So I just added an entry to the /etc/hosts file on my host machine mapping python.service.consul to 192.168.10.10, and tadaaaaa! Load balancing is working :)
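
That is, a line like this in /etc/hosts (IP first, then the name):

192.168.10.10   python.service.consul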

So I guess I will have to make my service names match my website URLs to balance different services across different backends?

Sorry for bothering you, and thank you again for your help! I really appreciate it.

Xavier

sielaq commented 9 years ago

I think you have too many DNS entries in /etc/resolv.conf. You need to understand that those entries are not smart. While resolving, the system first checks the hosts line of /etc/nsswitch.conf (grep hosts /etc/nsswitch.conf), which has, for example:

hosts:          files dns

That means it will check /etc/hosts first and, if the name cannot be found there, use DNS (which is configured in /etc/resolv.conf). But it takes 8.8.8.8 first, which is Google, and Google has no idea about your setup or about this service. Sometimes it uses the secondary nameserver 8.8.4.4, with the same result: nothing, since there is nothing about your service in Google's DNS. The third entry, I think, is never used by the system.

Which means: try with ONLY one entry, or stay with your workaround (a static entry in /etc/hosts). I'm closing the ticket then :)
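
For example, after pointing resolv.conf only at the PanteraS box, you can verify resolution and routing like this (a sketch; it assumes dnsmasq on the box forwards the consul domain, as configured in the compose file above):

dig @192.168.10.10 python.service.consul +short
curl -H 'Host: python.service.consul' 192.168.10.10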

trompx commented 9 years ago

Thanks for the additional info! The DNS entries I posted were the defaults when I SSHed into my DigitalOcean droplet, so I don't really know why they set them like that (especially if only the first two are used).

Concerning Consul services, do you mean that I don't necessarily have to set an /etc/hosts entry linking the service name to the machine IP to make HAProxy load balance? I mean, how would HAProxy detect, when I browse 192.168.10.10:80 (and not python.service.consul, which resolves to 192.168.10.10 thanks to the /etc/hosts entry I added), that the request matches the python.service.consul service (which has two python instances, at 192.168.10.10:31000 and 192.168.10.10:31001), if the name is not in the URL in the first place? (Unless I did not get exactly how hdr(host) -i python.service.consul works.)

sielaq commented 9 years ago

ad 1. It depends what you want to achieve. If you want a permanent setup, then you have to replace it. But keep in mind that when the container is down you will have no DNS at all.

ad 2. Yes, you don't have to. That's what Consul and DNS are for: to resolve it (you can use DNS routing), or you can use HAProxy as a layer-7 proxy (like Apache vhosts). It detects the service from the Host header when you request it :) On every host you have (if you have multiple PanteraS slaves), HAProxy will be running on port 80, which means that DNS will point to one of the HAProxy instances, which will always balance between the application instances.
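
To illustrate with the standalone box from earlier in this thread: the only difference between the two requests below is the Host header, which is what the acl_* rules match on. The first should get HAProxy's 503 (no ACL matches), the second should reach backend_python:

curl -H 'Host: 192.168.10.10' http://192.168.10.10/
curl -H 'Host: python.service.consul' http://192.168.10.10/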

But you can also use Consul's DNS interface, so your apps can access services directly; more info: https://www.consul.io/docs/agent/dns.html

trompx commented 9 years ago

I struggled to understand how the service would be detected in HAProxy (how a website named www.mysite.com, with two A records pointing to the two PanteraS nodes, could map to python.service.consul, for instance, since hdr(host) would not include 'python.service.consul'), but I think I get it now thanks to your explanations. For instance, I wanted to link my application to a database cluster like Galera. So in my application I wanted to have a single entry point like:

host = galera.service.dc1.consul

Or even bind it to localhost, in case the DNS gets cached and a node goes down:

host = 127.0.0.1

and have a rule in HAProxy like bind 127.0.0.1:3306, with Consul automatically registering the healthy nodes. I don't know if that makes sense...
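
A rough sketch of that idea (this is not part of the PanteraS template; the Galera node addresses are placeholders that would normally be filled in from Consul data):

listen mysql_local
    bind 127.0.0.1:3306
    mode tcp
    balance roundrobin
    # healthy Galera nodes, addresses are illustrative
    server galera1 10.0.0.11:3306 check
    server galera2 10.0.0.12:3306 check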

sielaq commented 9 years ago

This is where orchestration comes in. You need to understand how Marathon works, and you have to decide how you want to deploy your application:

  1. With a Docker container, baked before deploy
  2. With a Docker container, baked on the fly, injecting your app (for both of these we provide framework examples with Java)
  3. Without Docker, using other mechanisms, but then your app needs to register itself.

When you use 1. or 2., all the registering/deregistering is done for you. If you choose 3., or you want to play around first before deploying, you have to register it in Consul yourself. Try it first as an external service: https://www.consul.io/docs/guides/external.html
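
For example, registering an external service by hand could look like this (a sketch using the Consul catalog API; the node name and address are illustrative):

curl -X PUT http://192.168.10.10:8500/v1/catalog/register \
  -d '{"Node": "ext-db", "Address": "10.0.0.5", "Service": {"Service": "mysql", "Port": 3306}}'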

The whole idea of PanteraS is that you don't have to touch Consul/HAProxy at all.

trompx commented 9 years ago

Yes, my plan is to bake my application before deploy and to redeploy a new container when I change something. I'm currently in the process of creating procedures to deploy a new version of my application into production, depending on the changes made (simple frontend changes, backend API changes, or the hardest ones, database changes), so multiple scenarios without downtime. Are you already using PanteraS in production? Concerning your database clusters, are you linking your application directly to a service like redis/mongodb/elasticsearch.service.consul, or binding to localhost like I plan to do? Thanks again for your time, it is priceless!

sielaq commented 9 years ago

If you already know the Marathon API, we have a second project that helps with the deployment of Marathon apps:

https://github.com/eBayClassifiedsGroup/marathon_deploy

apt-get install ruby1.9.1-dev
gem install marathon_deploy

You can use it to store your JSON/YAML templates in git, deploy apps from Jenkins or any other build tool you have, and feed the multiple Marathons you have.

Yes, we have a few microservices in production already, but we are in A/B testing mode.

I don't get the last question. We do not use links. I'm also not quite sure what you mean by 'bind to localhost' (services that require disk writes?). Currently we use it for microservices, which are stateless. Any databases are separated from the PaaS and accessible via a specific address. Those addresses can be registered in Consul too, but as external services. Alternatively, you can store the configuration of your external services in Consul's K/V storage.

trompx commented 9 years ago

Thanks for the marathon_deploy suggestion, I looked at it and it may be of help!

Concerning the last question, it is exactly what you mention: how you access it and via which address. My application has a database config file where I can specify one entry point (host, user, password, database_name), and at first I did not know how to do that with a cluster of redis/mysql/elasticsearch. So I have two solutions:

  1. Set the Consul service as the host (demo here at 13min30: https://www.youtube.com/watch?v=huvBEB3suoo&index=1&list=LLqdemxsymMZOp7H1ERr735g):
'host'      => 'mysql.service.consul',
'database'  => 'mydatabase',
'username'  => 'myusername',
'password'  => 'mypassword',

But as the video explains, the DNS can get cached, and if a node goes down, requests may still be directed to it, resulting in an error.

  2. Set the host to localhost, listen in HAProxy on 127.0.0.1:3306, and use the mysql.service.consul backend (demo here at 17min45: https://www.youtube.com/watch?v=huvBEB3suoo&index=1&list=LLqdemxsymMZOp7H1ERr735g):
'host'      => '127.0.0.1',
'database'  => 'mydatabase',
'username'  => 'myusername',
'password'  => 'mypassword',
'port' => 3306