CiscoCloud / haproxy-consul

Dynamic haproxy configuration using consul
Apache License 2.0

Old instance stays alive #33

Open auguster opened 8 years ago

auguster commented 8 years ago

My load balancer was randomly returning 503 errors for services that were available. The configuration was OK and the server seemed fine. I couldn't figure out the cause until I noticed that several instances of haproxy were running in the container.

Here is a ps from within a failing container:

PID   USER     TIME   COMMAND
    1 root       0:00 {launch.sh} /bin/bash /launch.sh
    7 root       0:06 /usr/local/bin/consul-template -config /consul-template/config.d -log-level info -wait 2s:10s -consul 127.0.0.1:8500
   17 root       0:22 /usr/sbin/haproxy -D -p /var/run/haproxy.pid -f /haproxy/haproxy.cfg -sf
   21 root       0:00 /usr/sbin/haproxy -D -p /var/run/haproxy.pid -f /haproxy/haproxy.cfg -sf 17
   25 root       0:00 /usr/sbin/haproxy -D -p /var/run/haproxy.pid -f /haproxy/haproxy.cfg -sf 21
   26 root       0:00 sh
   31 root       0:00 ps

So the new instance, which has the correct configuration, is running alongside the old instances. All of them are listening and answering requests on 0.0.0.0:80. This makes the errors random: a request only succeeds when the correct instance happens to be the one that answers. As time goes by, more stale instances accumulate, making a correct answer less and less likely.
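
For anyone wanting to check whether a container is in this state, here is a minimal sketch that counts the haproxy processes in ps output. The function name `count_haproxy` is made up for illustration, and it assumes the column layout of the listing above, where the command path is the fourth field.

```shell
#!/bin/sh
# count_haproxy: hypothetical helper, not part of the image.
# Reads `ps` output on stdin and prints how many processes run a binary
# named "haproxy". Assumes the command path is the 4th column, as in the
# listing above.
count_haproxy() {
    awk '$4 ~ /(^|\/)haproxy$/ { n++ } END { print n + 0 }'
}
```

Inside the container this would be used as `ps | count_haproxy`. A count that stays above one well after a reload suggests that old instances are not exiting, although an old process can legitimately linger for a while as it finishes open connections.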

I will try to fix this issue. So far I have had to kill the containers and start new ones to get it working again...

Is this project still alive, by the way? I see PRs waiting and the last commit is 4 months old...

eesprit commented 8 years ago

It might be related to: https://github.com/haproxy/haproxy/commit/d50b4ac0d4acb00e0d2386198191ac329e8dbb77

The consul-template version included in this Docker image is quite old. It is compiled with go 1.5.3, while newer releases are compiled with go 1.6.

I remember seeing the same behaviour as yours, and it was fixed (as far as I remember) by updating the consul-template version used and rebuilding the image.

I don't know if this project is still alive. As you said, there is not a lot of activity around here... so I guess it is not.

eesprit commented 8 years ago

Have a look at #32, it should fix the issue.

auguster commented 8 years ago

Thanks @eesprit, I forked the project and will test various solutions. I also saw that some other repo proposed using pidof instead of cat. I also updated the template file to serve multiple domains through SERVICE_TAGS.
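
The pidof idea mentioned above could look roughly like the sketch below. It is not the code from any of those repos or from this project's fix. The function name `reload_haproxy` and the `HAPROXY_BIN` / `PIDOF_BIN` variables are assumptions made for illustration, while the haproxy flags and paths are the ones visible in the ps output earlier in this thread. The point is that `-sf` receives every running haproxy PID rather than the single PID stored in the pid file.

```shell
#!/bin/sh
# Hypothetical reload helper (illustration only, not the project's script).
# HAPROXY_BIN and PIDOF_BIN are made-up override points so the function
# can be exercised without a real haproxy.
HAPROXY_BIN="${HAPROXY_BIN:-/usr/sbin/haproxy}"
PIDOF_BIN="${PIDOF_BIN:-pidof}"

reload_haproxy() {
    # Collect the PIDs of all running haproxy processes; empty if none.
    old_pids="$("$PIDOF_BIN" haproxy 2>/dev/null || true)"

    if [ -n "$old_pids" ]; then
        # Word splitting of $old_pids is intended: each PID must be a
        # separate argument to -sf so every old instance is asked to
        # finish its connections and exit.
        "$HAPROXY_BIN" -D -p /var/run/haproxy.pid -f /haproxy/haproxy.cfg -sf $old_pids
    else
        # First start: nothing to replace, so omit -sf entirely.
        "$HAPROXY_BIN" -D -p /var/run/haproxy.pid -f /haproxy/haproxy.cfg
    fi
}
```

A template's reload command would then call `reload_haproxy` instead of reading the pid file with cat. Whether this alone is enough depends on why the old instances were missed in the first place, so treat it as a starting point for testing rather than a confirmed fix.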

lalarsson commented 8 years ago

@auguster @eesprit A fix has been merged into master. See #37

auguster commented 8 years ago

Thanks! Does that mean that the project is back? There are many things I added to my fork. Should I PR them to this project?

stevendborrelli commented 8 years ago

@auguster that would be great.

auguster commented 8 years ago

OK, I will do that as soon as I get the time. Maybe you should clean up the open PRs. For example, you have already upgraded to consul-template 0.14.0, so PR #32 could be closed.

lalarsson commented 8 years ago

@auguster Sweet! I took a look at your fork, and some of the stuff you've done is what I was aiming to do myself 👍 This project is going to be even more awesome with your improvements!