docker-archive / dockercloud-haproxy

HAproxy image that autoreconfigures itself when used in Docker Cloud
https://cloud.docker.com/
651 stars 187 forks source link

haproxy stuck in restart loop #200

Closed cmberard closed 7 years ago

cmberard commented 7 years ago

So I've scoured the interwebs, but cannot seem to find a resolution to this issue. Haproxy is intermittently getting stuck in a restart loop. I've tried version 1.6.2 and 1.6.5, but getting the same result. Only way I've found to resolve the issue is to restart the loadbalancer service, and then everything is fine for an hour or so. Any help/suggestions would be greatly appreciated!

Haproxy version: 1.6.2 & 1.6.5

Console Output:

[lb-1]2017-05-16T21:49:46.247250059Z INFO:haproxy:=> Executing task: haproxy 15 died , restart
[lb-1]2017-05-16T21:49:46.247282860Z INFO:haproxy:==========BEGIN==========
[lb-1]2017-05-16T21:49:50.246575792Z INFO:haproxy:=> Add task: haproxy 15 died , restart
[lb-1]2017-05-16T21:49:55.247779119Z INFO:haproxy:=> Add task: haproxy 15 died , restart
[lb-1]2017-05-16T21:50:00.249218454Z INFO:haproxy:=> Add task: haproxy 15 died , restart
[lb-1]2017-05-16T21:50:05.250185375Z INFO:haproxy:=> Add task: haproxy 15 died , restart
[lb-1]2017-05-16T21:50:10.250702184Z INFO:haproxy:=> Add task: haproxy 15 died , restart

Relevant Stackfile:

lb:
  image: 'dockercloud/haproxy:1.6.5'
  links:
    - service-1
    - service-2
    - service-3
  ports:
    - '80:80'
    - '443:443'
  restart: on-failure
  roles:
    - global
tifayuki commented 7 years ago

@cmberard Is this reproducible constantly, or only to some specific environment?

cmberard commented 7 years ago

@tifayuki I'm only seeing it on docker-cloud (with an Azure VM). It's happened on several different stacks, however.

tifayuki commented 7 years ago

how to reproduce it? I mean do your run it on swarm mode/cloud standard mode? what is the whole stack file, so that I can try to reproduce it on my end.

cmberard commented 7 years ago

@tifayuki sorry, cloud standard mode. I will have to dummy up a stack file since it is a few private nodejs microservices.

Any suggestions on mitigating in the interim? I noticed that it is trying to restart every 5 seconds, is that controlled by the connect timeout, or something different?

Thanks again for the help.

tifayuki commented 7 years ago

ya, a dummy stack file will work. And do you try the latest version 1.6.6. There are some fixes related to the issues happened on azure.

cmberard commented 7 years ago

ok, bumped it to 1.6.6 for now. Will monitor for now and shoot over a stack file later. Thanks!

cmberard commented 7 years ago

So I've gone a couple of days and pushed to this stack a few times and things have been great. Not sure what the root issue was, but 1.6.6 seems to have resolved it, regardless.

Thanks again for your help, closing this one out.