QubitProducts / bamboo

HAProxy auto configuration and auto service discovery for Mesos Marathon
Apache License 2.0

haproxy zombies in docker container #31

Closed sttts closed 10 years ago

sttts commented 10 years ago

Every reload command leaves the previous haproxy process behind as a zombie. Because neither bamboo nor docker waits for these processes, they never go away.

On a regular Unix system (where bamboo is installed as a package), init waits for the exited processes, so this doesn't happen.

j1n6 commented 10 years ago

I'm marking this as a duplicate of #30; we are working on a fix.

sttts commented 10 years ago

Though this has nothing to do with multiple concurrent reloads. It even happens with only a single one.

sttts commented 10 years ago

I used docker-enter (https://github.com/Pithikos/docker-enter) and called the command manually for testing.

pdpi commented 10 years ago

I suspect it might actually be the same issue as #30. You have struck the root cause, and multiple concurrent updates simply cause an acute case of what you're describing.

sttts commented 10 years ago

I think so. The haproxy zombie processes are due to the missing init in the container.

The problem in #30 might still be connected, though, because concurrent reloads can create race conditions on the /var/run/haproxy.pid file. One has to wait until all the PIDs listed in there have actually exited before another reload can be started. Otherwise this might also create multiple processes: not zombies, but really running haproxy instances. On the other hand, I guess the processes in the latter situation will probably die immediately because they cannot bind to ports 80/443.

pdpi commented 10 years ago

Until we solve this issue, I'll accept it as a likely scenario that, if the way we call reload creates a zombie, calling reload n times creates n zombies. At any rate, this issue will confound any work on #30.

sttts commented 10 years ago

At 889 zombies, one of our Mesos servers started running into OOMs. So yes, they add up with every reload.

sttts commented 10 years ago

My test cluster has now been running without any zombies for 2 days, with plenty of deployments in Marathon.