QubitProducts / bamboo

HAProxy auto configuration and auto service discovery for Mesos Marathon
Apache License 2.0
794 stars 214 forks source link

Fix reload commands. #122

Closed timoreimann closed 9 years ago

timoreimann commented 9 years ago

The following problems were fixed with the given reload commands:

IMPORTANT: This needs some thorough testing. While a ran several tests via a containerized instance of HAProxy, it is hard to catch all cases without operating in a real environment. I am going to do that starting tomorrow and see how it works out.

/cc @sttts for additional review as he did one of the referenced commits.

refs #79

j1n6 commented 9 years ago

Thanks. That's very thoughtful finding. I'm working on some haproxy reload benchmarking during working hours, I test your script as part of the process. What I want to achieve is:

timoreimann commented 9 years ago

Sounds good. A few remarks on my end:

sttts commented 9 years ago

@timoreimann If you are right with your 3rd bullet point that haproxy waits long enough before forking and quitting, I cannot see anything wrong with your patch. Though I introduced the wait for the old pids to quit because I had seen different behaviour of haproxy. I also tested your version at that time, but it wasn't reliable. Has haproxy changed since then in respect to reloading?

Next to looking at the haproxy code, it shouldn't be hard to build a small bash script reloading haproxy continuously. I had that running for an hour or two with thousands of reloads to make sure it works.

sttts commented 9 years ago

Looking at the two lines in haproxy.c I get the same unterstanding, namely that reload is supposed to wait until the pid file is written. I wonder which version I used for my tests at that time. Could have been an early 1.5 release (looking at the ppa repository I use). But even then according to git blame the reload behavior should have been the same, mostly from late 2013.

timoreimann commented 9 years ago

@sttts When you tested my version at the time, did you already have the semaphore implemented in Bamboo? I think that's a necessary part in making sure that no race condition occurs.

The latest documented change I could find with regards to hot reconfiguration dates back to 2008 (changelog, commit). Doesn't seem related to me.

At least as far as the lock-ups described in #79 are concerned, they seem to be related to long-running connections (web-sockets in particular). This is a bit more tricky to test; as mentioned, I am going to set the reload command to the proposed version in our staging environment tomorrow and see how it behaves. We usually run into the problem fairly quickly.

j1n6 commented 9 years ago

I tested the reload command and reload every second. It look good to me. however, you probably notice there will be connection drop during the reload. I also tested the drop sync solution, e.g.

  iptables -I INPUT -p tcp --dport 80 --syn -j DROP
  sleep 0.2
  /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
  iptables -D INPUT -p tcp --dport 80 --syn -j DROP
  sleep 1

This seems to elimiate the connection drop. Here's my ApacheBench test result (i'm reloading every second):

perf-data

Do you think it's a good idea to build DROP SYN into the reload?

timoreimann commented 9 years ago

I don't think we should build SYN dropping into the reload command, primarily for the reason that it can cause delays of up to 3 seconds on the client end according to the Yelp blog post. That may be unacceptable to certain Bamboo users (I know it'll be for us in the long term), and they may prefer to look for alternatives to solve this problem or just be fine with having dropped connections for whatever reason.

Apart from that, I am not certain we can assume iptables always exists on the target machine. We'd also need to dynamically determine the port, or ignore it all together.

From my perspective, it's best to document the issue in the README file and add the SYN dropping approach as one possible solution. Bamboo users that like it can customize the restart command accordingly.

j1n6 commented 9 years ago

Thanks for your quick response. I agree with your feedback.

I might consider allowing to use template variables in the reload command. This gives a little flexibility to customize the command.

I'm happy to merge this PR.