malterb opened this issue 8 years ago
I can report exactly the same issues. When I run my bamboo docker container in an old build, everything works fine but since I updated my container 2 days ago this is happening to me too.
A little help would be awesome :)
I've seen this before - haproxy constantly reloading will often cause a logjam if the reloads happen too frequently, although Bamboo attempts to debounce them.
Are you constantly redeploying apps? A haproxy reload should only happen when Marathon tasks move around in Mesos, causing the config to change and requiring a reload.
I fixed #177 - which caused unnecessary restarts - for the 0.2.14 release, are you using an older version?
We typically deploy multiple times in the course of a day. This is part of a CI/CD environment, and it happens for me with 0.2.14. Sorry about the long post.
Here is the relevant snippet from bamboo.json:

  "HAProxy": {
    "TemplatePath": "/var/bamboo/haproxy_template.cfg",
    "OutputPath": "/etc/haproxy/haproxy.cfg",
    "ReloadCommand": "/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /var/run/haproxy.pid)",
    "ReloadValidationCommand": "/sbin/haproxy -c -f {{.}}"
  }
Before starting bamboo, here is what the ps output looks like:
root    30508 0.0 0.0 46332 1724 ? Ss 17:55  0:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy 30509 0.0 0.0 52108 3608 ? S  17:55  0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy 30510 7.5 0.0 52380 2028 ? Ss 17:55 13:03 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
30510
At the next refresh or app deploy, we notice the following:
Bamboo logs:

2016/02/20 20:51:08 Starting update loop
2016/02/20 20:51:08 bamboo_startup => 2016-02-20T20:51:08Z
2016/02/20 20:51:08 Queuing an haproxy update.
2016/02/20 20:51:08 Exec cmd: /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
[martini] listening on :8000 (development)
2016/02/20 20:51:08 HAProxy: Configuration updated
haproxy 30820 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30510
After the next refresh...
haproxy 30820 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30510
haproxy 30770 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30820
Each subsequent refresh just keeps adding to the list of haproxy processes.
HAProxy processes are designed to live for as long as they still have connections to serve. Could it be that some long-running connections are still open when the reload is initiated?
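One way to check that (a sketch, assuming Linux /proc and a POSIX shell; the function name is my own) is to count how many open sockets each surviving haproxy process still holds:

```shell
#!/bin/sh
# Count open sockets per PID by inspecting /proc/<pid>/fd (Linux only).
# Old processes with zero sockets should exit on their own after -sf;
# ones that keep a high count are holding long-running connections.
count_sockets() {
  for pid in "$@"; do
    n=$(find "/proc/$pid/fd" -lname 'socket:*' 2>/dev/null | wc -l)
    echo "$pid $n"
  done
}

# Example: inspect every haproxy process currently alive
# count_sockets $(pgrep haproxy)
```

If the old processes show a stable, non-zero socket count for hours, something is keeping connections pinned to them.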
We operate in a high-frequency deployment environment as well. For us it's not uncommon to see 15-20 HAProxy processes alive at the same time due to long-running WebSocket connections. They do rotate out after a few hours and get replaced by newer processes, though, which indicates progress. You might want to check for that behavior as well.
There shouldn't be any long-running connections, to be honest (5s max). Our problem is that the stale haproxy processes still accept connections and return 503s because they point at now-defunct instances.
It seems strange that HAProxy accumulates so many PIDs. For us it's only ever one PID that's passed to -sf, and the PID file never contains more than one entry either.
I'd try to figure out whether the PID file is being populated and cleaned up properly. Are you running HAProxy natively or inside Docker?
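A quick way to check that (a sketch, assuming Linux; the function name and example path are my own) is to list which pidfile entries still correspond to live processes:

```shell
#!/bin/sh
# Print each PID listed in a haproxy pidfile together with whether it is
# still alive. Stale entries mean -sf $(cat pidfile) signals the wrong
# (or no) processes on the next reload.
check_pidfile() {
  for pid in $(cat "$1" 2>/dev/null); do
    if [ -d "/proc/$pid" ]; then
      echo "$pid alive"
    else
      echo "$pid stale"
    fi
  done
}

# Example:
# check_pidfile /var/run/haproxy.pid
```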
I always get 4 because of nbproc. The issue remains even when I use nbproc 1 and hence only one PID.
I found a similar problem that others have reported with consul-template / haproxy: https://github.com/hashicorp/consul-template/issues/442
Could this be a similar Go-related issue?
Today we updated to 0.2.15 - and changed the reload command as follows:
"ReloadCommand": "/bin/systemctl reload haproxy"
So far it seems to be working a world better.
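For what it's worth, with the wrapper-based systemd units that ship with haproxy 1.5/1.6 (an assumption about your packaging), systemctl reload boils down to signalling the single wrapper process, which then re-execs haproxy with -sf itself, so pidfile handling stays in one place instead of racing with Bamboo. Roughly:

```shell
#!/bin/sh
# Sketch of what ExecReload= amounts to in a wrapper-based haproxy unit:
# send SIGUSR2 to the PID recorded in the pidfile; the wrapper reacts by
# re-exec'ing haproxy with -sf <old worker pids>. Function name is mine.
reload_via_wrapper() {
  kill -USR2 "$(cat "$1")"
}
```

That avoids the $(cat pidfile) race in the ReloadCommand, where the file can change between cat and exec.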
I was having this issue (haproxy 1.6 inside Docker); using "grace 0s" in the "defaults" section of the haproxy template config solved it. EDIT: Even with grace 0s I'm still seeing stale haproxy processes.
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000
grace 0s
# Template Customization
frontend http-in
bind *:80
{{ $services := .Services }}
Documentation: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4.2-grace
This works for short-lived connections; if you have long-running connections, those will get killed, so perhaps you can increase the grace period. But the weird thing is: why does haproxy keep accepting new connections?
Has anyone tried marathon-lb's reload command?
https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run
#!/bin/bash
exec 2>&1
export PIDFILE="/tmp/haproxy.pid"
exec 200<$0

reload() {
  echo "Reloading haproxy"
  if ! haproxy -c -f /marathon-lb/haproxy.cfg; then
    echo "Invalid config"
    return 1
  fi
  if ! flock 200; then
    echo "Can't acquire lock, reload already in progress?"
    return
  fi
  # Begin to drop SYN packets with firewall rules
  IFS=',' read -ra ADDR <<< "$PORTS"
  for i in "${ADDR[@]}"; do
    iptables -w -I INPUT -p tcp --dport $i --syn -j DROP
  done
  # Wait to settle
  sleep 0.1
  # Save the current HAProxy state
  socat /var/run/haproxy/socket - <<< "show servers state" > /var/state/haproxy/global
  # Trigger reload
  haproxy -p $PIDFILE -f /marathon-lb/haproxy.cfg -D -sf $(cat $PIDFILE)
  # Remove the firewall rules
  IFS=',' read -ra ADDR <<< "$PORTS"
  for i in "${ADDR[@]}"; do
    iptables -w -D INPUT -p tcp --dport $i --syn -j DROP
  done
  # Need to wait 1s to prevent TCP SYN exponential backoff
  sleep 1
  flock -u 200
}

mkdir -p /var/state/haproxy
mkdir -p /var/run/haproxy
reload
trap reload SIGHUP
while true; do sleep 0.5; done
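The part of that script guarding against overlapping reloads is the flock-on-self trick (exec 200<$0 plus flock 200). A minimal standalone sketch of the same pattern (lock path and function name are my own, and I use flock -n so a second caller fails fast instead of blocking):

```shell
#!/bin/sh
# Serialize a critical section across concurrent invocations with flock.
# A caller that arrives while the lock is held bails out instead of
# queuing up another reload behind the first.
LOCK="${LOCK:-/tmp/reload.lock}"

guarded_reload() {
  (
    flock -n 9 || { echo "reload already in progress"; exit 1; }
    # ... run the actual haproxy reload here ...
    echo "reload done"
  ) 9>"$LOCK"
}
```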
Found https://github.com/hashicorp/consul-template/issues/442 and https://github.com/golang/go/issues/13164
Could actually be related to Go. I just compiled bamboo with Go 1.6 and will update this thread accordingly.
BTW, another reload script I might try if this doesn't work: https://github.com/eBayClassifiedsGroup/PanteraS/blob/master/infrastructure/haproxy_reload.sh
@elmalto: were you able to resolve the issue with the latest Go 1.6 or did you employ a reload script?
-Imran
I have not seen this issue since upgrading to 1.6
Malte
Hey,
As stated in #206, we had this issue on our Mesos cluster before. After migrating our Docker images to Go 1.6 about two days ago, it looks like that fixed it.
I can't confirm this for certain, since we also have a lot of long-running connections, but the number of haproxy processes after two days seems much more reasonable than before. I'll take another look over the next week and post again if anything changes.
Thanks for having found that out anyway :).
Upgrade might have helped. I have a hunch that it's likely to be HAProxy itself.
Do you have any information/data about how often your deployment triggers reload?
Hey, sure!
Usually on this cluster we get around 0 to 5 updates a day. The day it failed we had many more (probably around 10), which resulted in 15+ haproxy processes on some Mesos slaves.
We run one bamboo (as a Docker container) on each Mesos slave. They have now been up for a week; we had some updates last Friday, but the number of haproxy processes only grew to 6. More importantly, that number sometimes goes down, which wasn't the case before the upgrade.
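If it helps anyone correlate process buildup with deploys, here is a trivial way (a sketch; the function name and log path are my own) to track the haproxy process count over time:

```shell
#!/bin/sh
# Print how many processes match a name pattern. pgrep -c prints the
# count itself; an empty result is normalized to 0, and the || true
# keeps a no-match exit status from aborting under set -e.
count_procs() {
  n=$(pgrep -c "$1" || true)
  echo "${n:-0}"
}

# Example logging loop:
# while true; do
#   echo "$(date -u +%FT%TZ) $(count_procs haproxy)" >> /var/log/haproxy-procs.log
#   sleep 60
# done
```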
I have a hunch that it's likely to be HAProxy itself.
My guess is you're right. We used marathon-lb before bamboo, and we had this issue with it as well.
I suggest moving to Nginx to replace HAProxy; there's a branch that @bluepeppers has been working on that would support multiple reload destinations, but it's still WIP.
Does nginx support TCP balancing (as HAProxy does) outside of its "Plus" version? It looks unclear to me.
I know it may work after building nginx with some extra modules; I'm just unsure what those "extra modules" may or may not support compared to haproxy.
Yup, it does. If you are using the Nginx Plus version, both TCP and UDP are supported out of the box.
If you are using the open-source version, try this Nginx-compatible fork: https://github.com/alibaba/tengine
@activars Just to let you know (we're still on haproxy): it happened again today on one of our Mesos slaves. The issue now seems to occur only in very rare / specific situations (this is the only time it has happened since), and is probably more related to haproxy or Go, so it's probably not necessary to reopen.
According to the refs above, upgrading haproxy to the latest 1.5.x might be the way to fix this for good. Since minor version upgrades shouldn't hurt, I prepared a Docker image with haproxy 1.5.19 (vs 1.5.8) based on golang 1.8 (vs 1.6 - well, that's not minor, but let's trust the compatibility promise, and let me know if that sounds like a terrible mistake :).
I'm going to test this during the next few days.
I still have the problem. Running haproxy:1.7.5
I'm very sorry for the lack of comms on this thread. We no longer run bamboo (we're no longer on Mesos) and are not going to be able to provide ongoing maintenance. If anyone is interested in maintaining it going forward, please raise another issue and we'll look at redirecting people to a fork.
Hi,
I am running into the issue that I constantly get stale haproxy processes. I have tried "everything", but can't get it to work. This is my bamboo.log from one occasion where it happened:
As you can see from my processes, there are two sets of haproxies:
and I use the following config for bamboo:
and the config parts of my haproxy.cfg:
Sorry for the long post. Would've gone to SO or SF, but thought this might be an issue with bamboo.
Can anyone point me in the right direction?