eBayClassifiedsGroup / PanteraS

PanteraS - PaaS - Platform as a Service in a box
GNU General Public License v2.0

HAProxy performance issues #236

Closed by cookandy 7 years ago

cookandy commented 7 years ago

Hi,

I have a simple Node application running in Marathon, and I am using ApacheBench to run a simple load test against it. When I go directly to the application via the Marathon port, i.e. ab -n 6000 -c 300 -k "http://10.134.23.59:31700/", I see numbers like:

Requests per second:    1053.19 [#/sec] (mean)
Requests per second:    1104.31 [#/sec] (mean)
Requests per second:    1069.48 [#/sec] (mean)

However, when I go through HAProxy, i.e. ab -H "Host: myapp.service.consul" -n 6000 -c 300 -k "http://10.134.23.59/", I see numbers like:

Requests per second:    699.89 [#/sec] (mean)
Requests per second:    752.43 [#/sec] (mean)
Requests per second:    747.10 [#/sec] (mean)

That is around 300 req/sec worse than going directly to the application.

I was wondering whether you see similar performance issues when routing requests through HAProxy...
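For reference, the size of the drop can be worked out from the ab figures above. A minimal Python sketch; the variable and function names are only illustrative, and three runs per path is a small sample, so treat the result as a rough estimate:

```python
from statistics import mean

# Requests-per-second figures copied from the ab runs above.
direct_rps = [1053.19, 1104.31, 1069.48]
haproxy_rps = [699.89, 752.43, 747.10]


def relative_drop(baseline, measured):
    """Return the throughput lost relative to the baseline, as a fraction."""
    return (mean(baseline) - mean(measured)) / mean(baseline)


direct_mean = mean(direct_rps)
haproxy_mean = mean(haproxy_rps)
drop = relative_drop(direct_rps, haproxy_rps)

print(f"direct:  {direct_mean:.2f} req/sec")
print(f"haproxy: {haproxy_mean:.2f} req/sec")
print(f"lost:    {direct_mean - haproxy_mean:.2f} req/sec ({drop:.1%})")
```

On these numbers the means come out at about 1076 and 733 req/sec, a loss of roughly 340 req/sec, or a little under a third of the direct throughput.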

cookandy commented 7 years ago

I also tried bumping maxconn from 128 to 600, but it didn't make any difference...

sielaq commented 7 years ago

Did you try with Fabio instead? It should work much better, since there are no additional iptables redirects in between. (We use iptables to switch between HAProxy instances, so we can change the HAProxy config smoothly when HAProxy is reloaded.)

Also try this ApacheBench check directly against HAProxy on port 8660 or 8550, bypassing the iptables balancer.

cookandy commented 7 years ago

I did try to bypass iptables, but it didn't make any difference. I don't think Fabio will work for me as I use TCP connections, and while I know Fabio supports TCP, I don't think there's an easy way to configure it in PanteraS. HAProxy should be able to handle hundreds of thousands of requests, so I'm not sure why it is slow. Do you notice the same with your setup?

sielaq commented 7 years ago

We see a difference with one or two instances, but when we scale up (across multiple different hosts) it gets better. But you might be interested in a totally different solution.

You can try https://github.com/kobolog/gorb, which is also integrated with Consul. It uses IPVS in the kernel, the fastest method there is for balancing; you can't find anything faster.

cookandy commented 7 years ago

I tried to test with Fabio, but I'm running into some issues. I'm running on a machine with two network adapters - one public, and one private. My public IP is disabled. When I start Fabio with:

FABIO_APP_PARAMS=-proxy.addr :80 -ui.addr :81 -registry.consul.register.addr :81 -registry.consul.addr 10.134.20.234:8500

I see in the logs that it tries to bind to my public IP address (xx.xx.xx.xx):

2016/12/08 15:47:25 [INFO] consul: Registered fabio with address "xx.xx.xx.xx"
2016/12/08 15:47:25 [INFO] consul: Registered fabio with tags ""
2016/12/08 15:47:25 [INFO] consul: Registered fabio with health check to "http://[xx.xx.xx.xx]:81/health"

Fabio passes its health check in consul and I can curl to http://127.0.0.1:81.

curl 127.0.0.1:81
<a href="/routes">See Other</a>.

However, when I try to update Fabio to bind to my private IP address, using:

FABIO_APP_PARAMS=-proxy.addr 10.134.20.234:80 -ui.addr 10.134.20.234:81 -registry.consul.register.addr 10.134.20.234:81 -registry.consul.addr 10.134.20.234:8500

It seems to start, but I cannot connect:

curl 127.0.0.1:81
curl: (7) Failed to connect to 127.0.0.1 port 81: Connection refused

It seems to be listening:

netstat -an | grep 81
tcp        0      0 10.134.20.234:81        0.0.0.0:*               LISTEN

Any idea?

sielaq commented 7 years ago

Why do you use 127.0.0.1 instead of 10.134.20.234?

cookandy commented 7 years ago

Ignore my last post - I still had some iptables rules in place from the haproxy_a and haproxy_b stuff...

Testing Fabio now...

cookandy commented 7 years ago

It looks like Fabio is working much better than HAProxy - and is close to what I'd expect going direct to the application.

However, reading through the Fabio documentation, it looks like in order to set up a TCP socket you must reconfigure -proxy.addr (and restart Fabio). I am not sure it is possible to automate this via Marathon in PanteraS.

Going back to HAProxy, it appears my connection requests are getting queued, which I think is causing the delay:

[Screenshot, 2016-12-08 10:54:01 AM]

Are you seeing the same queueing on your side? I'd really like to try to use HAProxy, if possible. I've read of people putting like 300,000 req/sec through HAProxy, so I'm not sure why it's slowing down with 1000 req/sec.

sielaq commented 7 years ago

Nope, we have never had that issue. It means that the max limit has been reached (that's why it is queueing). You should run more instances of it and/or set a higher max sessions limit (but you already tested that).

sielaq commented 7 years ago

(Try gorb - I wonder how much faster it will be compared to Fabio.)

cookandy commented 7 years ago

"you should have more instances of that AND/OR higher MAX sessions limit"

Actually, when I increase maxconn I don't see the queueing. However, the maximum is still only ~700 req/sec (compared to ~1000 req/sec when going direct). I notice that the total time (with and without the queueing) is around 400 ms...

[Screenshot, 2016-12-08 11:35:49 AM]

I'm hesitant to try gorb, as it will just complicate the deployment and configuration... but I'm also curious why haproxy is adding 30% overhead to such a small load...
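As a sanity check, the ~400 ms and ~700 req/sec figures are roughly consistent with each other. ab is a closed-loop client, so with a fixed number of connections the throughput is capped at about concurrency / latency (Little's law). A small Python sketch, assuming the 400 ms is the mean time per request and that all 300 ab connections stay busy; the function names are made up for illustration:

```python
def throughput_ceiling(concurrency, latency_s):
    """Little's law: requests in flight = throughput * latency."""
    return concurrency / latency_s


def implied_latency(concurrency, requests_per_sec):
    """Same relation, solved for the mean time per request."""
    return concurrency / requests_per_sec


CONCURRENCY = 300         # ab -c 300
HAPROXY_LATENCY_S = 0.4   # ~400 ms total time seen via HAProxy (assumed mean)
DIRECT_RPS = 1000.0       # ~1000 req/sec when going direct

haproxy_ceiling = throughput_ceiling(CONCURRENCY, HAPROXY_LATENCY_S)
direct_latency = implied_latency(CONCURRENCY, DIRECT_RPS)
extra_ms = (HAPROXY_LATENCY_S - direct_latency) * 1000

print(f"ceiling via HAProxy: {haproxy_ceiling:.0f} req/sec")
print(f"implied direct latency: {direct_latency * 1000:.0f} ms")
print(f"extra time per request via HAProxy: {extra_ms:.0f} ms")
```

Under those assumptions the ceiling is about 750 req/sec, so the thing to explain is the roughly 100 ms of extra time per request on the HAProxy path rather than a hard request limit.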

sielaq commented 7 years ago

Keep in mind that under very high traffic you need to set the conntrack limit higher than the default, like:

modprobe nf_conntrack
sysctl net.netfilter.nf_conntrack_max=262144

^ This is only an example; to make it permanent you need to do it a bit differently.
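Before raising nf_conntrack_max it can help to see how full the table actually is during a test run. A rough Python sketch, assuming a Linux host with the nf_conntrack module loaded, where the kernel exposes nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter; the function name and the 80% warning threshold are hypothetical:

```python
from pathlib import Path

# Location of the conntrack counters on Linux (present once nf_conntrack is loaded).
PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(proc_dir=PROC_DIR):
    """Return (count, maximum, fraction used), or None if the counters are unavailable."""
    try:
        count = int((proc_dir / "nf_conntrack_count").read_text())
        maximum = int((proc_dir / "nf_conntrack_max").read_text())
    except (OSError, ValueError):
        return None
    if maximum <= 0:
        return None
    return count, maximum, count / maximum


usage = conntrack_usage()
if usage is None:
    print("conntrack counters not available (module not loaded?)")
else:
    count, maximum, fraction = usage
    print(f"conntrack entries: {count}/{maximum} ({fraction:.1%})")
    if fraction > 0.8:  # arbitrary threshold, chosen only for illustration
        print("table is close to full; consider raising nf_conntrack_max")
```

Running this while ab is hammering the proxy shows whether the table is anywhere near its limit; if it stays mostly empty, conntrack is probably not the bottleneck.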

You might also need a higher maxsock (ulimit) and a higher global maxconn in the HAProxy config. By default we set 16384; not sure whether that would be enough for you.

global
    daemon
    maxconn 16384

cookandy commented 7 years ago

I made some changes to the HAProxy config which seem to work much better. I opened a PR if you want to test it...

sielaq commented 7 years ago

Thanks for that, since we are not focused on HAProxy anymore.