calpaterson / python-web-perf

Code for testing performance of popular python webservers

Nginx not set up to use HTTP/1.1 and pipelining #3

Open Tronic opened 4 years ago

Tronic commented 4 years ago

The proxy_pass config uses HTTP/1.0 without keep-alive, which wastes a lot of time spawning new TCP connections to the backend server. Consult e.g. https://sanic.readthedocs.io/en/latest/sanic/nginx.html for how to make it fast.

P.S. Sanic's built-in web server (the one used in the above docs) is much faster than uvicorn-ASGI-sanic, which is what you are currently benchmarking.

EDIT: use the technical term keep-alive instead of pipelining to avoid confusion.

calpaterson commented 4 years ago

Configuring these benchmarks and running them all was a big time sink and I think I'm done with it at the moment, so I'm not going to reopen it right now. Given that I'm running both nginx and the app server on the same host, my initial suspicion is that TCP connection opening is not a big factor in the results, though I'm not certain about that. Please do let me know if you do anything similar though :)

Can you reference any numbers re: Sanic's own web server? Sanic obviously makes a pretty core performance claim but I'm not aware of any benchmarks that they publish to go with that claim. TechEmpower's benchmarks (which I think are OK for async frameworks) don't find Sanic faster than uvicorn or starlette, and it looks to me like they are using Sanic's built-in web server that you've mentioned.

Tronic commented 4 years ago

Quick benchmarks with a "Hello World" Sanic webapp:
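
(For reference, the app here is just a minimal "Hello World" handler along these lines - the exact code isn't reproduced in this comment, so the names are illustrative:)

from sanic import Sanic
from sanic.response import text

# Minimal Sanic app: the handler does no real work, so the benchmark
# measures server and proxy overhead rather than application time
app = Sanic("hello")

@app.route("/")
async def index(request):
    return text("Hello, world!")

if __name__ == "__main__":
    # Serve directly with Sanic's built-in server on port 8000,
    # matching the direct-connection wrk run below
    app.run(host="127.0.0.1", port=8000)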

$ wrk -c100 -t8 http://127.0.0.1/   # Nginx no keep-alive
Running 10s test @ http://127.0.0.1/
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.68ms   91.12ms 663.62ms   95.46%
    Req/Sec     1.08k   362.73     1.34k    87.67%
  16278 requests in 10.02s, 3.10MB read
Requests/sec:   1623.91
Transfer/sec:    317.12KB
$ nginx -s reload   # Enable keep-alive
$ wrk -c100 -t8 http://127.0.0.1/
Running 10s test @ http://127.0.0.1/
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.17ms    9.15ms 171.04ms   93.31%
    Req/Sec     3.67k   666.07    12.99k    89.40%
  292653 requests in 10.10s, 55.81MB read
Requests/sec:  28977.42
Transfer/sec:      5.53MB
$ wrk -c100 -t8 http://127.0.0.1:8000/  # No Nginx proxy (direct connection to Sanic)
Running 10s test @ http://127.0.0.1:8000/
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.13ms    1.87ms  43.35ms   91.93%
    Req/Sec     6.27k   838.30     9.03k    72.95%
  502406 requests in 10.10s, 62.77MB read
Requests/sec:  49744.30
Transfer/sec:      6.21MB

As you can see, not using keep-alive completely devastates the performance. Switching between uvicorn and the built-in server has a much less dramatic effect.

calpaterson commented 4 years ago

Ok, here are some off-the-cuff numbers with my benchmark, which as you know does a database query, uses a pooler, etc. As I'm sure you'll know, the trouble with hello world apps (especially over a continuous TCP connection) is that they are really not doing anything and obviously are completely unrepresentative of any real-world app. Using a database, I think, at least puts me into the same ballpark, even if most apps will still be an order of magnitude slower in real life.

I haven't taken the trouble to set up all the VMs again; this is all just done on my machine, so the numbers are not as rigorous/accurate as my full results, but I have found local runs to be roughly in line with the final results. Certainly not completely different.

Requests/second and P99 latency, shown as req/s / P99:

framework            HTTP/1.0      HTTP/1.1
uvicorn+starlette    5084 / 93     4786 / 122
uvicorn+sanic        4476 / 114    4101 / 166
sanic w/own server   5783 / 61     5744 / 70
uwsgi+falcon         7013 / 22     7052 / 22

Looks like keep-alive does not help. More surprisingly, it hurts uvicorn a bit. My supposition is that either a) there is something funny about the interplay between nginx and uvicorn here, or b) keep-alive adds some CPU overhead in uvicorn and I'm not seeing the network benefit because it's all on one host (that much is true of the real setup too - the nginx->webserver part is on the same host).

Using Sanic's own server definitely helps, both with latency and throughput, so it looks like it might be the best of the async servers. Does it support ASGI? My guess is that starlette running through it would perform even better.

Ultimately, though, it still looks to me like the latency variance problem is pretty much there, and of course uWSGI is much better on both throughput and latency.

Tronic commented 4 years ago

Sure, a database-heavy benchmark would be affected less by keep-alive. Still, are you sure that you did enable keep-alive?
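
The relevant directives are roughly these (a minimal sketch along the lines of the Sanic docs linked above; the upstream name, port and keepalive count are just placeholders):

upstream backend {
    # Placeholder address of the app server; keep idle connections to it open
    server 127.0.0.1:8000;
    keepalive 100;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Both of the following are needed for keep-alive to the upstream:
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}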

The last one is especially easy to miss. Why Nginx made this config so complicated is beyond me.

Tronic commented 4 years ago

Sanic's built-in server directly builds Sanic's own data structures, avoiding the overhead of ASGI conversions. I believe that Sanic 20.3 is quite close to uvicorn, with differences that would matter only in Hello World-style requests, while the streaming branch has significantly better performance with its pure Python HTTP parser (but since it hasn't been released yet, I don't expect you to be benchmarking it).

There might also be some differences across operating systems. I did that benchmark on a 2015 MacBook Pro, and Linux might incur different penalties for newly opened localhost TCP connections.