Tronic opened this issue 4 years ago
Configuring these benchmarks and running them all was a big time sink, and I think I'm done with it for the moment, so I'm not going to reopen it right now. Given that I'm running both nginx and the app server on the same host, my initial suspicion is that TCP connection opening is not a big factor in the results, though I'm not certain about that. Please do let me know if you do anything similar though :)
Can you reference any numbers re: Sanic's own webserver? Sanic obviously makes a pretty core performance claim, but I'm not aware of any benchmarks that they publish to go with that claim. TechEmpower's benchmarks (which I think are OK for async frameworks) don't find Sanic faster than uvicorn or starlette, and it looks to me like they are using Sanic's built-in web server that you've mentioned.
Quick benchmarks with a "Hello World" Sanic webapp:
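For reference, a minimal sketch of what a "Hello World" Sanic app of this kind might look like (the app name and handler below are illustrative assumptions, not the exact code benchmarked):

```python
# Minimal "Hello World" Sanic app (illustrative sketch, not the exact benchmark code).
from sanic import Sanic
from sanic.response import text

app = Sanic("hello")

@app.route("/")
async def index(request):
    # Trivial handler: no database, no templates, just a short plain-text body.
    return text("Hello, world!")

if __name__ == "__main__":
    # Served by Sanic's built-in web server on the port used in the direct-connection run below.
    app.run(host="127.0.0.1", port=8000)
```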
```
$ wrk -c100 -t8 http://127.0.0.1/    # Nginx, no keep-alive
Running 10s test @ http://127.0.0.1/
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.68ms   91.12ms 663.62ms   95.46%
    Req/Sec     1.08k   362.73     1.34k    87.67%
  16278 requests in 10.02s, 3.10MB read
Requests/sec:   1623.91
Transfer/sec:    317.12KB

$ nginx -s reload    # Enable keep-alive
$ wrk -c100 -t8 http://127.0.0.1/
Running 10s test @ http://127.0.0.1/
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.17ms    9.15ms 171.04ms   93.31%
    Req/Sec     3.67k   666.07    12.99k    89.40%
  292653 requests in 10.10s, 55.81MB read
Requests/sec:  28977.42
Transfer/sec:      5.53MB

$ wrk -c100 -t8 http://127.0.0.1:8000/    # No Nginx proxy (direct connection to Sanic)
Running 10s test @ http://127.0.0.1:8000/
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.13ms    1.87ms  43.35ms   91.93%
    Req/Sec     6.27k   838.30     9.03k    72.95%
  502406 requests in 10.10s, 62.77MB read
Requests/sec:  49744.30
Transfer/sec:      6.21MB
```
As you can see, not using keep-alive completely devastates the performance. Uvicorn vs. the built-in server has a much less dramatic effect.
OK, here are some off-the-cuff numbers with my benchmark, which, as you know, does a database query, uses a pooler, etc. As I'm sure you'll know, the trouble with hello world apps (especially over a continuous TCP connection) is that they are really not doing anything and obviously are completely unrepresentative of any real-world app. Using a database, I think, at least puts me into the same ballpark, even if most apps will still be an order of magnitude slower in real life.
I haven't taken the trouble to set up all the VMs again; this is all just done on my machine, so the numbers are not as rigorous/accurate as my full results, but I have found local runs to be roughly in line with the final results. Certainly not completely different.
Requests/second and P99 latency:

| framework | HTTP/1.0 | HTTP/1.1 |
|---|---|---|
| uvicorn+starlette | 5084/93 | 4786/122 |
| uvicorn+sanic | 4476/114 | 4101/166 |
| sanic w/ own server | 5783/61 | 5744/70 |
| uwsgi+falcon | 7013/22 | 7052/22 |
Looks like keep-alive does not help; more surprisingly, it hurts uvicorn a bit. My supposition is that either a) there is something funny about the interplay here between nginx and uvicorn, or b) keep-alive adds some CPU overhead in uvicorn and I'm not seeing the network benefit because it's all on one host (that much is true of the real setup too: the nginx->webserver hop is on the same host).
Using Sanic's own server definitely helps, both with latency and throughput, so it looks like it might be the best of the async servers. Does it support ASGI? My guess is that starlette running through it would perform even better.
Ultimately, though, it looks to me like the latency variance problem is pretty much still there, and of course uWSGI is much better on both throughput and latency.
Sure, a database-heavy benchmark would be less affected by keep-alive. Still, are you sure that you actually enabled keep-alive?
- `proxy_http_version 1.1;` is the obvious bit
- a `keepalive` directive in a separate `upstream` section (cannot use an IP/hostname in the `proxy_pass` directive)
- `proxy_set_header connection "";` to remove the `connection: close` header

The last one is especially easy to miss. Why Nginx made this config so complicated is beyond me.
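For concreteness, a minimal sketch of an nginx config with all three pieces in place (the upstream name, address, and keepalive pool size are placeholders, not taken from the setup discussed here):

```nginx
# Hypothetical example combining the three directives above.
# "keepalive" only applies when proxy_pass references an upstream block by name.
upstream app_backend {
    server 127.0.0.1:8000;
    keepalive 100;                      # idle keep-alive connections to retain per worker
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;  # must be the upstream name, not an IP/hostname
        proxy_http_version 1.1;         # default is 1.0, which disables keep-alive
        proxy_set_header connection ""; # clear the default "connection: close" header
    }
}
```

The key point is that `keepalive` only takes effect for upstreams referenced by name from `proxy_pass`, which is why the separate `upstream` block is needed.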
Sanic's built-in server directly builds Sanic's own data structures, avoiding the overhead of ASGI conversions. I believe that Sanic 20.3 is quite close to uvicorn, with differences that would matter only in Hello World-style requests, while the streaming branch has significantly better performance with its pure-Python HTTP parser (but since it hasn't been released yet, I don't expect you to be benchmarking it).

There might also be some differences across operating systems. I did that benchmark on a 2015 MacBook Pro, and Linux might incur different penalties for newly opened localhost TCP connections.
The proxy_pass config uses HTTP/1.0 without keep-alive, which wastes a lot of time spawning new TCP connections to the backend server. Consult e.g. https://sanic.readthedocs.io/en/latest/sanic/nginx.html for how to make it fast.
P.S. Sanic's built-in web server (the one used in the above docs) is much faster than uvicorn-ASGI-sanic, which is what you are currently benchmarking.
EDIT: use the technical term `keep-alive` instead of pipelining to avoid confusion.