Closed by rianjs 6 years ago.
@sebastienros Yes, it's fully automated; see https://github.com/aspnet/benchmarks/blob/dev/README.md for more details.
@davidfowl I see a huge performance hit (~3x) after switching to Core 2.1, and especially to the managed Sockets transport. The test was performed on the same EC2 machines; I just replaced the Docker images (the code was identical, only the .NET Core and TPL NuGet packages were updated). CPU on the target machine was jumping to 100% (it looks like GC).
ASP.NET Core 2.0 was built from the image microsoft/aspnetcore-build:2.0.5-2.1.4. ASP.NET Core 2.1 was built with microsoft/dotnet:2.1-sdk and run on microsoft/dotnet:2.1.0-aspnetcore-runtime.
Test results:

asp.net core 2.0, run 1:

    ubuntu@ip-10-15-10-229:~$ wrk -t 2 -c 600 -d 1m -s ./firehose.lua --latency http://10.15.10.77
    Running 1m test @ http://10.15.10.77
      2 threads and 600 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   180.00ms   49.99ms 515.80ms   83.41%
        Req/Sec     1.81k     0.97k    3.73k    53.00%
      Latency Distribution
         50%  170.40ms
         75%  187.14ms
         90%  242.98ms
         99%  407.57ms
      200731 requests in 1.00m, 18.38MB read
    Requests/sec:   3344.61
    Transfer/sec:    313.56KB
asp.net core 2.0, run 2:

    ubuntu@ip-10-15-10-229:~$ wrk -t 2 -c 600 -d 1m -s ./firehose.lua --latency http://10.15.10.77
    Running 1m test @ http://10.15.10.77
      2 threads and 600 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   173.56ms   47.42ms 746.84ms   84.59%
        Req/Sec     1.85k     0.98k    3.98k    50.99%
      Latency Distribution
         50%  168.35ms
         75%  185.38ms
         90%  212.65ms
         99%  360.20ms
      206067 requests in 1.00m, 18.87MB read
    Requests/sec:   3428.89
    Transfer/sec:    321.46KB
asp.net core 2.1, run 1:

    ubuntu@ip-10-15-10-229:~$ wrk -t 2 -c 600 -d 1m -s ./firehose.lua --latency http://10.15.10.77
    Running 1m test @ http://10.15.10.77
      2 threads and 600 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   172.35ms   72.90ms 929.68ms   82.63%
        Req/Sec     1.77k     0.94k    4.01k    56.81%
      Latency Distribution
         50%  157.70ms
         75%  193.70ms
         90%  248.64ms
         99%  454.83ms
      81698 requests in 1.00m, 7.48MB read
      Socket errors: connect 0, read 0, write 0, timeout 600
    Requests/sec:   1361.06
    Transfer/sec:    127.60KB
asp.net core 2.1, run 2:

    ubuntu@ip-10-15-10-229:~$ wrk -t 2 -c 600 -d 1m -s ./firehose.lua --latency http://10.15.10.77
    Running 1m test @ http://10.15.10.77
      2 threads and 600 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   161.83ms   67.24ms 952.09ms   87.61%
        Req/Sec     1.84k     1.01k    4.34k    51.37%
      Latency Distribution
         50%  155.43ms
         75%  182.37ms
         90%  212.81ms
         99%  342.32ms
      94846 requests in 1.00m, 8.68MB read
      Socket errors: connect 0, read 0, write 0, timeout 4188
    Requests/sec:   1579.94
    Transfer/sec:    148.12KB
asp.net core 2.1 (Libuv), run 1:

    ubuntu@ip-10-15-10-229:~$ wrk -t 2 -c 600 -d 1m -s ./firehose.lua --latency http://10.15.10.77
    Running 1m test @ http://10.15.10.77
      2 threads and 600 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   193.50ms  191.16ms   1.98s    96.88%
        Req/Sec     1.96k     0.98k    3.78k    47.58%
      Latency Distribution
         50%  161.09ms
         75%  177.99ms
         90%  212.65ms
         99%    1.46s
      211659 requests in 1.00m, 19.38MB read
    Requests/sec:   3525.03
    Transfer/sec:    330.47KB
asp.net core 2.1 (Libuv), run 2:

    ubuntu@ip-10-15-10-229:~$ wrk -t 2 -c 600 -d 1m -s ./firehose.lua --latency http://10.15.10.77
    Running 1m test @ http://10.15.10.77
      2 threads and 600 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   194.79ms  191.99ms   1.96s    96.01%
        Req/Sec     1.95k     0.98k    4.41k    49.72%
      Latency Distribution
         50%  161.56ms
         75%  181.06ms
         90%  223.84ms
         99%    1.42s
      209424 requests in 1.00m, 19.17MB read
    Requests/sec:   3484.71
    Transfer/sec:    326.69KB
For those on this thread, the difference in results seems to come from timeouts on Sockets.
Is there a way we could repro your numbers, for instance by you sharing the source code? Also, what is the purpose of firehose.lua?
I would suggest that for now you use Libuv until we understand what the issue is with Sockets in your scenario.
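For reference, opting back into the Libuv transport in Kestrel 2.1 is a one-line change in the host builder. A minimal sketch, assuming the default `WebHost` setup and an existing `Startup` class:

```csharp
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            // Opt out of the managed Sockets transport (the new default in 2.1)
            // and go back to the Libuv transport that 2.0 used.
            .UseLibuv()
            .UseStartup<Startup>()
            .Build()
            .Run();
}
```

Everything else (endpoints, limits, middleware) stays the same; only the transport underneath Kestrel changes.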
@sebastienros firehose.lua is a Lua script that sends randomly chosen data. How can I share my code with the Microsoft team privately?
Ideally a public repository with a simple repro: something that doesn't require a lot of setup and has a minimum of code, so we can identify the reason more easily. Otherwise, Skype it to me: sebastienros.
As an update, I got the source code that shows the issue.
@sebastienros any indication as to the scenario causing this? We're seeing similar performance degradation with the Sockets server, although even successfully completed requests return more slowly. When switching to Libuv, performance is restored. This is on 2.1.1.
I'm currently debugging a similar issue.
Upgraded my application from Kestrel 2.0.0 to 2.1.1 and have noticed a large increase in CPU usage on my webservers (at least 2x).
I'm currently in the process of specifying UseLibuv() for Kestrel 2.1.1 to see if it alleviates my problem.
@akatz0813 @MatthewLymer I have been able to reproduce this kind of degradation when the application itself uses the thread pool intensively, for instance with the TPL libraries, more specifically Dataflow (https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library). Switching to Libuv solved the issue. This is probably because these libraries can starve the thread pool and impact Kestrel's Sockets transport.
Are you doing anything like this in your applications?
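To illustrate the kind of pattern meant here — a hypothetical sketch, not code from any of the applications discussed — a Dataflow block doing blocking work with unbounded parallelism will claim as many thread pool threads as the pool grants, competing with Kestrel's Sockets transport, which also runs on the thread pool:

```csharp
using System.Threading;
using System.Threading.Tasks.Dataflow;

class StarvationSketch
{
    static void Main()
    {
        // Each posted item blocks a ThreadPool thread for a full second, and
        // unbounded parallelism lets the block occupy every thread the pool
        // will hand out, leaving little capacity for Kestrel's I/O callbacks.
        var block = new ActionBlock<int>(
            item => Thread.Sleep(1000), // synchronous, blocking work on a pool thread
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
            });

        for (var i = 0; i < 10_000; i++)
        {
            block.Post(i);
        }

        block.Complete();
        block.Completion.Wait();
    }
}
```

The fix in such cases is usually to bound `MaxDegreeOfParallelism` or make the per-item work genuinely asynchronous, rather than to change transports.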
@sebastienros yes, we have a lot of async/await keywords in our app.
(Chart) Orange: Kestrel 2.1.1 with Sockets. Blue: Kestrel 2.1.1 with Libuv.
The impact was actually much more than I thought.
OS: Ubuntu 16.04. AWS instance: c5.xlarge (4 cores / 8 GB). Runtime: Microsoft.NETCore.App 2.1.2.
I am not talking about async/await, but the TPL. If you are not using the TPL, can you share your application with me? Or part of it that reproduces the issue?
Oh, then in that case I'm not using the TPL.
Would it be possible to give Kestrel its own thread pool to avoid contention with the rest of the application?
As for a reproduction, it would be tough to nail down, as the application is fairly large, though 99% of the app is just a bunch of HttpClient, Socket, and SslStream async calls.
Also, a lot of this we do in parallel with Task.WhenAll; some incoming HTTP requests result in the application processing up to 100 network calls in parallel.
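That fan-out shape (one request triggering up to ~100 parallel outbound calls) looks roughly like this. A schematic sketch only: `FetchAllAsync` and the URL list are hypothetical stand-ins, not names from the application discussed here:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class FanOutSketch
{
    // A single reused client; creating one HttpClient per call can exhaust sockets.
    private static readonly HttpClient Client = new HttpClient();

    // Start all outbound calls concurrently, then await them as a batch.
    public static Task<string[]> FetchAllAsync(IEnumerable<string> urls) =>
        Task.WhenAll(urls.Select(url => Client.GetStringAsync(url)));
}
```

Under load, each in-flight batch multiplies the number of concurrent I/O completions the thread pool has to service, which is why this pattern is sensitive to thread pool starvation.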
You can detect thread pool starvation by adding Ben.BlockingDetector and outputting its warning messages to the log; it should hopefully highlight the sources.
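Wiring it up is typically just a middleware registration early in the pipeline. A sketch, assuming the Ben.BlockingDetector NuGet package is referenced; the namespace and extension method name may vary by version:

```csharp
using Ben.Diagnostics;
using Microsoft.AspNetCore.Builder;

public class Startup
{
    public void Configure(IApplicationBuilder app)
    {
        // Logs a warning (with a stack trace) whenever a request thread blocks,
        // helping locate sync-over-async calls that starve the thread pool.
        app.UseBlockingDetection();

        // ... rest of the pipeline (MVC, static files, etc.)
    }
}
```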
Thank you @benaadams, this looks like a very useful tool and I'll check it out when I have time.
The rest of the application performs within our specifications when using Libuv, even after transitioning from CurlHttpHandler to SocketsHttpHandler in .NET Core 2.1.
As for the CPU usage itself, why would CPU usage increase if the Kestrel server is waiting for a task to start?
> As for the CPU usage itself, why would CPU usage increase if the Kestrel server is waiting for a task to start?
Are you creating Task.Delays faster than their timeouts and not cancelling them (so you end up with a large build-up of waiting timers)? Or creating HttpClients and not disposing of them (not using HttpClientFactory or a static HttpClient)?
I don't have any Task.Delays in my codebase. I am creating HttpClients, but they're instantiated in classes that are registered in the DI framework with singleton scope (and I dispose of them afterwards).
The only thing I can think of is that I have a single CPU-bound (synchronous) operation per request, created with Task.Run(() => ...), that I eventually wait on with await Task.WhenAll(...). I use the WhenAll to wait for both the CPU-bound and IO-bound operations to complete.
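As a sketch of that per-request shape — `ComputeSynchronously` and `CallBackendsAsync` are hypothetical stand-ins for the real work described above:

```csharp
using System.Threading.Tasks;

public class RequestProcessorSketch
{
    public async Task HandleRequestAsync()
    {
        // One synchronous CPU-bound computation pushed onto the ThreadPool...
        Task cpuBound = Task.Run(() => ComputeSynchronously());

        // ...running alongside the IO-bound network calls...
        Task ioBound = CallBackendsAsync();

        // ...with both awaited together before the response is produced.
        await Task.WhenAll(cpuBound, ioBound);
    }

    private void ComputeSynchronously() { /* CPU-heavy work */ }
    private Task CallBackendsAsync() => Task.CompletedTask; // stand-in for network I/O
}
```

The relevant detail for this thread is that the `Task.Run` consumes a thread pool thread for the whole duration of the synchronous work, on the same pool the Sockets transport depends on.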
@sebastienros we are not using TPL either.
@sebastienros any further thoughts on this?
@akatz0813 @MatthewLymer your problems are not related to this issue, which might be confusing. Would you mind filing a new issue individually so I can track the cases independently? I assume this current issue could then be closed.
As for making any progress, we would really need an application that reproduces this behavior. The difference in perf is quite big, more than what we see in our own perf tests comparing Libuv and Sockets even in the worst case, and we test many types of apps (MVC, database, Razor, APIs). Only the pure JSON serialization ones so far show better perf for Libuv, and not by much. We obviously don't have enough coverage, and being able to repro your issues locally would be really helpful, at least to understand where the difference is coming from and then give guidance to other customers who face the same problem.
Would you be able to provide me with an application, as simple as possible, that would show this behavior? Or at least the profile of the logic that is applied on these specifically slow APIs? I already got some explanation from @MatthewLymer but the more details, the easier it will be for me to repro and investigate.
If you want to take it privately to send me some code, please email me at first.last@microsoft.com.
@sebastienros we'll work on simplifying our application to the point where we can share it. Thank you!
@sebastienros I'll look at creating a simple repro for you.
I am closing this issue because the original question has been answered.
However @MatthewLymer @hmvs @akatz0813 I am filing a more appropriate issue for your cases.
I asked a question on StackOverflow this morning about implementing a CI build that would fail if performance regressed too much. It occurred to me that Kestrel might have something like this using Benchmark.NET.
There's obviously a bunch of performance tests in Kestrel. But is the process for making sure egregious perf regressions don't occur ad hoc or is it systematized? If it's systematized, how did you do it?