dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Why does my .NET 6.0 application hit Thread Starvation easier than the .NET Framework ? #75104

Open hungdoan-groove opened 2 years ago

hungdoan-groove commented 2 years ago

Description

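I ported my Service Fabric application from .NET Framework 4.6.1 to .NET 6.0. After the migration, it seems that:

- The .NET 6.0 version hits thread starvation more easily.
- In some cases under heavy load, the .NET 6.0 version has less throughput than the .NET Framework version. Is this a side effect of thread starvation?

My questions are:

- Is the reduction in throughput the result of thread starvation?
- Is there any change/optimization in .NET 6.0 compared to .NET Framework that makes it fast under some loads but, under heavy load, more likely to hit thread starvation than .NET Framework?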

Configuration

My system architecture: Users ----(1)HTTP----> Service A: Web API ----(2)Service Fabric remoting----> Service B: Remoting Service.

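The logic of my test cases is mostly:

- CPU-sensitive tasks.
- Some I/O logic to save logs to Azure Storage.

Azure Service Fabric

- OS: Windows Server
- Load balancing in a cluster group of 05 VMs of Standard_E16ds_v4
- Service A - Web API: uses Kestrel

Load tested: via K6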

Regression?

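I did a load test and found that:

1/ In a single thread (1 concurrent user), .NET 6.0 is faster.

2/ In the basic case (say 100 concurrent users), the .NET 6.0 version has the same or better throughput than the .NET Framework version. However:

- The .NET 6.0 version likely hits thread starvation: as the number of concurrent users increased, throughput decreased, and the CPU never went above 70%.
- With the .NET Framework version, the CPU will likely hit 100% at 100 concurrent users. With the .NET 6.0 version, at 100, 200, or 300 users the CPU is only around 70% at most.

3/ In some complex test cases, .NET 6.0 has less throughput than .NET Framework: about 20-50% less, with the CPU at only around 70%.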

Data

N/A

Analysis

1/ It seems there is thread starvation in .NET 6.0.

2/ Does that mean it's easier to hit thread starvation than in .NET Framework?

3/ Is there any change/optimization in .NET 6.0 compared to .NET Framework that makes it fast under the usual load but, at some load, more likely to hit thread starvation than .NET Framework?

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.

Issue Details

### Description

I ported my Service Fabric Application from .NET Framework 4.6.1 to .NET 6.0 It seems, after the migration:

- .NET 6.0 version will make it easier to hit Thread Starvation.
- In some cases + heavy load, the .NET 6.0 **has less throughput** than the .NET Framework version. Is this the side effect of Thread Starvation?

My questions are:

- Is the reduction of throughput the result of the Thread Starvation ?.
- Is there any change/optimization in .NET 6.0 compare to .NET Framework?. So that It's fast in some load, but at some (heavy) load, the .NET 6.0 will easier to hit the Thread Starvation issues compare to the .NET Framework?

### Configuration

My system architecture: Users ----(1)HTTP----> Service A: Web API ----(2)Service Fabric remoting----> Service B: Remoting Service.

The logic of my test cases is mostly:

- CPU-sensitive task.
- It has some I/O logic to save logs to Azure Storage

Azure Service Fabric

- OS: Windows Server
- Load balancing in a cluster group of 05 VMs of Standard_E16ds_v4
- Service A - Web API: Use Kestrel

Load tested: via K6

### Regression?

I did a load test and found that:

1/ In a single thread (1 concurrent user) .NET 6.0 is faster.

2/ In the basic case (say 100 concurrent users) .NET 6.0 version has the same or better Throughput than the .NET Framework. However,

- .NET 6.0 version likely hit Thread Starvation issues: increased the concurrent users, Throughput decreased, and CPU never goes above 70%
- Within the .NET Framework version, the CPU will likely hit 100% at the 100 concurrent users. But in .NET 6.0 version, 100, 200 or 300 users the CPU is just around 70% max.

3/ In some complex test cases .NET 6.0 has less Throughput than the .NET Framework. It's about 20-50% less, and the CPU is just around 70%.

### Data

N/A

### Analysis

1/ It seems there is a Thread Starvation in .NET 6.0

2/ Does that mean It's easier to hit Thread Starvation than the .NET Framework?

3/ Is there any change/optimization in .NET 6.0 compare to .NET Framework?. So that It's fast in the usual load, but at some load, the .NET 6.0 will easier to hit the Thread Starvation issues compare to the .NET Framework

Author: hungdoan-groove
Assignees: -
Labels: `area-System.Threading`, `tenet-performance`, `untriaged`
Milestone: -

danmoseley commented 2 years ago

Are you creating (pooling?) threads yourself, queuing work to the thread pool manually, using Tasks, ..?

hungdoan-groove commented 2 years ago

@danmoseley Yes, there is a Task.Run() call that queues work to the thread pool manually. I also agree that creating extra threads would lead to thread starvation earlier. But both the .NET Framework and .NET 6.0 versions have that same logic.

Why would .NET 6.0 hit thread starvation when .NET Framework does not?

mangod9 commented 2 years ago

@kouvel to provide some guidance.

kouvel commented 2 years ago

> Is the reduction of throughput the result of the Thread Starvation?

Starvation in the thread pool may result in lower throughput, but lower throughput doesn't imply that there is starvation.

> Is there any change/optimization in .NET 6.0 compare to .NET Framework? So that It's fast in some load, but at some (heavy) load, the .NET 6.0 will easier to hit the Thread Starvation issues than the .NET Framework?

I'm not aware of any changes that would cause starvation in the thread pool to be more likely in .NET 6 compared with .NET Framework.

For the app running on .NET Framework, does it configure the runtime in some way, either through environment variables, *.exe.config file, or through the registry? For example, enabling server GC, or configuring the thread pool, etc.?

.NET 6 may need to be configured a bit differently, so make sure those configuration settings are also migrated equivalently.
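One way to check whether such settings carried over is to print them from both builds and compare. The following is a minimal sketch, not something taken from the app in question: the class name is hypothetical, and in a Service Fabric service the same calls would more likely go into startup code and be written to the service's log rather than the console.

```csharp
using System;
using System.Runtime;
using System.Threading;

// Hypothetical helper: prints runtime settings that commonly differ
// after a migration. Run it in both builds and compare the output.
static class RuntimeSettingsDump
{
    static void Main()
    {
        // Minimum and maximum worker / I/O completion thread counts.
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);

        Console.WriteLine($"Processor count: {Environment.ProcessorCount}");
        Console.WriteLine($"Server GC:       {GCSettings.IsServerGC}");
        Console.WriteLine($"GC latency mode: {GCSettings.LatencyMode}");
        Console.WriteLine($"Min threads:     worker={minWorker}, io={minIo}");
        Console.WriteLine($"Max threads:     worker={maxWorker}, io={maxIo}");
    }
}
```

If the two builds print different values, that points at a configuration setting that was not migrated rather than at a runtime behavior change.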

It sounds like it's relatively easy to reproduce the issue under load testing. I would suggest collecting two perf profiles using PerfView, one for the app running on .NET Framework, and one for the app running on .NET 6. So that they can be compared, run both apps under the same amount of load, at a level where .NET 6 has lower throughput. A profile using default settings should be fine to start with. Profiling can slow down the app, so check whether the throughput is still lower in .NET 6 while profiling. A comparison of the profiles may tell more about what's happening; the PerfView overview has some useful links to information on analyzing profiles.

To look for thread pool starvation, in the Events views look at the ThreadPoolWorkerThreadAdjustment/Adjustment with Reason=Starvation, and IOThreadCreate events with Count showing a thread count higher than the processor count.

davidfowl commented 2 years ago

One possible reason is that the http.sys request queue helped throttle incoming work so the app could recover. What does the app do? Are you calling blocking APIs? It’s possible you’re now doing sync over async instead of just sync IO. That would make starvation happen more easily.
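To make the "sync over async" pattern concrete, here is a minimal sketch. All names are hypothetical, Task.Delay stands in for real I/O such as saving a log entry, and the burst size of 200 is arbitrary. It is not the app's code, only an illustration of the difference between blocking on a task and awaiting it.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class SyncOverAsyncSketch
{
    // Hypothetical stand-in for an I/O call, e.g. saving a log entry.
    static async Task<int> SaveLogAsync()
    {
        await Task.Delay(100); // simulated I/O latency
        return 1;
    }

    // Sync over async: the calling pool thread blocks until the task
    // completes, while the continuation typically needs a pool thread too.
    static int HandleBlocking() => SaveLogAsync().GetAwaiter().GetResult();

    // Async all the way: no thread is held while the I/O is pending.
    static Task<int> HandleAsync() => SaveLogAsync();

    // Starts a burst of requests and measures how long they all take.
    static async Task<TimeSpan> RunBurstAsync(Func<Task<int>> startRequest, int requests)
    {
        var stopwatch = Stopwatch.StartNew();
        var tasks = new Task<int>[requests];
        for (int i = 0; i < requests; i++)
            tasks[i] = startRequest();
        await Task.WhenAll(tasks);
        return stopwatch.Elapsed;
    }

    static async Task Main()
    {
        const int requests = 200; // arbitrary burst size for illustration

        TimeSpan blocking = await RunBurstAsync(() => Task.Run(() => HandleBlocking()), requests);
        TimeSpan nonBlocking = await RunBurstAsync(() => Task.Run(() => HandleAsync()), requests);

        Console.WriteLine($"Blocking handlers: {blocking.TotalMilliseconds:F0} ms");
        Console.WriteLine($"Async handlers:    {nonBlocking.TotalMilliseconds:F0} ms");
    }
}
```

In the blocking variant, each in-flight request holds a thread-pool thread for the whole wait, so a burst larger than the current pool size has to wait for the pool to add threads. Exact timings depend on the machine and pool configuration.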

hungdoan-groove commented 2 years ago

@kouvel thanks for the useful information, I will check it out, especially the server GC as I think I'm using this mode.

@davidfowl http.sys was one of my considerations.

P.S. @davidfowl Is there any documentation about the queue length/mechanism of http.sys, e.g. requestQueueLimit or similar? I looked around but found no clear documentation about it. What I'm doing is referring to the code here: https://github.com/dotnet/aspnetcore/blob/main/src/Servers/HttpSys/src/HttpSysOptions.cs

davidfowl commented 2 years ago

It's less about HTTP.sys and more trying to explain where the differences might be coming from. If you were doing blocking IO before, maybe the HTTP.sys queue was saving you from really bad cases.

> I also wonder if the server queue is the problem and tried with race limiter (but It seems a kind of middleware I think - Microsoft.AspNetCore.ConcurrencyLimiter), but there is no improvement and started to receive many 503 errors

This should help if this was indeed the problem. Though we'd need to see how you configured it.

> p/s @davidfowl Is there any documentation about the queue length/mechanism of http.sys, e.g. requestQueueLimit or so? I tried to look around but see no clear documentation about that - What I'm doing is refer to the code here

It's somewhere deep in the guts of the Windows documentation. I believe the default is 1000 based on this doc: https://docs.microsoft.com/en-us/windows/win32/http/configuring-properties-in-http-version-2-0, but @Tratcher can correct me.

What is your application doing that you are experiencing threadpool starvation? Were you using HttpListener on .NET Framework?

xiaomi7732 commented 2 years ago

@hungdoan-groove Have you by chance switched the host environment for the application?

The thread pool's minimum thread count defaults to the number of processors. That might be another setting that differs and is worth watching.

Otherwise, I would suggest taking a dotnet-trace to find out whether thread starvation is happening.
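As a lighter-weight complement to a trace, the thread pool's own counters can be logged from inside the process. This is only a sketch under assumptions: the class and thread names are hypothetical, and the ThreadPool.ThreadCount, PendingWorkItemCount, and CompletedWorkItemCount properties are available on .NET Core 3.0 and later (so on .NET 6) but not on .NET Framework. A dedicated thread is used so that the logging itself does not depend on the pool being healthy.

```csharp
using System;
using System.Threading;

static class ThreadPoolWatch
{
    // Starts a background thread that logs thread-pool counters at a fixed interval.
    static Thread Start(TimeSpan interval)
    {
        var thread = new Thread(() =>
        {
            while (true)
            {
                Console.WriteLine(
                    $"threads={ThreadPool.ThreadCount} " +
                    $"queued={ThreadPool.PendingWorkItemCount} " +
                    $"completed={ThreadPool.CompletedWorkItemCount}");
                Thread.Sleep(interval);
            }
        })
        {
            IsBackground = true,      // does not keep the process alive
            Name = "threadpool-watch" // hypothetical name, for easier debugging
        };
        thread.Start();
        return thread;
    }

    static void Main()
    {
        Start(TimeSpan.FromSeconds(1));
        // Stand-in for the real workload; watch the counters for a few seconds.
        Thread.Sleep(TimeSpan.FromSeconds(5));
    }
}
```

A queue length that keeps growing while the thread count climbs slowly and CPU stays well below 100% would be consistent with starvation, though a trace is still needed to confirm it.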