dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License

"async" fails as the number of threads increases #28480

Closed Kaelum closed 3 years ago

Kaelum commented 3 years ago

UPDATE 03/05/2021: A few months ago we discovered that this was somehow tied to the use of the async keyword, and we have since been able to prove it in the AcmeWebApi test application. As @davidfowl has stated in our other thread, this is most likely a race condition. Since it is now virtually impossible to write any application that doesn't use async in some way (due to the core library changes), it is not possible to run tests that use only synchronous code. I can say that when we did have synchronous code, we saw drastically higher performance than we are currently seeing.

Leaving the following, as that is what we started the thread with:

We have run into several issues with ASP.NET Core that appear to be threading related. I initially created #26955, as that was the first issue that we ran into, but creating an application that can be tested is still ongoing. In the process of creating an application for that purpose, we were able to replicate another issue, which is the topic of this thread. The application linked below replicates this issue under the following conditions:

Under these conditions we observe numerous HeartbeatSlow issues across random connections (threads), which in our full application leads to complete system failure over time. We are working on providing ways to replicate the other issues that we have observed, but these are currently the only ones that we can replicate for you in a test application.

This issue ONLY exists on Linux (we used Ubuntu 20.04.1 LTS for verification) and results in both a significant reduction of throughput and a significantly higher latency. When running in our full application, this issue, along with others, causes a complete system failure (APPCRASH) as the process runs out of memory. No matter how much we try, this issue, and the others, cannot be replicated in a Windows environment (Windows 10 and Windows Server 2019 were tested).

SDK: 3.1.301 VS: 16.8.2

The test application is available in the private repo AcmeWebApi.

FULL DISCLOSURE: I work for Webroot / Carbonite / OpenText and the application discussed above is the property of said entities. Microsoft is a direct / indirect customer of ours, so I am limited in the information that I am allowed to provide.

davidfowl commented 3 years ago

Based on what you've said, it's because of the excessive Task switching in the eyes of the ASP.NET Heartbeat monitor.

But it would be great to have some profiles to back up the hypothesis, no? You make an educated guess based on your understanding of the system, then you back it up with some hard numbers to make sure they align with said hypothesis. If they don't, then there's more investigation work needed; if they do, then we can move on with our lives.

Kaelum commented 3 years ago

@davidfowl sorry for the late reply, but you asked questions that I am not allowed to discuss outside of the company. Everyone is happy with the changes that I have made, which resulted in a 400%+ performance increase and no stability issues, so thank you very much for the information that was missing. There are some objects that do not have synchronous methods for certain operations. If there were, this would have been a non-issue and we could have just used .NET Core and skipped ASP.NET Core (can't discuss why).

This information should be very prominently displayed in a performance section of the ASP.NET documentation. There is something similar in the IIS documentation, which states that the number of threads should never exceed ~250, but there isn't anything comparable for .NET Core and Kestrel. In general, I'm guessing that this is more of a Task (asynchronous) system limitation, which also doesn't appear to be discussed anywhere.