dotnet / aspnetcore

"async" fails as the number of threads increases #28480

Closed. Kaelum closed this issue 3 years ago.

Kaelum commented 3 years ago

UPDATE 03/05/2021: A few months ago we discovered that this was somehow tied to the use of the async keyword, and have since been able to prove it in the AcmeWebApi test application. As @davidfowl has stated in our other thread, this is most likely a race condition. Since it is virtually impossible to write any application that doesn't use async in some way now (due to the core library changes), it is not possible to run tests that only use synchronous code. I can say that when we did have synchronous code, we were seeing drastically higher performance than we are currently seeing.

Leaving the following, as that is what we started the thread with:

We have run into several issues with ASP.NET Core that appear to be threading related. I initially created #26955, as that was the first issue that we ran into, but creating an application that can be tested is still ongoing. In the process of creating an application for that purpose, we were able to replicate another issue, which is the topic of this thread. The application linked below replicates this issue under the following conditions:

Under these conditions we observe numerous HeartbeatSlow issues across random connections (threads), which in our full application leads to complete system failure over time. We are working on providing ways to replicate the other issues that we have observed, but these are currently the only ones that we can replicate for you in a test application.

This issue ONLY exists on Linux (we used Ubuntu 20.04.1 LTS for verification) and results in both a significant reduction of throughput and a significantly higher latency. When running in our full application, this issue, along with others, causes a complete system failure (APPCRASH) as the process runs out of memory. No matter how much we try, this issue, and the others, cannot be replicated in a Windows environment (Windows 10 and Windows Server 2019 were tested).

SDK: 3.1.301, VS: 16.8.2

The test application is available in the private repo AcmeWebApi.

FULL DISCLOSURE: I work for Webroot / Carbonite / OpenText and the application discussed above is the property of said entities. Microsoft is a direct / indirect customer of ours, so I am limited on the information that I am allowed to provide.

sebastienros commented 3 years ago

Your repository is currently private. Can you make it public?

Kaelum commented 3 years ago

@sebastienros done

ghost commented 3 years ago

Thanks for contacting us. We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

sebastienros commented 3 years ago

Based on https://github.com/Kaelum/AcmeWebApi/blob/main/src/AcmeWebApi/Services/ApiService.cs#L329 and https://github.com/Kaelum/AcmeWebApi/blob/main/src/AcmeWebApi/Handlers/TcpHandler.cs#L415-L420

You are probably filling up the thread pool, since you use 3,500 simultaneous connections that are all queued and each block a thread.

I don't think it shows a bug in aspnet. Do you want to explain what you are trying to achieve to get some advice on how to approach it differently?

Kaelum commented 3 years ago

@sebastienros if you looked at the application, you'd see that 18,000 threads are available.

halter73 commented 3 years ago

I assume you're referring to how 18,000 "min" worker/completion threads are specified here, which is used here to call ThreadPool.SetMinThreads.

Unfortunately, calling SetMinThreads to set a high number of "min" threads is not a silver bullet that allows you to scale blocking code. I wish it was. It would make porting legacy pre-async apps far easier. The SetMinThreads MSDN doc specifically warns against this:

Caution: By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.

And while it might not be obvious given the "min" in the name, the CLR doesn't just spin up 18,000 worker + 18,000 completion threads during the call to SetMinThreads. Instead it merely tells the ThreadPool that if there are currently fewer than the specified "min" number of ThreadPool threads and there isn't an idle thread ready to handle a new work item when it's dispatched to the ThreadPool, the ThreadPool should immediately spawn a new thread for the work item instead of using the default (slow) algorithm for managing thread creation and destruction.

If the CLR were to actually try to spin up 18,000 worker + 18,000 completion threads during the call to SetMinThreads, it would most likely be killed by the Linux OOM killer (or "(APPCRASH) as the process runs out of memory") before the call to SetMinThreads could complete. Threads use a lot of memory. Since the CLR spawns the threads lazily, the process doesn't get killed until later when too many concurrent items are dispatched leading to too many spawned threads.

And even if all those threads were somehow preallocated and available, a system with far fewer cores than threads can still easily become overwhelmed because of all of the preemption and context switching that all those threads incur. That's what the warning on the MSDN doc is getting at.
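
For reference, a minimal sketch of the kind of SetMinThreads call being discussed and what it does and does not do; the surrounding class and names are illustrative, not taken from the repo:

```csharp
using System;
using System.Threading;

class ThreadPoolSetup
{
    static void Configure()
    {
        // This does NOT pre-spawn 18,000 worker + 18,000 completion-port threads;
        // it only tells the pool to skip its slow thread-injection algorithm until
        // the active thread count reaches these values.
        ThreadPool.SetMinThreads(workerThreads: 18_000, completionPortThreads: 18_000);

        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        Console.WriteLine($"min: {minWorker}/{minIo}, max: {maxWorker}/{maxIo}");
    }
}
```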

Kaelum commented 3 years ago

@halter73 I never said otherwise. There are enough threads, and there is no thread starvation. The simple fact that this works perfectly on Windows, and not at all on Linux, should raise a red flag. Anywho, I'll wait for Microsoft to fix the issues. This is only the beginning of all the issues that we've uncovered.

davidfowl commented 3 years ago

cc @kouvel

sebastienros commented 3 years ago

@Kaelum Sorry if you mentioned it already, but were you running in IIS or directly through kestrel on Windows?

Kaelum commented 3 years ago

@sebastienros you should look at the application. IIS doesn't run on Linux.

sebastienros commented 3 years ago

@Kaelum you should read my questions before you answer them

I am French, I can answer this way too

davidfowl commented 3 years ago

@Kaelum "works perfectly" might be a stretch. There might be some minor differences causing the behavior you're seeing that we're not aware of, but it doesn't change the fact that the code should stop blocking threads.

Kaelum commented 3 years ago

@davidfowl on Windows, processor usage is fairly constant (+/- 5%), the number of threads is fairly constant (+/- 20), throughput is exactly on target (6,000 req/s), and latency is exactly as we'd expect. Maybe I should have said "working as expected".

However, on Linux, processor usage is all over the place (+/- 30%), we don't have a good way of monitoring threads, the throughput is abysmal (2,000-2,500 req/s), and the latency is 4+ times what we'd expect. The linked application reproduces these results as described.

EDIT: I forgot to add that when running on Linux, many requests never get a response, even after 60 seconds of waiting.

davidfowl commented 3 years ago

It might make sense to still remove the blocking to isolate the problem. Even if Windows handles it better, it doesn't mean there's a bug; it could just be behavior that differs for some reason that's not immediately obvious.

Waiting for us to magically resolve the issue might work, but it's a risk.

Kaelum commented 3 years ago

@davidfowl what is your suggestion for removing what is blocking? We're using asynchronous coding, so we're not aware of anything in our code that is blocking. The code in the TcpHandler here is not what we are currently using; the Task.Run described above has been replaced by await, as we found that a single connection doesn't support multiple threads (the topic of #26955).

I started this test application based on the code that we had at the time that I opened #26955. I can update the code, but it will probably take a few days to include all the test results as well. BTW, the repo includes the tests that we used (Jmeter), along with the results of those tests.

davidfowl commented 3 years ago

@sebastienros identified blocking above:

Your core processing logic, processRequestBuffer - https://github.com/Kaelum/AcmeWebApi/blob/3593528fdb27e460ab332eec922ec292d61c28bc/src/AcmeWebApi/Handlers/TcpHandler.cs#L495

is being dispatched to the thread pool, and then dispatches again and blocks.

You should just remove the Task.Run and make processRequestBuffer an async method. That's likely one of the reasons for poor performance with respect to threading. There are lots of other things that could be done to improve the code, but those aren't likely relevant to the discussion.
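
A rough sketch of the before/after pattern being suggested; the method and type names are placeholders, not the actual AcmeWebApi code:

```csharp
using System;
using System.Threading.Tasks;

// Rough illustration of the suggested change; the names below are placeholders,
// not the actual AcmeWebApi code.
class RequestProcessor
{
    // Before: dispatch to the thread pool, then block a pool thread on the result.
    public void ProcessRequestBuffer(ReadOnlyMemory<byte> buffer)
    {
        Task.Run(() => HandleAsync(buffer)).Wait(); // parks a ThreadPool thread until the work finishes
    }

    // After: keep the call chain asynchronous so no thread is held while waiting.
    public async Task ProcessRequestBufferAsync(ReadOnlyMemory<byte> buffer)
    {
        await HandleAsync(buffer);
    }

    // Stand-in for the real per-request work.
    private Task HandleAsync(ReadOnlyMemory<byte> buffer) => Task.CompletedTask;
}
```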

halter73 commented 3 years ago

See also: https://docs.microsoft.com/en-us/aspnet/core/performance/performance-best-practices?view=aspnetcore-5.0#avoid-blocking-calls

Kaelum commented 3 years ago

@davidfowl & @halter73 I'll just update the code to what we're currently using. That is what I was saying was already changed. Originally, we were trying to parallelize requests that came in, but that led to #26955, so we removed the parallelization and just blocked requests instead. Give me a day or 2 to update the code to what we're currently using, and to update the test results.

davidfowl commented 3 years ago

@Kaelum you can (and probably should) parallelize requests; you just need to understand which parts of the code need to be synchronized. Kestrel's HTTP layer and SignalR are built on the exact same primitives, so it's all possible, but all async code needs to be written with care.
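
As a hedged illustration of "which parts need to be synchronized": independent per-request work can overlap, while writes to the single shared connection are serialized. The types and names below are hypothetical, not taken from the repo:

```csharp
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch (not the AcmeWebApi code): handle requests in parallel,
// but serialize only the part that touches the shared connection.
class ConnectionWorker
{
    private readonly SemaphoreSlim _writeLock = new SemaphoreSlim(1, 1);
    private readonly Stream _output;

    public ConnectionWorker(Stream output) => _output = output;

    // Kick off all requests; the independent work for each one can overlap freely.
    public Task ProcessAsync(byte[][] requests) =>
        Task.WhenAll(requests.Select(HandleOneAsync));

    private async Task HandleOneAsync(byte[] request)
    {
        byte[] response = await ComputeResponseAsync(request);

        // Writes to the single underlying connection must not interleave,
        // so only this section is synchronized.
        await _writeLock.WaitAsync();
        try
        {
            await _output.WriteAsync(response, 0, response.Length);
        }
        finally
        {
            _writeLock.Release();
        }
    }

    // Stand-in for the real per-request processing.
    private Task<byte[]> ComputeResponseAsync(byte[] request) => Task.FromResult(request);
}
```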

halter73 commented 3 years ago

If the application is running out of memory, collecting a memory dump might help. Once the application starts to struggle under load, try running dotnet-dump collect and taking a look at it with dotnet-dump analyze. dumpheap -stat, clrstack -all and dumpasync are all interesting commands to take a look at.

https://docs.microsoft.com/en-us/dotnet/core/diagnostics/debug-memory-leak

Kaelum commented 3 years ago

@davidfowl you might think that parallelizing requests would increase performance, and be a good idea, but that isn't always true. Due to the nature of the requests, and the fact that the result is larger than the request, it ends up being detrimental. We might create a dual request channel, but I don't have enough metrics at this time to determine if it would help. If it did, the increase would be somewhere in the 1-5% range, but it could also cause a 15% loss in certain situations.

@halter73 thanks! I'll look into those tools once I have some time, which might be in the next 2 weeks.

I'm updating the ACME project to better represent the current code that we're using, which is significantly different. I should have an update for you no later than EOB tomorrow.

For a little context I've been authorized to say that this system processes in excess of 800 million req/day across hundreds of VMs world-wide. In the live application this equates to ~12 billion Redis queries/day.

davidfowl commented 3 years ago

@davidfowl you might think that parallelizing requests would increase performance, and be a good idea, but that isn't always true. Due to the nature of the requests, and the fact that the result is larger than the request, it ends up being detrimental. We might create a dual request channel, but I don't have enough metrics at this time to determine if it would help. If it did, the increase would be somewhere in the 1-5% range, but it could also cause a 15% loss in certain situations.

Color me skeptical 😄

For a little context I've been authorized to say that this system processes in excess of 800 million req/day across hundreds of VMs world-wide. In the live application this equates to ~12 billion Redis queries/day.

👏🏾 impressive!

Kaelum commented 3 years ago

@davidfowl & @halter73 I committed the changes to TcpHandler so that it represents our current code base, with the exception of some logging that I can't make public. That logging is nonblocking and offloaded using non-ThreadPool threads. I'm running tests against this version of the code and, thus far, the performance has increased on Windows, but it has become even more unstable on Linux. I'll update the repo with the results of the tests tomorrow, as they'll take a while to complete.

UPDATE: The memory on the Linux VM increased by 28%, to 28.7%, and is not being released. Unfortunately, I can't look into it further at this time, but this is one of the observations we've noticed in production as well. I'll need to include more context on this as well.

Kaelum commented 3 years ago

@davidfowl & @halter73 I added the logs and the reports for the tests that I ran yesterday. I ran an additional test today to see if there was a memory leak, but the memory didn't increase, so it's probably just the memory allocated for each thread.

The logs and reports clearly show that there is a significant issue on Linux, and that issue can't be replicated on Windows. If I run a throughput test with the maximum throughput over the fewest number of connections (tested with 100 connections @ 6,000 req/s), the issue does not occur on Linux. We don't know the threshold at which Linux breaks, but I believe it is somewhere over 1,000 concurrent connections, and it occurs whenever the total throughput across all connections exceeds ~2,500 req/s.

Let me know if additional information is needed.

Kaelum commented 3 years ago

@Pilchie & @davidfowl this is more than just a Performance issue. It is also a data loss issue, as requests are left w/o a response.

davidfowl commented 3 years ago

@Pilchie & @davidfowl this is more than just a Performance issue. It is also a data loss issue, as requests are left w/o a response.

In situations like this, we look to see if a majority of people are hitting this issue. Right now our assumption is that there's an application bug somewhere. We might be wrong and there might be a stress bug but the next logical step is to provide something more complete and narrowed down that points to an issue in the framework/runtime itself.

It seems like there might be an issue on Linux, but we don't know where it is and why you're losing data, and not everyone is hitting this issue, so it's not at the top of the queue to investigate. There's a finite team with limited resources and we have to prioritize what we investigate. I think if you continued to investigate and make progress, we could assist by lending our experience as you narrowed the repro down, as time permitted.

As it stands right now, there's nothing for us to act on (that I can see).

PS: We had an issue like this last month with an internal team and it turned out to be application code allocating more on Linux because of the case sensitivity of the file system. It's not always clear cut where the problem is...

Kaelum commented 3 years ago

@davidfowl I guess we'll need to use our other support options. I think we have 1 or 2 support people at Microsoft. It is not a reasonable expectation for us to find the bugs in the .NET Core libraries, which is what you are indirectly asking for. I'll continue working on reducing the test application size, but that is all that I can do at this point.

davidfowl commented 3 years ago

@Kaelum glad to assist once we have a focal point; right now the issue hasn't been narrowed down enough to pinpoint an area of investigation. Anything you can do to help that would be great.

Kaelum commented 3 years ago

@davidfowl there is a new branch "Minimalistic" which has reduced the running code down to just the TcpHandler. The issue is very easy to replicate using the load that I described above, and I can also replicate the issue slightly on Windows. Linux presents ~20x the number of warnings that Windows is presenting. It is 100% related to async, as you'll see in the code. While running it on Windows, I could see that there were between 4,000 and 4,600 threads running during the test of 3,500 clients. You shouldn't have any problem replicating the issue.

Here is the test that I ran. The throughput here should be around 6,000 req/s:

Creating summariser <summary>
Created the tree successfully using ACME_Variable_UriInfo_Test.jmx
Starting the test @ Mon Dec 14 19:46:08 UTC 2020 (1607975168260)
Waiting for possible Shutdown/StopTestNow/Heapdump message on port 4445
summary +   3931 in 00:00:22 =  179.3/s Avg:  3584 Min:    11 Max: 21028 Err:     1 (0.03%) Active: 3500 Started: 3500 Finished: 0
summary +  10250 in 00:00:30 =  344.0/s Avg: 13556 Min:     7 Max: 27527 Err:  4773 (46.57%) Active: 3500 Started: 3500 Finished: 0
summary =  14181 in 00:00:52 =  274.2/s Avg: 10792 Min:     7 Max: 27527 Err:  4774 (33.66%)
summary +   6210 in 00:00:30 =  209.2/s Avg: 13872 Min:     9 Max: 25945 Err:  2426 (39.07%) Active: 3500 Started: 3500 Finished: 0
summary =  20391 in 00:01:21 =  250.5/s Avg: 11730 Min:     7 Max: 27527 Err:  7200 (35.31%)
summary +   2714 in 00:00:31 =   87.0/s Avg: 20623 Min: 10288 Max: 37603 Err:  2374 (87.47%) Active: 3500 Started: 3500 Finished: 0
summary =  23105 in 00:01:53 =  205.2/s Avg: 12775 Min:     7 Max: 37603 Err:  9574 (41.44%)
summary +   3619 in 00:00:30 =  121.2/s Avg: 37353 Min:    40 Max: 73387 Err:  1690 (46.70%) Active: 3500 Started: 3500 Finished: 0
summary =  26724 in 00:02:22 =  187.6/s Avg: 16103 Min:     7 Max: 73387 Err: 11264 (42.15%)

davidfowl commented 3 years ago

Are the test client or instructions on how to reproduce the issue in the readme in the branch?

Kaelum commented 3 years ago

The test file "ACME_Variable_UriInfo_Test.jmx" is in the "Jmeter Tests" folder, but you'll need to create a file or two of URLs (described in the README), as I'm not allowed to share ours. 10 URLs in each file should be fine; in fact, it can be any text under 2048 characters per line for this app. Ours have over 1 million entries. It's a Jmeter 4 test. The PowerShell command line that I used to run the test is:

jmeter -n -t 'ACME_Variable_UriInfo_Test.jmx' -Jhost='host machine IP or DNS name' -Jport=5000 -Jduration=3600 -Jthreads=3500 -Jthroughput=360000

davidfowl commented 3 years ago

Can you put that in the readme? That'll make it easier to share with people

Kaelum commented 3 years ago

I updated the README file in the Jmeter Tests folder.

Kaelum commented 3 years ago

We have since proven that this is a fault in the async system, which starts to be visible when you have more than ~800 threads running. The more async calls there are, the sooner it occurs. I'm changing the title of the issue to reflect our findings.

davidfowl commented 3 years ago

Async doesn't cause an increase in threads though... I think we're still at a loss as to what the problem is. Can you also share any event counters or traces you have to show what might be the problem?

Does this application still reproduce the issue https://github.com/Kaelum/AcmeWebApi?

Kaelum commented 3 years ago

I didn't say that async increases the number of threads, I said that it is affected by the number of threads.

Yes, the application demonstrates the issue. If you want to enhance the issue, use more threads, or increase the number of async loops. Increasing either makes it more visible.

davidfowl commented 3 years ago

Sorry, the issue says "async fails" but I don't know what async has to do with the number of threads nor why your application would need 800 threads to function unless there was indeed blocking happening. If there was blocking, then it would explain why you have such a large number of threads but not why "async fails" (I'm not clear what that means either).

Kaelum commented 3 years ago

We have 4,000-8,000 concurrent requests coming in to an ASP.NET application. ASP.NET is creating those threads, not us. We have only increased the number of available threads to meet what ASP.NET is asking for.

By "async fails", I mean that async calls slow the application down so much that it is unable to do anything, falls behind, and eventually goes OOM. All of this is within ASP.NET, .NET, and Kestrel, where Kestrel reports timeouts and ASP.NET itself goes OOM, not our application.

davidfowl commented 3 years ago

ASP.NET Core uses the thread pool, which is pretty efficient at reusing threads, assuming you're not blocking. The only way to end up with a thread count in the hundreds would be to:

Judging from these settings, I'm guessing it's the latter. You're telling the thread pool that it's free to inject 5,000 threads quickly if you block, stall, etc. There are also some known thread pool scalability issues when you use a high number of threads.

My guess is that there's some blocking somewhere in your application and that might be hard to track down. It might be blocking on tasks, it might be a highly contended lock, it might be some CPU intensive code on the request path or it might be something else entirely.

One thing that might be helpful would be to lower the number to something super small, OR not set it at all, and see what happens. That'll reveal where the blocking is happening. If you take a memory dump, you can see the threads that are taking a long time to complete.
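
One lightweight way to watch thread growth during such a run, as a complement to a full dump while narrowing things down (an illustrative helper, not part of the repro app; ThreadPool.ThreadCount and PendingWorkItemCount are available on .NET Core 3.0+):

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Illustrative helper, not part of the repro app: periodically log thread-pool stats.
static class ThreadPoolMonitor
{
    public static async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Pool threads vs. total OS threads gives a quick hint of blocking-driven growth.
            int osThreads = Process.GetCurrentProcess().Threads.Count;
            Console.WriteLine(
                $"pool threads: {ThreadPool.ThreadCount}, " +
                $"queued work items: {ThreadPool.PendingWorkItemCount}, " +
                $"OS threads: {osThreads}");

            await Task.Delay(TimeSpan.FromSeconds(5));
        }
    }
}
```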

Kaelum commented 3 years ago

If you can show me anything that is blocking in the test application, you are welcome to do so. There is nothing but a loop of creating XmlWriters, discarding the results, and responding with a fixed response.

davidfowl commented 3 years ago

@Kaelum is this the code you use to simulate high CPU? https://github.com/Kaelum/AcmeWebApi/blob/a096e2d052a32ae2d0bcb50ff0a96ca6ba7fc833/src/AcmeWebApi/Handlers/TcpHandler.cs#L383

davidfowl commented 3 years ago

I'm currently trying to run the minimalist branch to see if I can reproduce the problem.

Kaelum commented 3 years ago

Yes, that's the code. It's much easier to replicate the issue on Linux, but you'll see the Heartbeat messages in the log file on Windows as well. Increasing the loop count above 300 should increase it too. Tomorrow, I'll try running the code in our actual application without changing the number of threads from the default values. I think we had to increase it because certain code paths can attempt to use up to 100 Tasks running at the same time, though we could change it to use Parallel with a performance hit.

davidfowl commented 3 years ago

Yes, that's the code. It's much easier to replicate the issue on Linux, but you'll see the Heartbeat messages in the log file on Windows as well

Sounds good. I'll try to reproduce the issue on windows for now.

Tomorrow, I'll try running the code in our actual application without changing the number of threads from the default values

I'm sure you observed behavior that made you increase it as well, but it's really high if there's no blocking and the work is mostly IO bound. It's also strange to see the hill-climbing parameter tweaks (oddly specific). Did you look at any CPU profiles before those changes were made?

I think we had to increase it because certain code paths can attempt to use up to 100 Tasks running at the same time, though we could change it to use Parallel with a performance hit.

Right, if you have CPU bound work, it can compete for thread pool resources easily. IO bound work (reading the request and writing the response) should be fast and shouldn't use many threads.

A high thread count causes the OS to context switch between those threads in order to make sure they are doing useful work. The heartbeat message usually shows up when the timer logic is pre-empted and another thread gets to run as a result, making the heartbeat take longer than expected.

Kaelum commented 3 years ago

We didn't look at any CPU profiles, as the issue only started occurring after we were able to perform testing with thousands of concurrent connections. Prior to that, we were only able to perform throughput tests by controlling the number of connections, which kept it to a minimum of ~100 concurrent connections.

There is IO in our application, as we read from Redis using the StackExchange.Redis library, but that is the only IO during a request-response cycle. However, I found that the issue also occurred while solely calling API methods that did nothing more than read data from an array or List and return a response. That's when I started considering that it might be the Task/async code that was the issue, and I was able to replicate the issues.

The parallel tasks are parallel lookups of the same types of values, just with different keys. I'll try different options tomorrow, while I'm testing different levels of thread "priming", including none at all. If using Parallel with a max of 4 at a time is all we can do, while everything else works well, I think we can live with that, but we'll have to wait and see. If I remember correctly, the Parallel overhead was way beyond what we were allowed to accept, so I might need to come up with something custom.

P.S. The Redis lookups are sub-millisecond, where we submit a batch of keys and await a batch of responses. This goes into the area where I can't elaborate further though.
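
For what it's worth, a sketch of one possible "custom" bounded fan-out for async lookups, as a lighter-weight alternative to Parallel (which targets CPU-bound work); the helper name and the default limit of 4 are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch of a "custom" bounded fan-out for async lookups; names
// and the default limit are hypothetical, not taken from the application.
static class BoundedLookup
{
    public static async Task<TResult[]> RunAsync<TKey, TResult>(
        IReadOnlyList<TKey> keys,
        Func<TKey, Task<TResult>> lookupAsync,
        int maxConcurrency = 4)
    {
        using var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);

        async Task<TResult> Throttled(TKey key)
        {
            // At most maxConcurrency lookups are in flight at any time.
            await gate.WaitAsync();
            try { return await lookupAsync(key); }
            finally { gate.Release(); }
        }

        return await Task.WhenAll(keys.Select(Throttled));
    }
}
```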

Kaelum commented 3 years ago

I wasn't able to test anything yesterday, due to some IT issues, but I have started running some tests today. Using the default ThreadPool settings causes application response failures within a matter of seconds, while the application isn't logging any failures with itself. So, I changed the primed threads to 100, which was significantly better. We had a few response failures around startup, but everything recovered, so I've left the test running. 1 hour later, the number of threads is currently stable at 198, but I'm going to leave it running for at least 3 more hours.

800 appears to be the maximum number of threads that ASP.NET can reliably work with, and we're at ~25% of that, which is good. However, I do not believe that this 800 number is discussed anywhere in the documentation, and it should be. That being said, is this just an ASP.NET issue? If we were to manage the ports, threading, protocols (HTTP & HTTPS), use the pipelines directly, and completely remove ASP.NET from the equation, would we still have this 800 thread limitation? Would it exist at all? Would it just be higher? If so, can you guesstimate how much higher?

The sole reason for using ASP.NET, and not just .NET, is because of the built-in HTTP(S) support. If we can obtain significantly better support by managing the TCP connections ourselves, that's something that we as a company may need to discuss, as we can handle the protocols ourselves. Our LBs are managing our attack vectors, so Kestrel never needs to deal with them. Just thinking about other options, should those in charge ask about them.

davidfowl commented 3 years ago

I wasn't able to test anything yesterday, due to some IT issues, but I have started running some tests today. Using the default ThreadPool settings causes application response failures within a matter of seconds, while the application isn't logging any failures with itself. So, I changed the primed threads to 100, which was significantly better. We had a few response failures around startup, but everything recovered, so I've left the test running. 1 hour later, the number of threads is currently stable at 198, but I'm going to leave it running for at least 3 more hours.

What's causing these failures, and how did you observe them? Connection timeouts on the client side? Are you collecting performance counters for these runs? If not, please do; they can explain a lot about your workload. Install this https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-counters and collect the default counters during your test runs and let me know what you see.

800 appears to be the maximum number of threads that ASP.NET can reliably work with, and we're at ~25% of that, which is good. However, I do not believe that this 800 number is discussed anywhere in the documentation, and it should be.

There's no absolute maximum number of threads ASP.NET can work with, but 800 is high, probably too high for the system to schedule work in a way that is workable in your workload. How many cores do you have?

That being said, is this just an ASP.NET issue? If we were to manage the ports, threading, protocols (HTTP & HTTPS), use the pipelines directly, and completely remove ASP.NET from the equation, would we still have this 800 thread limitation? Would it exist at all? Would it just be higher? If so, can you guesstimate how much higher?

It's likely application related. There are no inherent limits in the system. It's limited by your code, your memory and your CPU usage. This is why collecting performance profiles is so important. It varies from workload to workload and when you care about performance, it's one of the first things you need to do.

The sole reason for using ASP.NET, and not just .NET, is because of the built-in HTTP(S) support. If we can obtain significantly better support by managing the TCP connections ourselves, that's something that we as a company may need to discuss, as we can handle the protocols ourselves. Our LBs are managing our attack vectors, so Kestrel never needs to deal with them. Just thinking about other options, should those in charge ask about them.

I haven't seen any evidence that points to ASP.NET being a problem here, but if it's easy to write the code yourself and get better performance then it might make sense to do so. Comparing the performance profiles of the 2 might be valuable if you care to see where the differences are as well.

Last but not least, we have pretty mature performance infrastructure that we use to test our applications at https://github.com/dotnet/crank. It can collect counters and profiles of various kinds, and can run any client/server processes (including jmeter). That might be something useful you can use to execute your scenario and collect the relevant information that'll help us better understand your problems.

Kaelum commented 3 years ago

What's causing these failures, and how did you observe them? Connection timeouts on the client side? Are you collecting performance counters for these runs? If not, please do; they can explain a lot about your workload. Install this https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-counters and collect the default counters during your test runs and let me know what you see.

At least for this week, I can't collect any results. I'm using shared QA resources for this, and they're performing some critical tests this week. Turning on the collection of results in Jmeter requires a huge amount of resources, which would prevent QA from using the server. I'm not ready for the collection either, as I want to perform some better tuning before I do. That being said, there isn't any way for me to know what the failures are w/o the collection of the results.

There's no absolute maximum number of threads ASP.NET can work with, but 800 is high, probably too high for the system to schedule work in a way that is workable in your workload. How many cores do you have?

The VMs that IT has selected are 2 core 4 GB RAM, and dedicated. Due to costs, changing this is not an option.

This is an apples to oranges comparison, and I know that, but I'm only stating it for reference: The C++ application that we have replaced was running 8,000 threads/connections with only ~5% CPU usage. We are also collecting much more information than we did in the C++ application. We chose .NET because of maintenance issues with C++ and fully understand the differences. The threading issues that we have only recently run into are a little hard for us to swallow.

It's likely application related. There are no inherent limits in the system. It's limited by your code, your memory and your CPU usage. This is why collecting performance profiles is so important. It varies from workload to workload and when you care about performance, it's one of the first things you need to do.

This is something that we may need to schedule, if the current tuning that I am working on isn't acceptable.

I haven't seen any evidence that points to ASP.NET being a problem here, but if it's easy to write the code yourself and get better performance then it might make sense to do so. Comparing the performance profiles of the 2 might be valuable if you care to see where the differences are as well.

Last but not least, we have pretty mature performance infrastructure that we use to test our applications at https://github.com/dotnet/crank. It can collect counters and profiles of various kinds, and can run any client/server processes (including jmeter). That might be something useful you can use to execute your scenario and collect the relevant information that'll help us better understand your problems.

I should be able to report back more sometime next week. I've been considering the use of C++/CLI when dealing directly with pipelines, creating a hybrid application, which may be helpful. For now, it looks like if I change the minimum ThreadPool priming to 200, we can achieve an immediate benefit w/o any additional coding, which would reduce our current costs by ~2/3. I still have a lot more testing to do though.

davidfowl commented 3 years ago

The C++ application that we have replaced was running 8,000 threads/connections with only ~5% CPU usage. We are also collecting much more information than we did in the C++ application. We chose .NET because of maintenance issues with C++ and fully understand the differences. The threading issues that we have only recently run into are a little hard for us to swallow.

I'd like to see how 8,000 threads ended up being 5% CPU on a 2-core machine. I don't see how that's possible even with C++.

I should be able to report back more sometime next week. I've been considering the use of C++/CLI when dealing directly with pipelines, creating a hybrid application, which may be helpful. For now, it looks like if I change the minimum ThreadPool priming to 200, we can achieve an immediate benefit w/o any additional coding, which would reduce our current costs by ~2/3. I still have a lot more testing to do though.

Sounds good, though personally, I wouldn't be satisfied not understanding why my system behaved like this.

Kaelum commented 3 years ago

...though personally, I wouldn't be satisfied not understanding why my system behaved like this.

Based on what you've said, it's because of the excessive Task switching in the eyes of the ASP.NET Heartbeat monitor. The side effect of reducing the number of threads is a much higher latency, ~2-3x that of the higher thread count, which we need to discuss internally.