dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.29k stars 4.74k forks source link

Thread contention while throwing and catching exceptions #97181

Open rchoffardet opened 9 months ago

rchoffardet commented 9 months ago

Description

I don't know if that's expected. Feel free to close the issue if it is.

We identified some thread contention while throwing and catching exceptions on multiple threads. The methodology was to simply throw 10_000 exceptions in a multithreaded context. I was expecting that the CPU time spent per exception would stay relatively stable but it's increasing instead. If you do the same thing with method calls, the CPU time per calls stays flat.

Configuration

Windows 11 (10.0.22631.3007/23H2/2023Update/SunValley3) 12th Gen Intel Core i7-12700H, 1 CPU, 20 logical and 14 physical cores .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2

Regression?

I don't know

Data

Code

https://gist.github.com/rchoffardet/228f4bf1892e403c65487fcfe46afe35

Result

image

ghost commented 9 months ago

Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.

Issue Details
### Description I don't know if that's expected. Feel free to close the issue if it is. We identified some thread contention while throwing and catching exceptions on multiple threads. The methodology was to simply throw 10_000 exceptions in a multithreaded context. I was expecting that the CPU time spent per exception would stay relatively stable but it's increasing instead. If you do the same thing with method calls, the CPU time per calls stays flat. ### Configuration Windows 11 (10.0.22631.3007/23H2/2023Update/SunValley3) 12th Gen Intel Core i7-12700H, 1 CPU, 20 logical and 14 physical cores .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2 ### Regression? I don't know ### Data #### Code https://gist.github.com/rchoffardet/228f4bf1892e403c65487fcfe46afe35 #### Result ![image](https://github.com/dotnet/runtime/assets/5098668/71cf4514-0899-44e1-b860-4d46cf599ac4)
Author: rchoffardet
Assignees: -
Labels: `area-System.Threading`, `tenet-performance`, `untriaged`
Milestone: -
janvorli commented 9 months ago

@rchoffardet thank you for the test and reported issue! There is a global lock in the exception stack trace building, maybe that's the source of the contention identified by the attached test. It is something I am planning to get rid of. I'll run the test locally to get full understanding of where the issue is.

rchoffardet commented 9 months ago

Alongside my tests, I noticed that when I retrieve the stack trace, it takes more time and it allocates more, so I naively thought that the stack trace wouldn't be build if not requested. Maybe that's related to the string building and not the stack trace itself.

janvorli commented 3 months ago

The repro has a bug. It doesn't zero the totalTimeNs after each run for a specific thread count, so the result just keeps accumulating the time. After fixing it, the results were more or less the same (growing very slightly) for the number of threads upto the # of cores on Windows. On Linux, the results are growing with thread count though, that's worth looking into.

rchoffardet commented 3 months ago

Sorry about that, I "extracted" this gist from a bigger script and made mistakes along the way. It should be fixed now. I might be missing something as I still have growing numbers on Windows / .NET 9.0.0 preview 6. Could you share your fix?

janvorli commented 3 months ago

My fix was to set totalTimeNS=0 after the Console.WriteLine, which should effectively have the same effect as your fix. As for the performance, it is interesting though. This is from my AMD Ryzen 5 5600X 6-Core Processor:

2924,37
3271,7
3037,3133333333335
3272,7475
3481,018
3965,5066666666667

And this is from my Intel i9-9900X 10 physical cores CPU :

3764.84
4649.17
5072.29
5374.31
7517.652
6482.285
10037.368571428571
10261.98375
8214.833333333334
9332.818

Clearly, the intel is scaling worse than AMD here, I was testing it on the AMD before

janvorli commented 3 months ago

Moving further investigations to 10 since we don't have time left for it in the 9 timeframe and it is not critical.