dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Workstation GC Performance #90555

Open rreminy opened 1 year ago

rreminy commented 1 year ago

Description

Over the past few months I have been dealing with major stuttering caused by garbage collections triggering far too often while my workload rebuilds its indexes. These indexes consist of strings, which are added to a ConcurrentDictionary whose values are collections of those strings.
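Below is a minimal sketch of the workload's shape for readers who haven't seen it: strings are grouped into collections held in a ConcurrentDictionary. It is not the code from the GCOddities repro. The BuildIndex helper, the 100-bucket key scheme, and the "item-" values are all hypothetical. They exist only to produce many short-lived strings, either sequentially or through Parallel.For.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch: key -> collection of strings, built sequentially or in parallel.
// BuildIndex, the key scheme and the "item-" prefix are illustrative, not from the repro.
ConcurrentDictionary<string, ConcurrentBag<string>> BuildIndex(int itemCount, bool parallel)
{
    var index = new ConcurrentDictionary<string, ConcurrentBag<string>>();

    void AddItem(int i)
    {
        // The formatted value is a new string per item; the key groups items into 100 buckets.
        string value = $"item-{i}";
        string key = (i % 100).ToString();
        index.GetOrAdd(key, _ => new ConcurrentBag<string>()).Add(value);
    }

    if (parallel)
        Parallel.For(0, itemCount, AddItem);
    else
        for (int i = 0; i < itemCount; i++) AddItem(i);

    return index;
}

int gen0Before = GC.CollectionCount(0);
var idx = BuildIndex(250_000, parallel: true);
int totalItems = idx.Values.Sum(bag => bag.Count);
Console.WriteLine($"keys={idx.Count}, items={totalItems}, gen0 GCs={GC.CollectionCount(0) - gen0Before}");
```

Running it once with `parallel: false` and once with `parallel: true`, under both GC modes, is a rough way to reproduce the comparison in the table below. The real repro should still be preferred for actual measurements.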

Last month I set out to build a repro that highlights this GC performance issue. I had no success until last week, when I finally found the cause in my own code and fixed it. That gave me a clear picture of what causes this extreme performance degradation, and it let me write the repro.

The repro is more extreme than my actual workload was, but I hope it helps highlight the issue for further debugging. It is not trivial, as it combines several workloads.

The repro code can be found here: https://github.com/rreminy/GCOddities

It's a BenchmarkDotNet project containing the problematic workload.

Configuration

The results below were produced on Linux, on a dual Intel(R) Xeon(R) CPU X5650 @ 2.67GHz machine (12 cores and 24 threads in total), running .NET 7. The issue also happens on Windows 10 x64 with an Intel Core i7 975.

Regression?

I did some research and found that this may have been caused by the Regions GC. However, the issues I found are effectively abandoned, with no further activity or information since. I still hope this report helps with debugging the issue further.

I did not test with clrgc.dll (the segments-based GC that ships alongside .NET 7).

Data

| Method | Job | Server | ItemCount | parallel | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---:|---:|---:|---:|---:|---:|---:|
| RebuildIndexes | Job-INXKYK | False | 250000 | False | 14,164.0 ms | 98.15 ms | 91.81 ms | 189000.0000 | 95000.0000 | 2000.0000 | 1.11 GB |
| RebuildIndexes | Job-GXGCQI | True | 250000 | False | 4,863.8 ms | 29.85 ms | 27.92 ms | - | - | - | 1.11 GB |
| RebuildIndexes | Job-INXKYK | False | 250000 | True | 24,011.2 ms | 1,905.70 ms | 5,280.68 ms | 200000.0000 | 101000.0000 | 2000.0000 | 1.11 GB |
| RebuildIndexes | Job-GXGCQI | True | 250000 | True | 738.4 ms | 14.62 ms | 25.61 ms | 1000.0000 | - | - | 1.12 GB |

Detailed results can be found here: https://github.com/rreminy/GCOddities/tree/master/Results

Notice how badly Workstation GC struggles with the parallel workload: 24,011.2 ms, versus 738.4 ms with Server GC. It is slow enough that you might ask "Is it really parallel?" In this repro, Workstation GC also struggles with the sequential workload compared to Server GC (14,164.0 ms versus 4,863.8 ms).

Analysis

The issue is caused by the sheer number of string allocations. These stemmed from a mistake in my implementation, which I highlighted in the repro source code.
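One way to see how much string allocation a loop generates is to read the allocation counter before and after it. This is a minimal sketch built on a hypothetical MeasureAllocatedBytes helper and a synthetic string-formatting loop, not the repro's actual indexing code. GC.GetAllocatedBytesForCurrentThread only counts allocations made on the calling thread, so measuring parallel work would need GC.GetTotalAllocatedBytes instead.

```csharp
using System;

// Hypothetical helper: bytes allocated on the current thread while running the action.
long MeasureAllocatedBytes(Action action)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

int itemCount = 100_000;
long bytes = MeasureAllocatedBytes(() =>
{
    for (int i = 0; i < itemCount; i++)
    {
        // Each iteration formats a new string instance on the managed heap.
        string s = $"item-{i}";
        GC.KeepAlive(s);
    }
});

Console.WriteLine($"{bytes:N0} bytes allocated while formatting {itemCount:N0} strings");
```

Comparing this number before and after a change to the string-handling code gives a quick, if coarse, signal of whether allocation volume (and thus GC frequency) went down.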

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details
Author: rreminy
Assignees: -
Labels: `tenet-performance`, `area-GC-coreclr`, `untriaged`
Milestone: -

mangod9 commented 1 year ago

Hello @rreminy, could you please clarify whether you are comparing with SVR mode and noticing WKS introduces higher pause times? That is mostly expected.

rreminy commented 1 year ago

In this repro, yes: I'm trying to highlight the performance degradation of Workstation GC in this particular workload. My actual workload runs inside a bundled .NET runtime, so I have no control over which GC mode is used.

Yes, I did notice a lot of GC pause time when I checked with dotnet-trace and PerfView. I no longer have that trace, but I can produce one with this repro if needed.
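A bundled host may not let you choose the GC mode, but the process can still report which mode it ended up with and how much time it has spent paused. This is a minimal sketch that assumes .NET 7 or later, since GC.GetTotalPauseDuration was added in .NET 7. It only reads process-wide counters and does not replace a dotnet-trace/PerfView trace.

```csharp
using System;
using System.Runtime;

// Which GC flavor is this process actually running with?
bool isServer = GCSettings.IsServerGC;
// Interactive usually means concurrent (background) GC is enabled; Batch means it is disabled.
GCLatencyMode latency = GCSettings.LatencyMode;

// Cumulative counters for the process lifetime; a gen1 or gen2 GC also counts as a gen0 GC.
int gen0 = GC.CollectionCount(0);
int gen1 = GC.CollectionCount(1);
int gen2 = GC.CollectionCount(2);

// Total time the runtime has spent paused for GC so far (.NET 7+).
TimeSpan totalPause = GC.GetTotalPauseDuration();

Console.WriteLine($"Server GC: {isServer}, latency mode: {latency}");
Console.WriteLine($"Collections gen0/gen1/gen2: {gen0}/{gen1}/{gen2}");
Console.WriteLine($"Total GC pause so far: {totalPause.TotalMilliseconds:F1} ms");
```

Sampling these values before and after an index rebuild shows how many collections it triggered and how much of its wall-clock time was spent paused.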

rreminy commented 1 year ago

I need to add the following, since I'm being told it is important: