dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.35k stars 4.74k forks source link

HTTPS performance degradation in HttpClient after .NET 7 migration #78686

Closed agerchev closed 1 year ago

agerchev commented 1 year ago

Hello,

I have a tool that i use to stress our web applications over http/https. It uses httpClient with SocketsHttpHandler. When I migrated it to .net 7, I saw some performance degradation in https connections.

Description

The stress tests over http are almost the same. The .net 7 version uses fewer CPU which is great :)

But the tests over https are very different. As you can see, in the .net 6 version i can achieve 28102 r/s avg, but with the newer version i can barely achieve 17028 r/s avg. The two versions (.net 6, .net 7) starts with around 220 threads (ThreadPool Thread Count) then drops. The .net 6 version drops to around 17 and saturates more CPU till the end. The new version incresses the threads regularly after the initial drop and in the end of the test it uses 169 threads with litle CPU saturation.

Configuration

All tests are done from the same virtual machine: CentOS 7 Linux It has 12 CPUs

Regression?

This is a regression from version 6

Data


.NET 6.0 Version  HTTP
[guest@vm-qa-lx1 httpPerformanceTool7]$ ../httpPerformanceTool3/HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="http://worker0.os1.dev.local:31874/static/css/40.5f362835.chunk.css" --default-request-version="1.1"

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                       1.4812e+08
    CPU Usage (%)                                                         54
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              307
    GC Fragmentation (%)                                                   0.737
    GC Heap Size (MB)                                                    117
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 1,476,848
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                18,023,880
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                21,747,560
    IL Bytes Jitted (B)                                              170,691
    LOH Size (B)                                                         288
    Monitor Lock Contention Count (Count / 1 sec)                         45
    Number of Active Timers                                              472
    Number of Assemblies Loaded                                           33
    Number of Methods Jitted                                           1,783
    POH (Pinned Object Heap) Size (B)                                 48,456
    ThreadPool Completed Work Item Count (Count / 1 sec)              41,849
    ThreadPool Queue Length                                                0
    ThreadPool Thread Count                                               19
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     389

5000000 iterations Completed in 00:01:59.3192526. 41904.3858476197r/s
.NET 7.0 Version  HTTP

[guest@vm-qa-lx1 httpPerformanceTool7]$ ./HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="http://worker0.os1.dev.local:31874/static/css/40.5f362835.chunk.css" --default-request-version="1.1"

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                       1.0666e+08
    CPU Usage (%)                                                         39.838
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              225.391
    GC Fragmentation (%)                                                   0.584
    GC Heap Size (MB)                                                    111.132
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 1,319,488
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                14,412,632
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                21,847,024
    IL Bytes Jitted (B)                                              199,599
    LOH Size (B)                                                           0
    Monitor Lock Contention Count (Count / 1 sec)                         38
    Number of Active Timers                                              450
    Number of Assemblies Loaded                                           32
    Number of Methods Jitted                                           2,118
    POH (Pinned Object Heap) Size (B)                                 80,832
    ThreadPool Completed Work Item Count (Count / 1 sec)              27,523
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                               14
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     368.828

5000000 iterations Completed in 00:02:00.9352484. 41344.4389964969r/s
.NET 6.0 Version HTTPS

[guest@vm-qa-lx1 httpPerformanceTool7]$ ../httpPerformanceTool3/HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="https://portal.dev.crc-btswsp-reg.dev.local/static/css/40.5f362835.chunk.css" --default-request-version="1.1"

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                       1.4672e+08
    CPU Usage (%)                                                         62
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              478
    GC Fragmentation (%)                                                   0.481
    GC Heap Size (MB)                                                    150
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 1,712,312
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                20,037,000
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                42,866,080
    IL Bytes Jitted (B)                                              282,181
    LOH Size (B)                                                         288
    Monitor Lock Contention Count (Count / 1 sec)                         52
    Number of Active Timers                                              501
    Number of Assemblies Loaded                                           40
    Number of Methods Jitted                                           3,025
    POH (Pinned Object Heap) Size (B)                                 97,528
    ThreadPool Completed Work Item Count (Count / 1 sec)              42,727
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                               17
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     614

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                       1.4035e+08
    CPU Usage (%)                                                         59
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              478
    GC Fragmentation (%)                                                   0.481
    GC Heap Size (MB)                                                    108
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 1,720,440
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                20,037,000
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                42,866,080
    IL Bytes Jitted (B)                                              282,181
    LOH Size (B)                                                         288
    Monitor Lock Contention Count (Count / 1 sec)                         57
    Number of Active Timers                                              502
    Number of Assemblies Loaded                                           40
    Number of Methods Jitted                                           3,025
    POH (Pinned Object Heap) Size (B)                                 97,528
    ThreadPool Completed Work Item Count (Count / 1 sec)              45,658
    ThreadPool Queue Length                                                0
    ThreadPool Thread Count                                               17
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     614

5000000 iterations Completed in 00:02:57.9182565. 28102.793374664165r/s
.NET 7.0 Version HTTPS 

[guest@vm-qa-lx1 httpPerformanceTool7]$ ./HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="https://portal.dev.crc-btswsp-reg.dev.local/static/css/40.5f362835.chunk.css" --default-request-version="1.1"

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   44,752,872
    CPU Usage (%)                                                         26.2
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              637.407
    GC Fragmentation (%)                                                   0.827
    GC Heap Size (MB)                                                    252.301
    Gen 0 GC Count (Count / 1 sec)                                         0
    Gen 0 Size (B)                                                       528
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                 2,459,864
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              296,792
    LOH Size (B)                                                           0
    Monitor Lock Contention Count (Count / 1 sec)                          2
    Number of Active Timers                                              502
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,202
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               3,051
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                               60
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     834.593

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   54,440,528
    CPU Usage (%)                                                         26.28
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              486.412
    GC Fragmentation (%)                                                   7.712
    GC Heap Size (MB)                                                     57.309
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 3,781,192
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                10,855,928
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              296,792
    LOH Size (B)                                                           0
    Monitor Lock Contention Count (Count / 1 sec)                          4
    Number of Active Timers                                              502
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,202
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               3,702
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                               72
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     688.062

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   62,450,416
    CPU Usage (%)                                                         26.567
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              278.168
    GC Fragmentation (%)                                                   2.185
    GC Heap Size (MB)                                                    113.748
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 2,135,640
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                15,039,568
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              296,792
    LOH Size (B)                                                           0
    Monitor Lock Contention Count (Count / 1 sec)                         11
    Number of Active Timers                                              502
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,202
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               4,088
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                               97
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     489.685

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   71,058,984
    CPU Usage (%)                                                         32.483
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              281.641
    GC Fragmentation (%)                                                   1.01
    GC Heap Size (MB)                                                    151.011
    Gen 0 GC Count (Count / 1 sec)                                         0
    Gen 0 Size (B)                                                 1,695,464
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                15,039,568
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              311,546
    LOH Size (B)                                                     319,520
    Monitor Lock Contention Count (Count / 1 sec)                         23
    Number of Active Timers                                              502
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,299
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               5,733
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                              121
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     492.118

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   69,496,240
    CPU Usage (%)                                                         29.941
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              279.937
    GC Fragmentation (%)                                                   1.01
    GC Heap Size (MB)                                                    111.566
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 1,749,456
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                15,039,568
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              311,546
    LOH Size (B)                                                     319,520
    Monitor Lock Contention Count (Count / 1 sec)                          6
    Number of Active Timers                                              502
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,299
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               6,371
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                              140
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     487.42

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   59,787,592
    CPU Usage (%)                                                         30.154
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              285.377
    GC Fragmentation (%)                                                   1.011
    GC Heap Size (MB)                                                    152.614
    Gen 0 GC Count (Count / 1 sec)                                         0
    Gen 0 Size (B)                                                 1,754,640
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                15,039,568
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              311,546
    LOH Size (B)                                                     319,520
    Monitor Lock Contention Count (Count / 1 sec)                          7
    Number of Active Timers                                              501
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,299
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               5,990
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                              141
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     489.333

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   58,716,960
    CPU Usage (%)                                                         28.846
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              283.8
    GC Fragmentation (%)                                                   1.01
    GC Heap Size (MB)                                                    106.567
    Gen 0 GC Count (Count / 1 sec)                                         1
    Gen 0 Size (B)                                                 1,780,064
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                15,039,568
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              311,546
    LOH Size (B)                                                     319,520
    Monitor Lock Contention Count (Count / 1 sec)                          6
    Number of Active Timers                                              502
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,299
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               6,413
    ThreadPool Queue Length                                                1
    ThreadPool Thread Count                                              152
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     489.275

[System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                   68,862,064
    CPU Usage (%)                                                         30.618
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                              286.429
    GC Fragmentation (%)                                                  47.053
    GC Heap Size (MB)                                                    335.172
    Gen 0 GC Count (Count / 1 sec)                                         0
    Gen 0 Size (B)                                                38,588,040
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                65,263,504
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                40,646,928
    IL Bytes Jitted (B)                                              321,040
    LOH Size (B)                                                     319,520
    Monitor Lock Contention Count (Count / 1 sec)                          7
    Number of Active Timers                                              146
    Number of Assemblies Loaded                                           36
    Number of Methods Jitted                                           3,410
    POH (Pinned Object Heap) Size (B)                                 97,240
    ThreadPool Completed Work Item Count (Count / 1 sec)               8,504
    ThreadPool Queue Length                                                0
    ThreadPool Thread Count                                              169
    Time spent in JIT (ms / 1 sec)                                         4.936
    Working Set (MB)                                                     605.18

5000000 iterations Completed in 00:04:53.6177404. 17028.94380015466r/s

Analysis

I have dumped the managed stacks (dotnet-stack) for .net 7 and most of the stacks end here:

System.Net.Security!Interop+Ssl.SslWrite(class Microsoft.Win32.SafeHandles.SafeSslHandle,unsigned int8&,int32,value class SslErrorCode&)

System.Net.Security!Interop+Ssl.SslRead(class Microsoft.Win32.SafeHandles.SafeSslHandle,unsigned int8&,int32,value class SslErrorCode&)

System.Net.Security!Interop+Ssl.BioWrite(class Microsoft.Win32.SafeHandles.SafeBioHandle,unsigned int8&,int32)

System.Net.Security!Interop+Crypto.BioRead(class Microsoft.Win32.SafeHandles.SafeBioHandle,unsigned int8[],int32)

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
Hello, I have a tool that i use to stress our web applications over http/https. It uses httpClient with SocketsHttpHandler. When I migrated it to .net 7, I saw some performance degradation in https connections. ### Description The stress tests over http are almost the same. The .net 7 version uses fewer CPU which is great :) But the tests over https are very different. As you can see, in the .net 6 version i can achieve 28102 r/s avg, but with the newer version i can barely achieve 17028 r/s avg. The two versions (.net 6, .net 7) starts with around 220 threads (ThreadPool Thread Count) then drops. The .net 6 version drops to around 17 and saturates more CPU till the end. The new version incresses the threads regularly after the initial drop and in the end of the test it uses 169 threads with litle CPU saturation. ### Configuration All tests are done from the same virtual machine: CentOS 7 Linux It has 12 CPUs ### Regression? This is a regression from version 6 ### Data ``` .NET 6.0 Version HTTP [guest@vm-qa-lx1 httpPerformanceTool7]$ ../httpPerformanceTool3/HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="http://worker0.os1.dev.local:31874/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.4812e+08 CPU Usage (%) 54 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 307 GC Fragmentation (%) 0.737 GC Heap Size (MB) 117 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,476,848 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 18,023,880 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 21,747,560 IL Bytes Jitted (B) 170,691 LOH Size (B) 288 Monitor Lock Contention Count (Count / 1 sec) 45 Number of Active Timers 472 Number of Assemblies Loaded 33 Number of Methods Jitted 1,783 POH (Pinned Object Heap) Size (B) 48,456 ThreadPool Completed Work Item Count (Count / 1 sec) 41,849 ThreadPool Queue Length 0 ThreadPool Thread Count 19 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 389 5000000 iterations Completed in 00:01:59.3192526. 41904.3858476197r/s ``` ``` .NET 7.0 Version HTTP [guest@vm-qa-lx1 httpPerformanceTool7]$ ./HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="http://worker0.os1.dev.local:31874/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.0666e+08 CPU Usage (%) 39.838 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 225.391 GC Fragmentation (%) 0.584 GC Heap Size (MB) 111.132 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,319,488 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 14,412,632 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 21,847,024 IL Bytes Jitted (B) 199,599 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 38 Number of Active Timers 450 Number of Assemblies Loaded 32 Number of Methods Jitted 2,118 POH (Pinned Object Heap) Size (B) 80,832 ThreadPool Completed Work Item Count (Count / 1 sec) 27,523 ThreadPool Queue Length 1 ThreadPool Thread Count 14 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 368.828 5000000 iterations Completed in 00:02:00.9352484. 41344.4389964969r/s ``` ``` .NET 6.0 Version HTTPS [guest@vm-qa-lx1 httpPerformanceTool7]$ ../httpPerformanceTool3/HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="https://portal.dev.crc-btswsp-reg.dev.local/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.4672e+08 CPU Usage (%) 62 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 478 GC Fragmentation (%) 0.481 GC Heap Size (MB) 150 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,712,312 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 20,037,000 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 42,866,080 IL Bytes Jitted (B) 282,181 LOH Size (B) 288 Monitor Lock Contention Count (Count / 1 sec) 52 Number of Active Timers 501 Number of Assemblies Loaded 40 Number of Methods Jitted 3,025 POH (Pinned Object Heap) Size (B) 97,528 ThreadPool Completed Work Item Count (Count / 1 sec) 42,727 ThreadPool Queue Length 1 ThreadPool Thread Count 17 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 614 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.4035e+08 CPU Usage (%) 59 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 478 GC Fragmentation (%) 0.481 GC Heap Size (MB) 108 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,720,440 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 20,037,000 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 42,866,080 IL Bytes Jitted (B) 282,181 LOH Size (B) 288 Monitor Lock Contention Count (Count / 1 sec) 57 Number of Active Timers 502 Number of Assemblies Loaded 40 Number of Methods Jitted 3,025 POH (Pinned Object Heap) Size (B) 97,528 ThreadPool Completed Work Item Count (Count / 1 sec) 45,658 ThreadPool Queue Length 0 ThreadPool Thread Count 17 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 614 5000000 iterations Completed in 00:02:57.9182565. 28102.793374664165r/s ``` ``` .NET 7.0 Version HTTPS [guest@vm-qa-lx1 httpPerformanceTool7]$ ./HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="https://portal.dev.crc-btswsp-reg.dev.local/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 44,752,872 CPU Usage (%) 26.2 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 637.407 GC Fragmentation (%) 0.827 GC Heap Size (MB) 252.301 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 528 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 2,459,864 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 296,792 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 2 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,202 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 3,051 ThreadPool Queue Length 1 ThreadPool Thread Count 60 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 834.593 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 54,440,528 CPU Usage (%) 26.28 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 486.412 GC Fragmentation (%) 7.712 GC Heap Size (MB) 57.309 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 3,781,192 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 10,855,928 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 296,792 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 4 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,202 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 3,702 ThreadPool Queue Length 1 ThreadPool Thread Count 72 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 688.062 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 62,450,416 CPU Usage (%) 26.567 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 278.168 GC Fragmentation (%) 2.185 GC Heap Size (MB) 113.748 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 2,135,640 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 296,792 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 11 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,202 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 4,088 ThreadPool Queue Length 1 ThreadPool Thread Count 97 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 489.685 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 71,058,984 CPU Usage (%) 32.483 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 281.641 GC Fragmentation (%) 1.01 GC Heap Size (MB) 151.011 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 1,695,464 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 23 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 5,733 ThreadPool Queue Length 1 ThreadPool Thread Count 121 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 492.118 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 69,496,240 CPU Usage (%) 29.941 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 279.937 GC Fragmentation (%) 1.01 GC Heap Size (MB) 111.566 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,749,456 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 6 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 6,371 ThreadPool Queue Length 1 ThreadPool Thread Count 140 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 487.42 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 59,787,592 CPU Usage (%) 30.154 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 285.377 GC Fragmentation (%) 1.011 GC Heap Size (MB) 152.614 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 1,754,640 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 7 Number of Active Timers 501 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 5,990 ThreadPool Queue Length 1 ThreadPool Thread Count 141 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 489.333 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 58,716,960 CPU Usage (%) 28.846 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 283.8 GC Fragmentation (%) 1.01 GC Heap Size (MB) 106.567 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,780,064 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 6 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 6,413 ThreadPool Queue Length 1 ThreadPool Thread Count 152 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 489.275 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 68,862,064 CPU Usage (%) 30.618 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 286.429 GC Fragmentation (%) 47.053 GC Heap Size (MB) 335.172 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 38,588,040 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 65,263,504 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 321,040 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 7 Number of Active Timers 146 Number of Assemblies Loaded 36 Number of Methods Jitted 3,410 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 8,504 ThreadPool Queue Length 0 ThreadPool Thread Count 169 Time spent in JIT (ms / 1 sec) 4.936 Working Set (MB) 605.18 5000000 iterations Completed in 00:04:53.6177404. 17028.94380015466r/s ``` ### Analysis I have dumped the managed stacks (dotnet-stack) for .net 7 and most of the stacks end here: System.Net.Security!Interop+Ssl.SslWrite(class Microsoft.Win32.SafeHandles.SafeSslHandle,unsigned int8&,int32,value class SslErrorCode&) System.Net.Security!Interop+Ssl.SslRead(class Microsoft.Win32.SafeHandles.SafeSslHandle,unsigned int8&,int32,value class SslErrorCode&) System.Net.Security!Interop+Ssl.BioWrite(class Microsoft.Win32.SafeHandles.SafeBioHandle,unsigned int8&,int32) System.Net.Security!Interop+Crypto.BioRead(class Microsoft.Win32.SafeHandles.SafeBioHandle,unsigned int8[],int32)
Author: agerchev
Assignees: -
Labels: `area-System.Net.Http`, `tenet-performance`, `untriaged`
Milestone: -
rzikm commented 1 year ago

The threadpool count seems suspicious. Can you share the source code for the performance tool?

agerchev commented 1 year ago

I have dumped the call stacks from the test with https and .net 7

stack1.txt

Here is the tool.

    class Program
    {
        private static ConcurrentDictionary<HttpStatusCode, MetricdData> _metricData = new ConcurrentDictionary<HttpStatusCode, MetricdData>();

        /// <summary>
        /// 
        /// </summary>
        /// <param name="numberOfUsers">Брой на потребителите</param>
        /// <param name="numberOfIterations">Брой на итерациите, които всеки потребител прави</param>
        /// <param name="url">Адрес</param>
        /// <param name="defaultRequestVersion">версия на Http протокола по подразбиране - 1.0/1.1/2.0/3.0/</param>
        /// <param name="header">Хедър, който се добавя при всяка заявка. Могат да се добавят няколко.</param>
        static void Main(int numberOfUsers = 1, int numberOfIterations = 1, string url = null, string defaultRequestVersion = null, string[] header = null)
        {
            try
            {
                Program p = new Program();
                p.RunPerformanceTest(numberOfUsers, numberOfIterations, url, string.IsNullOrEmpty(defaultRequestVersion) ? null : Version.Parse(defaultRequestVersion), header).Wait();
            }
            catch (Exception ex)
            {
                var exCurrent = ex;
                do
                {
                    Console.WriteLine(exCurrent.Message);
                }
                while ((exCurrent = exCurrent.InnerException) != null);
            }
        }

        public async Task RunPerformanceTest(int numberOfUsers, int numberOfIterations, string url, Version defaultRequestVersion, string[] header)
        {
            List<Task> pendingWork = new List<Task>();

            SocketsHttpHandler handler = new SocketsHttpHandler();

            handler.PooledConnectionLifetime = TimeSpan.FromMinutes(30);
            handler.PooledConnectionIdleTimeout = TimeSpan.FromMinutes(30);

            handler.EnableMultipleHttp2Connections = true;
            handler.MaxConnectionsPerServer = 1024; 
            handler.SslOptions.RemoteCertificateValidationCallback = (sender, certificate, chain, sslPolicyErrors) => { return true; };

            HttpClient httpClient = new HttpClient(handler);

            if (defaultRequestVersion != null)
                httpClient.DefaultRequestVersion = defaultRequestVersion;

            NameValueHeaderValue[] headers = new NameValueHeaderValue[header?.Length ?? 0];

            for (int i = 0; i < header?.Length; i++)
            {
                headers[i] = NameValueHeaderValue.Parse(header[i]);
            }

            Stopwatch stopwatch = Stopwatch.StartNew();

            for (int users = 0; users < numberOfUsers; users++)
            {
                pendingWork.Add(Task.Run(async () =>
                {
                    var buffer = new byte[32 * 1024];

                    for (var iter = 0; iter < numberOfIterations; iter++)
                    {
                        var stopwatchLocal = Stopwatch.StartNew();

                        using (var httprequest = new HttpRequestMessage(HttpMethod.Get, new Uri(url)) { Version = httpClient.DefaultRequestVersion, VersionPolicy = httpClient.DefaultVersionPolicy })
                        {
                            foreach (var header in headers)
                                httprequest.Headers.Add(header.Name, header.Value);

                            using (var response = await httpClient.SendAsync(httprequest, HttpCompletionOption.ResponseHeadersRead))
                            {
                                using (var stream = await response.Content.ReadAsStreamAsync())
                                {
                                    while ((await stream.ReadAsync(buffer, 0, buffer.Length)) > 0) ;
                                }

                                var metricData = _metricData.GetOrAdd(response.StatusCode, (statusCode) => { return new MetricdData(); });

                                Interlocked.Increment(ref metricData.NumberOfRequests);
                                Interlocked.Add(ref metricData.AvgLatency, stopwatchLocal.ElapsedMilliseconds);
                            }
                        }
                    }
                }));
            }

            var monitoringTask = StartMonitoring();

            await Task.WhenAll(pendingWork.ToArray());

            stopwatch.Stop();

            Console.WriteLine("{0} iterations Completed in {1}. {2}r/s", numberOfIterations * numberOfUsers, stopwatch.Elapsed, (numberOfIterations * numberOfUsers) / stopwatch.Elapsed.TotalSeconds);
        }

        public async Task StartMonitoring()
        {
            StringBuilder report = new StringBuilder();

            while (true)
            {
                try
                {
                    report.Clear();

                    foreach (var metric in _metricData.OrderBy((kvp) => kvp.Key))
                    {
                        var numberOfRequests = Interlocked.Exchange(ref metric.Value.NumberOfRequests, 0);
                        var currentAvgLatency = Interlocked.Exchange(ref metric.Value.AvgLatency, 0);

                        var tmp = numberOfRequests != 0 ? (double)currentAvgLatency / (double)numberOfRequests : 0;

                        report.Append($"{(int)metric.Key} - {numberOfRequests.ToString().PadLeft(6, ' ')} r/s, {tmp.ToString("F01").PadLeft(7, ' ')} ms | ");
                    }

                    report.AppendLine();

                    Console.Write(report.ToString());

                }
                catch(Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
                await Task.Delay(1000);
            }
        }
    }

    public class MetricdData
    {
        public MetricdData()
        {
            NumberOfRequests = 0;
            AvgLatency = 0;
        }

        public int NumberOfRequests;
        public long AvgLatency;
    }
wfurt commented 1 year ago

can you re-run the test with DOTNET_SYSTEM_NET_SECURITY_DISABLETLSRESUME=1? And yes, for steady load you would see most time in encrypt/decrypt e.g. the read & write functions.

cc: @kouvel for thread pool insight.

kouvel commented 1 year ago

cc: @kouvel for thread pool insight.

As far as I know not much has changed in the thread pool in .NET 7 on Linux that could affect perf in this sort of way.

agerchev commented 1 year ago

@wfurt DOTNET_SYSTEM_NET_SECURITY_DISABLETLSRESUME=1 does not change anything.

I made the test on another machine, with RHEL 9 and everything is fine. The threadPool stabilizes at 17-20 threads and runs steady as on the .net 6 version. It uses around 20 percent less CPU than the .net 6 version (almost the same as http version of the test) which is great.

As i pointed previously the machine, where i am observing the problem is CentOs 7 with OpenSSL 1.0.2k-fips 26 Jan 2017

The new one is RHEL 9 with OpenSSL 3.0.1 14 Dec 2021 (Library: OpenSSL 3.0.1 14 Dec 2021)

I agree that encrypt/decrypt are time consuming ops, but they do not saturate the CPU which is strange. The threads count increases slowly, most of the stacks sit in these methods which i suppose call almost directly openSSL and the CPU is not saturated. And this changes with .net 7 on CentOS 7 OpenSSL 1.0.2k-fips 26 Jan 2017 It looks like some sort of contention. Also, most of the thread stacks on the RHEL 9 machine are not in encrypt/decrypt.

wfurt commented 1 year ago

TLS resume was the biggest Linux change in SslStream so DOTNET_SYSTEM_NET_SECURITY_DISABLETLSRESUME was worth of shot. OpenSSL 1.0 is long out of support https://www.openssl.org/policies/releasestrat.html but I don't really see any changes in SslStream or PAL that would be impacted by the version. The Encrypt/Decrypt functions are pretty simple.

wfurt commented 1 year ago

You can probably get debug info packages for OpenSSL and lldb would show the internals. cc: @tmds for more thoughts.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/ncl, @vcsjones See info in area-owners.md if you want to be subscribed.

Issue Details
Hello, I have a tool that i use to stress our web applications over http/https. It uses httpClient with SocketsHttpHandler. When I migrated it to .net 7, I saw some performance degradation in https connections. ### Description The stress tests over http are almost the same. The .net 7 version uses fewer CPU which is great :) But the tests over https are very different. As you can see, in the .net 6 version i can achieve 28102 r/s avg, but with the newer version i can barely achieve 17028 r/s avg. The two versions (.net 6, .net 7) starts with around 220 threads (ThreadPool Thread Count) then drops. The .net 6 version drops to around 17 and saturates more CPU till the end. The new version incresses the threads regularly after the initial drop and in the end of the test it uses 169 threads with litle CPU saturation. ### Configuration All tests are done from the same virtual machine: CentOS 7 Linux It has 12 CPUs ### Regression? This is a regression from version 6 ### Data ``` .NET 6.0 Version HTTP [guest@vm-qa-lx1 httpPerformanceTool7]$ ../httpPerformanceTool3/HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="http://worker0.os1.dev.local:31874/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.4812e+08 CPU Usage (%) 54 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 307 GC Fragmentation (%) 0.737 GC Heap Size (MB) 117 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,476,848 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 18,023,880 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 21,747,560 IL Bytes Jitted (B) 170,691 LOH Size (B) 288 Monitor Lock Contention Count (Count / 1 sec) 45 Number of Active Timers 472 Number of Assemblies Loaded 33 Number of Methods Jitted 1,783 POH (Pinned Object Heap) Size (B) 48,456 ThreadPool Completed Work Item Count (Count / 1 sec) 41,849 ThreadPool Queue Length 0 ThreadPool Thread Count 19 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 389 5000000 iterations Completed in 00:01:59.3192526. 41904.3858476197r/s ``` ``` .NET 7.0 Version HTTP [guest@vm-qa-lx1 httpPerformanceTool7]$ ./HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="http://worker0.os1.dev.local:31874/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.0666e+08 CPU Usage (%) 39.838 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 225.391 GC Fragmentation (%) 0.584 GC Heap Size (MB) 111.132 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,319,488 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 14,412,632 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 21,847,024 IL Bytes Jitted (B) 199,599 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 38 Number of Active Timers 450 Number of Assemblies Loaded 32 Number of Methods Jitted 2,118 POH (Pinned Object Heap) Size (B) 80,832 ThreadPool Completed Work Item Count (Count / 1 sec) 27,523 ThreadPool Queue Length 1 ThreadPool Thread Count 14 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 368.828 5000000 iterations Completed in 00:02:00.9352484. 41344.4389964969r/s ``` ``` .NET 6.0 Version HTTPS [guest@vm-qa-lx1 httpPerformanceTool7]$ ../httpPerformanceTool3/HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="https://portal.dev.crc-btswsp-reg.dev.local/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.4672e+08 CPU Usage (%) 62 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 478 GC Fragmentation (%) 0.481 GC Heap Size (MB) 150 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,712,312 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 20,037,000 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 42,866,080 IL Bytes Jitted (B) 282,181 LOH Size (B) 288 Monitor Lock Contention Count (Count / 1 sec) 52 Number of Active Timers 501 Number of Assemblies Loaded 40 Number of Methods Jitted 3,025 POH (Pinned Object Heap) Size (B) 97,528 ThreadPool Completed Work Item Count (Count / 1 sec) 42,727 ThreadPool Queue Length 1 ThreadPool Thread Count 17 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 614 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 1.4035e+08 CPU Usage (%) 59 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 478 GC Fragmentation (%) 0.481 GC Heap Size (MB) 108 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,720,440 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 20,037,000 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 42,866,080 IL Bytes Jitted (B) 282,181 LOH Size (B) 288 Monitor Lock Contention Count (Count / 1 sec) 57 Number of Active Timers 502 Number of Assemblies Loaded 40 Number of Methods Jitted 3,025 POH (Pinned Object Heap) Size (B) 97,528 ThreadPool Completed Work Item Count (Count / 1 sec) 45,658 ThreadPool Queue Length 0 ThreadPool Thread Count 17 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 614 5000000 iterations Completed in 00:02:57.9182565. 28102.793374664165r/s ``` ``` .NET 7.0 Version HTTPS [guest@vm-qa-lx1 httpPerformanceTool7]$ ./HttpPerformnaceTool --number-of-users=500 --number-of-iterations=10000 --url="https://portal.dev.crc-btswsp-reg.dev.local/static/css/40.5f362835.chunk.css" --default-request-version="1.1" [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 44,752,872 CPU Usage (%) 26.2 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 637.407 GC Fragmentation (%) 0.827 GC Heap Size (MB) 252.301 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 528 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 2,459,864 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 296,792 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 2 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,202 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 3,051 ThreadPool Queue Length 1 ThreadPool Thread Count 60 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 834.593 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 54,440,528 CPU Usage (%) 26.28 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 486.412 GC Fragmentation (%) 7.712 GC Heap Size (MB) 57.309 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 3,781,192 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 10,855,928 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 296,792 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 4 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,202 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 3,702 ThreadPool Queue Length 1 ThreadPool Thread Count 72 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 688.062 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 62,450,416 CPU Usage (%) 26.567 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 278.168 GC Fragmentation (%) 2.185 GC Heap Size (MB) 113.748 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 2,135,640 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 296,792 LOH Size (B) 0 Monitor Lock Contention Count (Count / 1 sec) 11 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,202 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 4,088 ThreadPool Queue Length 1 ThreadPool Thread Count 97 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 489.685 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 71,058,984 CPU Usage (%) 32.483 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 281.641 GC Fragmentation (%) 1.01 GC Heap Size (MB) 151.011 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 1,695,464 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 23 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 5,733 ThreadPool Queue Length 1 ThreadPool Thread Count 121 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 492.118 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 69,496,240 CPU Usage (%) 29.941 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 279.937 GC Fragmentation (%) 1.01 GC Heap Size (MB) 111.566 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,749,456 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 6 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 6,371 ThreadPool Queue Length 1 ThreadPool Thread Count 140 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 487.42 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 59,787,592 CPU Usage (%) 30.154 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 285.377 GC Fragmentation (%) 1.011 GC Heap Size (MB) 152.614 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 1,754,640 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 7 Number of Active Timers 501 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 5,990 ThreadPool Queue Length 1 ThreadPool Thread Count 141 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 489.333 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 58,716,960 CPU Usage (%) 28.846 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 283.8 GC Fragmentation (%) 1.01 GC Heap Size (MB) 106.567 Gen 0 GC Count (Count / 1 sec) 1 Gen 0 Size (B) 1,780,064 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 15,039,568 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 311,546 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 6 Number of Active Timers 502 Number of Assemblies Loaded 36 Number of Methods Jitted 3,299 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 6,413 ThreadPool Queue Length 1 ThreadPool Thread Count 152 Time spent in JIT (ms / 1 sec) 0 Working Set (MB) 489.275 [System.Runtime] % Time in GC since last GC (%) 0 Allocation Rate (B / 1 sec) 68,862,064 CPU Usage (%) 30.618 Exception Count (Count / 1 sec) 0 GC Committed Bytes (MB) 286.429 GC Fragmentation (%) 47.053 GC Heap Size (MB) 335.172 Gen 0 GC Count (Count / 1 sec) 0 Gen 0 Size (B) 38,588,040 Gen 1 GC Count (Count / 1 sec) 0 Gen 1 Size (B) 65,263,504 Gen 2 GC Count (Count / 1 sec) 0 Gen 2 Size (B) 40,646,928 IL Bytes Jitted (B) 321,040 LOH Size (B) 319,520 Monitor Lock Contention Count (Count / 1 sec) 7 Number of Active Timers 146 Number of Assemblies Loaded 36 Number of Methods Jitted 3,410 POH (Pinned Object Heap) Size (B) 97,240 ThreadPool Completed Work Item Count (Count / 1 sec) 8,504 ThreadPool Queue Length 0 ThreadPool Thread Count 169 Time spent in JIT (ms / 1 sec) 4.936 Working Set (MB) 605.18 5000000 iterations Completed in 00:04:53.6177404. 17028.94380015466r/s ``` ### Analysis I have dumped the managed stacks (dotnet-stack) for .net 7 and most of the stacks end here: System.Net.Security!Interop+Ssl.SslWrite(class Microsoft.Win32.SafeHandles.SafeSslHandle,unsigned int8&,int32,value class SslErrorCode&) System.Net.Security!Interop+Ssl.SslRead(class Microsoft.Win32.SafeHandles.SafeSslHandle,unsigned int8&,int32,value class SslErrorCode&) System.Net.Security!Interop+Ssl.BioWrite(class Microsoft.Win32.SafeHandles.SafeBioHandle,unsigned int8&,int32) System.Net.Security!Interop+Crypto.BioRead(class Microsoft.Win32.SafeHandles.SafeBioHandle,unsigned int8[],int32)
Author: agerchev
Assignees: -
Labels: `area-System.Net.Security`, `tenet-performance`, `needs-further-triage`
Milestone: -
wfurt commented 1 year ago

the code fragment does not build because of missing dependencies and it also seems to depend on external service. Can you craft something that can be just standalone repro @agerchev? Or check if you can demonstrate it with https://github.com/dotnet/performance tests?

agerchev commented 1 year ago

@wfurt yes, i use System.CommandLine.DragonFruit, i forgot to mention, sorry.

I have made some tests, unfortunately i cannot simulate the whole setup with the external service in a standalone env.

Our setup is this: client (httpperformanceTool) -> haproxy (rhel9 tcp mode load balancer) -> 3 instances of nginx ingres controller (OpenShift) Everything is deployed on virtual machines (vmware) on premise. If I move the lb (haproxy) in another network or if i use only one ingress controller the problem does not appear. May be connected with our network environment somehow.

I have made a couple of screenshots on gdb and i hope this can be of a help. There are lots of threads with this call stack.

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f96b52b3e1b in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f96b52b3ce8 in __GI___pthread_mutex_lock (mutex=0x7f5514019918) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007f55118e5713 in int_thread_get (create=0) at err.c:452
#4  0x00007f55118e5c1d in int_thread_get_item (d=0x7f5461ff7460) at err.c:495
#5  0x00007f55118e65b0 in ERR_get_state () at err.c:1026
#6  0x00007f55118e67f5 in ERR_clear_error () at err.c:743
#7  0x00007f553006eb40 in CryptoNative_SslWrite () from /home/atanas/httpPerformanceTool7/libSystem.Security.Cryptography.Native.OpenSsl.so

As you can see the thread count is over 280.

Screenshot 2023-01-26 114541

Screenshot 2023-01-26 114711

Screenshot 2023-01-26 114737

Screenshot 2023-01-26 114822

Screenshot 2023-01-26 115412

wfurt commented 1 year ago

I don't think we would be able to for it out without good repro. Furthermore, CentOS 7 will no longer be supported. OpenSSL 1.0 is way too old. On the other hand 3.0 has it's own new issues: https://github.com/openssl/openssl/issues/20286

To make tis actionable I think we would need repro or perf test with Benchmark.NET on os supported by 8.0. I don't think we have resources to dig into otherwise.