.NET 6.0 Microbenchmarks Performance Study Report

The goal of this study was simple: ensure that we ship .NET 6 without any performance regressions, especially in the configs not covered by the .NET Performance Lab.

We have not changed the methodology since last year; for details, see https://github.com/dotnet/runtime/issues/41871.

Data

This year, we have covered more configs than ever! Namely:

architectures: x64, x86, arm64, arm
Unix: Alpine 3.13, CentOS 8, Debian 10, RHEL 7, SLES 15, OpenSUSE 15.3, Ubuntu 16.04, Ubuntu 18.04, macOS 11.4 and 11.5.2
Windows: 7, 8.1, 10, 11, Server 2022, Server 2022 Core

| Operating System        | Architecture | Processor Name                                  | Comment                      |
| ----------------------- | ------------ | ----------------------------------------------- | ---------------------------- |
| Windows 10.0.19043.1165 | X64          | AMD Ryzen Threadripper PRO 3945WX 12-Cores      |                              |
| Windows 10.0.20348      | X64          | AMD EPYC 7452                                   | Windows Server 2022, VM      |
| Windows 10.0.20348      | X64          | AMD EPYC 7452                                   | Windows Server 2022 Core, VM |
| Windows 10.0.18363.1621 | X64          | Intel Xeon CPU E5-1650 v4 3.60GHz               |                              |
| Windows 8.1             | X64          | Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge)   |                              |
| Windows 10.0.19042.685  | X64          | Intel Core i7-5557U CPU 3.10GHz (Broadwell)     |                              |
| Windows 10.0.19043.1165 | X64          | Intel Core i7-6700 CPU 3.40GHz (Skylake)        |                              |
| Windows 10.0.22454      | X64          | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)   |                              |
| Windows 10.0.22451      | X64          | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)    |                              |
| Windows 10.0.19042.1165 | X64          | Intel Core i9-9900T CPU 2.10GHz                 |                              |
| Windows 7 SP1           | X64          | Intel Core2 Duo CPU T9600 2.80GHz               | ancient hardware             |
| centos 8                | X64          | AMD EPYC 7452                                   | VM                           |
| debian 10               | X64          | AMD EPYC 7452                                   | VM                           |
| rhel 7                  | X64          | AMD EPYC 7452                                   | VM                           |
| sles 15                 | X64          | AMD EPYC 7452                                   | VM                           |
| opensuse-leap 15.3      | X64          | AMD EPYC 7452                                   | VM                           |
| ubuntu 18.04            | X64          | Intel Xeon CPU E5-1650 v4 3.60GHz               |                              |
| ubuntu 18.04            | X64          | Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge) |                              |
| alpine 3.13             | X64          | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)      |                              |
| ubuntu 16.04            | Arm64        | Qualcomm Centriq                                |                              |
| Windows 10.0.19043.1165 | Arm64        | Microsoft SQ1 3.0 GHz                           |                              |
| Windows 10.0.22000      | Arm64        | Microsoft SQ1 3.0 GHz                           |                              |
| Windows 10.0.19043.1165 | X86          | AMD Ryzen Threadripper PRO 3945WX 12-Cores      |                              |
| Windows 10.0.18363.1621 | X86          | Intel Xeon CPU E5-1650 v4 3.60GHz               |                              |
| Windows 10.0.19043.1165 | Arm          | Microsoft SQ1 3.0 GHz                           |                              |
| macOS Big Sur 11.5.2    | X64          | Intel Core i5-4278U CPU 2.60GHz (Haswell)       |                              |
| macOS Big Sur 11.5.2    | X64          | Intel Core i7-4870HQ CPU 2.50GHz (Haswell)      |                              |
| macOS Big Sur 11.4      | X64          | Intel Core i7-5557U CPU 3.10GHz (Broadwell)     |                              |

Most of the benchmarks were run on bare-metal machines, but some were executed on Azure VMs.

This would not have been possible without the help of @AndyAyersMS @BruceForstall @bwadswor @carlossanlop @danmoseley @jeffhandley @michaelgsharp @sharwell @smitpatel @vatsan-madhavan @wfurt, who contributed their results and time.

Everyone interested can download the data from here and here (GitHub does not support files larger than 100 MB, so I had to split the .NET 5 and .NET 6 results into two separate archives). The full report generated by the tool is available here. The full report also contains improvements, so if you read it from the end you can see the biggest perf improvements.

Moreover, the full historical data, which again turned out to be extremely useful, is available here.

Regressions

By design

[x] System.Memory.Span<Byte>.IndexOfAnyFourValues(Size: 512), System.Memory.Span<Int32>.IndexOfAnyFourValues(Size: 512):
- reported: https://github.com/dotnet/runtime/issues/54172
- explained: https://github.com/dotnet/runtime/issues/54172#issuecomment-860992447
[x] System.Linq.Tests.Perf_Enumerable.TakeLastHalf(input: List)
- reported: https://github.com/dotnet/runtime/issues/50006
- explained: https://github.com/dotnet/runtime/issues/50006#issuecomment-815661898
[x] System.Collections.Concurrent.AddRemoveFromSameThreads*
- reported: https://github.com/dotnet/runtime/issues/46714
- explained: https://github.com/dotnet/runtime/issues/46714#issuecomment-859144213
[x] System.Tests.Perf_Random.NextDouble, System.Tests.Perf_Random.Next_int, System.Tests.Perf_Random.Next_int_int, System.Tests.Perf_Random.NextBytes_span
- reported: https://github.com/dotnet/runtime/issues/47870
- explained: https://github.com/dotnet/runtime/issues/47870#issuecomment-773721010
[x] System.IO.Tests.BinaryWriterExtendedTests.WriteAsciiCharArray(StringLengthInChars: 32)
- the benchmark has slightly regressed for small inputs (10%), but greatly improved for large inputs (up to a few times faster)
[x] System.IO.Tests.Perf_FileStream.ReadAsync(fileSize: 1024, userBufferSize: 1024, options: Asynchronous)
- it's the cost of FileStream being 100% async now

Investigation in progress

[ ] System.Numerics.Tests.Perf_Matrix3x2.IsIdentityBenchmark
- reported: https://github.com/dotnet/runtime/issues/50939
- asked for help: https://github.com/dotnet/runtime/issues/50939#issuecomment-877138189
- reopened: https://github.com/dotnet/runtime/issues/50939#issuecomment-919141327
[x] System.Numerics.Tests.Perf_Vector3.DistanceBenchmark, System.Numerics.Tests.Perf_Vector2.DistanceBenchmark
- detected: https://github.com/DrewScoggins/performance-2/issues/1848 but not reported in runtime repo
- added to similar existing issue: https://github.com/dotnet/runtime/issues/50939#issuecomment-919159165
[ ] System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, OrdinalIgnoreCase)), System.Globalization.Tests.StringEquality.Compare_DifferentFirstChar(Count: 1024, Options: (en-US, Ordinal)), System.Buffers.Text.Tests.Utf8FormatterTests.FormatterInt64(value: 12345), System.Tests.Perf_Int32.ToStringHex(value: 2147483647), System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, Ordinal))
- detected: https://github.com/DrewScoggins/performance-2/issues/4549 but not reported in runtime repo
- opened: https://github.com/dotnet/runtime/issues/59087, most likely caused by PGO
[ ] System.Collections.ContainsKeyFalse<Int32, Int32>.SortedList(Size: 512)
- reported: https://github.com/dotnet/runtime/issues/51258
- added a comment with what I've found: https://github.com/dotnet/runtime/issues/51258#issuecomment-919331459
[ ] System.Collections.ContainsKeyFalse<Int32, Int32>.ConcurrentDictionary(Size: 512)
- detected: https://github.com/DrewScoggins/performance-2/issues/7188 but not reported in runtime repo
- opened: https://github.com/dotnet/runtime/issues/59101
[ ] System.Text.Json.Serialization.Tests.ReadJson<Int32>.DeserializeFromStream
- seems to not be detected by the bot
- opened: https://github.com/dotnet/runtime/issues/59103
[ ] System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000), System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000) and System.Text.Tests.Perf_StringBuilder.ctor_string(length: 100000)
- detected (with a one-month delay): https://github.com/dotnet/perf-autofiling-issues/issues/304 but not reported in runtime repo
- opened: https://github.com/dotnet/runtime/issues/59142
[x] System.Collections.CtorDefaultSize<Int32>.ConcurrentBag
- seems to affect only Linux VMs (bare metal is fine)
- opened: https://github.com/dotnet/runtime/issues/59145
[ ] System.Text.Perf_Utf8Encoding.GetBytes(Input: Cyrillic)
- detected: https://github.com/dotnet/runtime/issues/52313
[ ] System.Collections.Sort<IntClass>.Array(Size: 512)
- detected: https://github.com/DrewScoggins/performance-2/issues/6054 but not reported in runtime repo
- opened: https://github.com/dotnet/runtime/issues/59149
[ ] System.Threading.Tests.Perf_Timer.ShortScheduleAndDisposeWithFiringTimers
- opened: https://github.com/dotnet/runtime/issues/59150, macOS-specific
[ ] PerfLabTests.DelegatePerf.DelegateInvoke
- opened: https://github.com/dotnet/runtime/issues/59152, macOS-specific
[ ] Microsoft.Extensions.Logging.ScopesOverheadBenchmark.FilteredByLevel_InsideScope(HasISupportLoggingScopeLogger: False, CaptureScopes: True)
- detected: https://github.com/DrewScoggins/performance-2/issues/3245 but not reported in runtime repo
- opened: https://github.com/dotnet/runtime/issues/59267

None of the regressions reported above is critical, but in my opinion, we should have a good understanding of https://github.com/dotnet/runtime/issues/59145 before we ship .NET 6.

Noise, flaky or multimodal

The following benchmarks showed up in the report generated by the tool, but were not actual regressions:

System.Collections.CopyTo<Int32>*
System.Net.Primitives.Tests.IPAddressPerformanceTests.Ctor_Span(address: [16, 32, 48, 64, 80, ...])
System.Buffers.Tests.ReadOnlySequenceTests<Byte>.SliceTenSegments
noisy for a few configs, very stable and even improving for others: System.Numerics.Tests.Perf_Matrix4x4.CreateReflectionBenchmark
System.Memory.Span<Int32>.EndsWith(Size: 512)
most likely heavily dependent on memory alignment: System.Memory.Span<Int32>.BinarySearch(Size: 512) (https://github.com/dotnet/runtime/issues/56402)
PerfLabTests.CastingPerf.CheckArrayIsVariantGenericInterfaceNo
System.Memory.ReadOnlySpan.IndexOfString(input: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAXAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", value: "x", comparisonType: InvariantCultureIgnoreCase)
System.Net.Security.Tests.SslStreamTests.ConcurrentReadWrite
PerfLabTests.DelegatePerf.MulticastDelegateInvoke(length: 1000)
System.Numerics.Tests.Perf_BitOperations.PopCount_uint, System.Numerics.Tests.Perf_BitOperations.LeadingZeroCount_uint: memory alignment

Big thanks to everyone involved!
