Closed performanceautofiler[bot] closed 1 year ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
https://github.com/dotnet/runtime/pull/76803 @stephentoub
There are other regressions/improvements in the linked issues.
> #76803 @stephentoub
That PR is adding new methods, not updating existing ones.
More likely is https://github.com/dotnet/runtime/pull/75754, which changes existing IndexOfAny methods. cc: @adamsitnik
> More likely is https://github.com/dotnet/runtime/pull/75754, which changes existing IndexOfAny methods.
Ah sorry, I missed that.
https://github.com/dotnet/perf-autofiling-issues/issues/9392 is an interesting one too:
It does appear to be #75754 (3e40074e89015f49e039b2fd2b06989c689ddb8e)
diff --git a/src/benchmarks/micro/libraries/System.Runtime/Perf.String.cs b/src/benchmarks/micro/libraries/System.Runtime/Perf.String.cs
index 4a4646e7..08419463 100644
--- a/src/benchmarks/micro/libraries/System.Runtime/Perf.String.cs
+++ b/src/benchmarks/micro/libraries/System.Runtime/Perf.String.cs
@@ -5,11 +5,14 @@
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
+using BenchmarkDotNet.Diagnostics.Windows.Configs;
using MicroBenchmarks;
namespace System.Tests
{
[BenchmarkCategory(Categories.Runtime, Categories.Libraries)]
+ [InliningDiagnoser(false, true)]
public class Perf_String
{
// the culture-specific methods are tested in Perf_StringCultureSpecific class
cd c:\src\runtime\
git worktree add ../runtime-3e40074 3e40074
git worktree add ../runtime-e4471c1 3e40074~
cd ..\runtime-3e40074\
.\build.cmd -c Release
cd ..\runtime-e4471c1\
.\build.cmd -c Release
cd ../
py -3.11 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_String.IndexOf*' --corerun C:\src\runtime-3e40074\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe C:\src\runtime-e4471c1\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe
BenchmarkDotNet=v0.13.2.1950-nightly, OS=Windows 11 (10.0.22621.674)
AMD Ryzen 7 5700U with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-alpha.1.22554.7
[Host] : .NET 8.0.0 (8.0.22.55109), X64 RyuJIT AVX2
Job-DOWZQX : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-YWAYCP : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | Code Size | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
IndexOfAny | Job-DOWZQX | \runtime-3e40074[..] | 8.467 ns | 0.1468 ns | 0.1373 ns | 8.489 ns | 8.198 ns | 8.657 ns | 1.00 | 380 B | - | NA |
IndexOfAny | Job-YWAYCP | \runtime-e4471c1[..] | 5.447 ns | 0.0424 ns | 0.0396 ns | 5.439 ns | 5.395 ns | 5.516 ns | 0.64 | 2,325 B | - | NA |
// * Diagnostic Output - InliningDiagnoser *
--------------------
--------------------
Perf_String.IndexOfAny: Job-DOWZQX(PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\runtime-3e40074\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe, IterationTime=250.0000 ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.String.IndexOfAny - instance int32 (wchar[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ThrowHelper.ThrowArgumentNullException - void (value class System.ExceptionArgument)
Fail Reason: does not return
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char]..ctor - instance void (!0&,int32)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].op_Implicit - value class System.ReadOnlySpan`1<!0> (!0[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char]..ctor - instance void (!0[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.Runtime.InteropServices.MemoryMarshal.GetArrayDataReference - generic !!0& (!!0[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.MemoryExtensions.IndexOfAny - generic int32 (value class System.ReadOnlySpan`1<!!0>,value class System.ReadOnlySpan`1<!!0>)
Fail Reason: inline exceeds budget
--------------------
--------------------
Perf_String.IndexOfAny: Job-YWAYCP(PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\runtime-e4471c1\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe, IterationTime=250.0000 ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.String.IndexOfAny - instance int32 (wchar[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ThrowHelper.ThrowArgumentNullException - void (value class System.ExceptionArgument)
Fail Reason: does not return
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char]..ctor - instance void (!0&,int32)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].op_Implicit - value class System.ReadOnlySpan`1<!0> (!0[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char]..ctor - instance void (!0[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.Runtime.InteropServices.MemoryMarshal.GetArrayDataReference - generic !!0& (!!0[])
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.MemoryExtensions.IndexOfAny - generic int32 (value class System.ReadOnlySpan`1<!!0>,value class System.ReadOnlySpan`1<!!0>)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.Runtime.CompilerServices.RuntimeHelpers.IsBitwiseEquatable - generic bool ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.Runtime.InteropServices.MemoryMarshal.GetReference - generic !!0& (value class System.ReadOnlySpan`1<!!0>)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.Runtime.InteropServices.MemoryMarshal.GetReference - generic !!0& (value class System.ReadOnlySpan`1<!!0>)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.SpanHelpers.IndexOfValueType - generic int32 (!!0&,!!0,int32)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.SpanHelpers.IndexOfAnyValueType - generic int32 (!!0&,!!0,!!0,int32)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.SpanHelpers.IndexOfAnyValueType - generic int32 (!!0&,!!0,!!0,!!0,int32)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.SpanHelpers.IndexOfAnyValueType - generic int32 (!!0&,!!0,!!0,!!0,!!0,int32)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.MemoryExtensions.IndexOfAnyProbabilistic - int32 (wchar&,int32,wchar&,int32)
Fail Reason: unprofitable inline
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.Runtime.InteropServices.MemoryMarshal.GetReference - generic !!0& (value class System.ReadOnlySpan`1<!!0>)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.Runtime.InteropServices.MemoryMarshal.GetReference - generic !!0& (value class System.ReadOnlySpan`1<!!0>)
--------------------
Inliner: System.Tests.Perf_String.IndexOfAny - instance int32 ()
Inlinee: System.ReadOnlySpan`1[System.Char].get_Length - instance int32 ()
--------------------
Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.
Author: | performanceautofiler[bot] |
---|---|
Assignees: | - |
Labels: | `area-System.Memory`, `tenet-performance-benchmarks` |
Milestone: | - |
The change at https://github.com/Rob-Hague/runtime/tree/indexofany-ros appears to solve the regression. It is not super pretty given the work on deduplication with generics. I did play with generics ala
but I was just adding complexity/indirection when ultimately it needs to become "un-generic" at some point (the complication here is the extra 4- and 5-value specialisations, which only exist as helpers in `SpanHelpers`, along with the different fallbacks per type)
Edit: the tricky bit is that there are no non-value-type helpers for 4 and 5 values "between" the 3-value helper:
and the `(ref, count, ref, count)` helper:
so when you have a pattern like this:
case 5:
// private helper in MemoryExtensions
return IndexOfAny(span, values[0], values[1], values[2], values[3], values[4]);
the fallback in the private helper after an `IsBitwiseEquatable` branch is not so obvious. So I was unsure how to proceed there. I don't doubt there is a nicer solution ("I've tried nothing and I'm all out of ideas!") but I can open a PR with these changes if they look OK.
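For readers following along, the dispatch shape under discussion can be roughly sketched with only the public `MemoryExtensions` overloads. This is an illustration under assumptions, not the code from the linked branch or from the runtime: the class name `IndexOfAnySketch` and the method name `IndexOfAnyDispatch` are hypothetical. It shows the gap described above: the public surface has per-value overloads for up to three values, so anything longer has to go through the span-of-values overload.

```csharp
using System;

// Hypothetical illustration only: not the runtime's implementation and not the
// code on the linked branch. It uses only public MemoryExtensions overloads.
static class IndexOfAnySketch
{
    public static int IndexOfAnyDispatch(ReadOnlySpan<char> span, ReadOnlySpan<char> values)
    {
        switch (values.Length)
        {
            case 0:
                // Nothing to search for.
                return -1;
            case 1:
                return span.IndexOf(values[0]);
            case 2:
                return span.IndexOfAny(values[0], values[1]);
            case 3:
                return span.IndexOfAny(values[0], values[1], values[2]);
            default:
                // Four or more values: there is no public per-value overload,
                // so fall back to the overload that takes a span of values.
                return span.IndexOfAny(values);
        }
    }
}
```

A call such as `IndexOfAnySketch.IndexOfAnyDispatch("hello world", "xyzwd")` takes the `default` branch, which is where the 4- and 5-value specialisations would otherwise sit.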
Wall of benchmarks below. I added "LastIndex.." variants to the benchmark project locally. The comparisons are:
Toolchain | Description |
---|---|
\runtime-e4471c1\ | e4471c1 as baseline (the parent of #75754 = 3e40074) |
\runtime-3e40074\ | #75754 = 3e40074 |
\runtime\ | main = d3ab95d3be895a1950a46c559397780dbb3e9807 (arbitrarily) |
\indexofany-ros\ | The indexofany-ros branch on my fork linked above. On the same main commit |
\no-inline-attr | The same indexofany-ros branch but with [AggressiveInlining] not added to the extra methods. Its favourability swings around a bit but it is probably beneficial to have the annotation on balance (purely from a time perspective) |
py -3.11 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_String.*IndexOf*' 'System.Memory.Span<*>.*IndexOfAnyF*' --corerun C:\src\runtime-e4471c1\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe C:\src\runtime-3e40074\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe C:\src\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe C:\src\indexofany-ros\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe C:\src\no-inline-attr\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe
We're currently sitting at 8ns and this is nearly a year old now. I don't think there is really anything actionable for us to do here at this point.
Regressions in System.Tests.Perf_String