Closed AndyAyersMS closed 1 year ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak See info in area-owners.md if you want to be subscribed.
Author: | AndyAyersMS |
---|---|
Assignees: | AndyAyersMS |
Labels: | `area-CodeGen-coreclr` |
Milestone: | - |
@AndyAyersMS I think that CastHelpers
should not be instrumented since they were already carefully optimized for a general case by @VSadov e.g. https://github.com/dotnet/runtime/blob/e6226e675a63b5d9e41f43492a13556fd4053b25/src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs#L277-L322
PS: we can probably propose a new API [MethodImpl(MethodImplOptions.NoInstrumenting)]
Yes we are certain that these methods will be hot and want the best possible codegen right away.
The cast helpers are also initialized in a special way - https://github.com/dotnet/runtime/blob/4cbe6f99d23e04c56a89251d49de1b0f14000427/src/coreclr/vm/ecall.cpp#L99
I do not remember all the details, but there were reasons for that.
Maybe it is possible to disable instrumentation for things initialized that way.
(I do not have preferences, just want mention that there could be another way to detect and special case these)
Yes we are certain that these methods will be hot and want the best possible codegen right away.
The cast helpers are also initialized in a special way -
I do not remember all the details, but there were reasons for that. Maybe it is possible to disable instrumentation for things initialized that way. (I do not have preferences, just want mention that there could be another way to detect and special case these)
The ones you listed have AggressiveOptimization attribute, am I right? In that case we won't instrument them, but there are other (less important) members in CastHelpers which don't have AggressiveOptimization so are eligble for instrumentation
All the cast helpers are initialized in that method.
If you have other uses and reasons to introduce MethodImplOptions.NoInstrumenting
, then this part is unimportant.
It would only be interesting if you somehow do not want MethodImplOptions.NoInstrumenting
just for a few special methods and need another way to identify them.
The issue caused by https://github.com/dotnet/runtime/pull/80481 is with benchmark methods like PerfLabTests.CastingPerf:CheckArrayIsInterfaceNo
that trigger OSR by looping a lot internally.
For such methods we never see any Tier0 + instr method return, so with sparse profiles enabled Tier1 profile reconstruction thinks the entry block is run rarely, and this messes up optimizations. eg in this case we fail to hoist/cse the cast out of the loop.
Trees after Profile incorporation
-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC lp [IL range] [jump] [EH region] [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 0 0 [000..006)-> BB03 (always) rare IBC
BB02 [0001] 1 BB03 5630k 5630382 [006..018) bwd bwd-target IBC
BB03 [0002] 2 BB01,BB02 5630k 5630382 [018..020)-> BB02 ( cond ) bwd bwd-src IBC
BB04 [0003] 1 BB03 0 0 [020..022) (return) rare IBC
-----------------------------------------------------------------------------------------------------------------------------------------
The fix I'm proposing is to run synthesis repair in such cases. It addresses the above case, looking now to see what else it does.
Also noticed #84319 while reviewing the dumps for CheckArrayIsInterfaceNo
.
After #84312
Updated table of worst-case results (just 1 day of data, so possibly noisy, eg PerfLabTests.CastingPerf.CheckArrayIsInterfaceNo
is bimodal but hasn't shown fast results for PGO yet).
[now showing the "fast time"]
Investigated cases in bold, notes linked -- skipping tests in italics that look similar to ones already analyzed.
TestName | Pgo/NoPgo | Note |
---|---|---|
System.Text.Json.Tests.Perf_Get.GetByte [notes] | 0.378284924 | budget #85531 |
System.Text.Json.Tests.Perf_Get.GetUInt16 | 0.39703163 | dup |
System.Text.Json.Tests.Perf_Get.GetInt16 | 0.415271098 | dup |
System.Text.Json.Tests.Perf_Get.GetSByte | 0.432559076 | dup |
System.Text.Json.Tests.Perf_Get.GetUInt32 | 0.485751995 | dup |
System.Text.Json.Tests.Perf_Get.GetInt32 | 0.496368517 | dup |
PerfLabTests.CastingPerf.CheckArrayIsInterfaceYes | 0.500523372 | dup |
PerfLabTests.CastingPerf.CheckArrayIsInterfaceNo [notes] improvements | 0.500571243 | fixed https://github.com/dotnet/runtime/pull/84312 |
System.Collections.IterateFor<String>.ImmutableArray(Size: 512) [notes] | 0.521820448 | suspect noise |
Exceptions.Handling.MultipleNestedTryCatch_FirstCatches(kind: Hardware) [notes] | 0.587834042 | noise |
Exceptions.Handling.ThrowAndCatchWhenFinally(kind: Software) | 0.629162936 | dup |
System.Collections.Sort<Int32>.Array(Size: 512) [notes] | 0.691174178 | layout |
Exceptions.Handling.ThrowAndCatch(kind: ReflectionSoftware) | 0.712891743 | dup |
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToWriter(Mode: SourceGen) [notes] | 0.714508317 | budget #85531 |
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToUtf8Bytes(Mode: SourceGen) | 0.716100799 | dup |
System.Collections.ContainsKeyTrue<Int32, Int32>.ConcurrentDictionary(Size: 512) [notes] | 0.716735601 | deferred |
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToString(Mode: SourceGen) | 0.72480144 | dup |
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeToStream(Mode: SourceGen) | 0.728309251 | dup |
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, String>>.SerializeObjectProperty(Mode: SourceGen) | 0.732178172 | dup |
Exceptions.Handling.ThrowAndCatch_ManyCatchBlocks(kind: ReflectionSoftware) | 0.734050863 | dup |
Benchstone.BenchI.MulMatrix.Test [notes] | 0.746493091 | repair https://github.com/dotnet/runtime/pull/84741 |
LinqBenchmarks.Count00ForX [notes] | 0.75390536 | #84517 |
Benchstone.MDBenchI.MDLogicArray.Test [notes] | 0.755000226 | repair |
Span.Sorting.QuickSortSpan(Size: 512) [notes] | 0.755771373 | layout |
System.IO.Tests.Perf_Path.GetDirectoryName [notes] | 0.762018237 | unclear |
System.Collections.CtorFromCollection<Int32>.SortedSet(Size: 512) [notes] | 0.784138758 | noise |
MicroBenchmarks.Serializers.XmlToStream<MyEventsListerViewModel>.XmlSerializer [notes] | 0.785627854 | noise |
System.Text.Json.Tests.Perf_Segment.ReadMultiSegmentSequence(segmentSize: 4096, TestCase: Json4KB) [notes] | 0.788119346 | budget #85531 |
System.Text.Json.Tests.Perf_Get.GetUInt64 | 0.791121821 | dup |
System.Memory.Span<Byte>.SequenceEqual(Size: 33) [notes] | 0.791880016 | intrinsic https://github.com/dotnet/runtime/pull/85130 |
System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, IgnoreNonSpace)) [notes] | 0.798755609 | noise |
Benchstone.MDBenchF.MDInvMt.Test [notes] | 0.80080933 | Fixed https://github.com/dotnet/runtime/pull/84741 |
PerfLabTests.CastingPerf.ObjObjIsFoo [notes] | 0.806031322 | noise |
System.Memory.Span<Byte>.StartsWith(Size: 512) [notes] | 0.806253739 | noise |
Exceptions.Handling.TryAndFinallyDeep(kind: Software) | 0.81026551 | dup |
System.Collections.ContainsFalse<String>.ImmutableArray(Size: 512) [notes] | 0.812478704 | noise |
System.Memory.Span<Byte>.SequenceEqual(Size: 512) | 0.821172947 | dup |
Benchstone.BenchF.InvMt.Test [notes] | 0.822482883 | fixed https://github.com/dotnet/runtime/pull/84741 |
ByteMark.BenchLUDecomp [notes] | 0.822880993 | blend #85171 |
PerfLabTests.CastingPerf2.CastingPerf.ScalarValueTypeObj [notes] | 0.82354731 | fixed |
System.Memory.Span<Byte>.SequenceCompareToDifferent(Size: 4) [notes] | 0.834542277 | LSRA block order |
PerfLabTests.LowLevelPerf.GenericGenericMethod [notes] | 0.836391889 | noise |
System.Text.Json.Document.Tests.Perf_DocumentParse.Parse(IsDataIndented: False, TestRandomAccess: True, TestCase: Json400KB) [notes] | 0.836471477 | unclear |
System.Perf_Convert.FromBase64String [notes] | 0.839094252 | maintenance #85265 |
System.Text.RegularExpressions.Tests.Perf_Regex_Cache.IsMatch(total: 400000, unique: 1, cacheSize: 15) [notes] | 0.841496573 | noise |
Benchstone.BenchI.CSieve.Test [link] | 0.842229476 | Fixed https://github.com/dotnet/runtime/pull/84741 |
System.Collections.IndexerSet<Int32>.Dictionary(Size: 512) [notes] | 0.842651326 | noise ? |
System.Text.Json.Serialization.Tests.WriteJson<ImmutableSortedDictionary<String, String>>.SerializeToWriter(Mode: SourceGen) | 0.84525902 | dup |
System.Tests.Perf_Char.Char_ToLowerInvariant(input: "Hello World!") [notes] | 0.846315396 | noise |
System.Text.Json.Tests.Perf_Get.GetInt64 | 0.847824295 | dup |
Look into cases where microbenchmarks are slower with PGO enabled.
https://github.com/dotnet/runtime/issues/74873#issuecomment-1492803245 has an initial list to check.
Issues uncovered
85531
85548
Fixes