Open performanceautofiler[bot] opened 1 year ago
Windows x64 Intel and AMD both are stable-regressed. Transfering.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Author: | performanceautofiler[bot] |
---|---|
Assignees: | - |
Labels: | `os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `PGO`, `needs-area-label` |
Milestone: | - |
@AndyAyersMS Can you investigate?
Looks like a fairly broad regression (Aug-28)
Still seems regressed overall... lots of fluctuations but higher than it was last year
Commit range is https://github.com/dotnet/runtime/compare/f5881e59f1e625b2f27d1553dc572c5e9b66dbe4...a7d4b4b4761b68c85e27d9c4c8dacb86c285ddc6
Seems like a static PGO update lead to this one.
At least locally this improved and then re-regressed
Method | Runtime | Options | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LastIndexOf_Word_NotFound | .NET 7.0 | (en-US, OrdinalIgnoreCase, False) | 624.0 ns | 20.22 ns | 23.28 ns | 616.5 ns | 599.5 ns | 691.3 ns | 1.00 | 0.05 | - | NA |
LastIndexOf_Word_NotFound | .NET 8.0 | (en-US, OrdinalIgnoreCase, False) | 781.5 ns | 33.94 ns | 39.08 ns | 774.1 ns | 732.8 ns | 894.0 ns | 1.25 | 0.08 | - | NA |
LastIndexOf_Word_NotFound | .NET 9.0p5 | (en-US, OrdinalIgnoreCase, False) | 616.5 ns | 37.83 ns | 43.56 ns | 596.0 ns | 581.7 ns | 720.6 ns | 0.99 | 0.08 | - | NA |
LastIndexOf_Word_NotFound | .NET 9.0p6 | (en-US, OrdinalIgnoreCase, False) | 725.0 ns | 5.01 ns | 4.69 ns | 725.0 ns | 717.9 ns | 735.7 ns | 1.23 | 0.02 | - | NA |
7.0
Benchmark: found 20 intervals; mean interval 220.592ms
00.02% 1E+04 ? Unknown
95.48% 4.25E+07 Tier-1 [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
01.03% 4.6E+05 Tier-1 [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)
00.85% 3.8E+05 Tier-1 [System.Private.CoreLib]Ordinal.LastIndexOfOrdinalIgnoreCase(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.65% 2.9E+05 Tier-1 [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.61% 2.7E+05 Tier-1 [5c35a557-7919-4c35-a009-c8e8c4d39c9c]Runnable_0.WorkloadActionUnroll(int64)
9.0 preview 6
Benchmark: found 20 intervals; mean interval 296.440ms
96.75% 5.811E+07 Tier-1 [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.80% 4.8E+05 Tier-1 [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.73% 4.4E+05 native clrjit.dll
00.70% 4.2E+05 native coreclr.dll
00.35% 2.1E+05 Tier-1 [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)
Need to do more digging but certainly looks like a layout issue. Raw sample attribution
NET 7
Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8ACAFF460 -- 0x00007FF8ACAFF783 (length 0x0323)
0x0000 : 1
0x0008 : 3
0x000C : 1
0x0017 : 2
0x0025 : 3
0x002F : 1
0x0037 : 2
0x003C : 1
0x0043 : 1
0x004C : 417
0x005B : 11
0x005F : 387
0x0062 : 1
0x0068 : 2
0x0079 : 8
0x007D : 432
0x0086 : 4
0x0089 : 1
0x008C : 5
0x008E : 395
0x0098 : 7
0x009B : 12
0x00A5 : 761
0x00A8 : 42
0x00AE : 1
0x0107 : 1
0x010B : 367
0x010E : 47
0x0111 : 3
0x0113 : 2
0x011D : 390
0x0120 : 50
0x012A : 352
0x0133 : 4
0x0188 : 43
0x0191 : 23
0x0195 : 17
0x02BC : 3
0x02C5 : 407
0x02D7 : 7
0x02DC : 15
0x02E1 : 15
0x02E6 : 2
0x02E8 : 1
NET 9
Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8CB941640 -- 0x00007FF8CB941BA1 (length 0x0561)
0x0009 : 9
0x000A : 1
0x000C : 3
0x0016 : 3
0x001E : 1
0x0022 : 1
0x004A : 1
0x005E : 1
0x0061 : 1
0x006C : 1
0x008B : 399
0x0097 : 5
0x009C : 8
0x00A2 : 409
0x00AE : 3
0x00B3 : 1
0x00BB : 2
0x00BE : 368
0x00C1 : 1
0x00C9 : 4
0x01DC : 2
0x01E1 : 14
0x01E5 : 22
0x01E8 : 390
0x01EE : 10
0x0305 : 3
0x030E : 418
0x0329 : 10
0x0331 : 14
0x0333 : 1
0x0335 : 2
0x033B : 5
0x034C : 419
0x0359 : 12
0x035C : 1
0x0367 : 387
0x036F : 36
0x0372 : 6
0x0375 : 446
0x0380 : 2
0x038A : 397
0x038D : 1
0x0391 : 31
0x039E : 385
0x03A1 : 29
0x03A7 : 1529
0x03AC : 7
0x03B0 : 10
Note in particular the heavy sample count at the end of the .NET 9 code -- seems like a hot loop has gotten widely spread out.
This is an R2R'd method so went through Tier-1 instr and some some of its callees either will lack PGO or will have static PGO.
If we disable R2R we see 9.0 do pretty well..
OutlierMode=DontRemove EnvironmentVariables=DOTNET_ReadyToRun=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1
Method | Job | Toolchain | Options | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LastIndexOf_Word_NotFound | Job-GPYIFC | .NET 7 | (en-US, OrdinalIgnoreCase, False) | 539.9 ns | 9.25 ns | 8.65 ns | 539.2 ns | 527.4 ns | 555.1 ns | 1.00 | 0.02 | - | NA |
LastIndexOf_Word_NotFound | Job-VPXAWQ | .NET 9 (main) | (en-US, OrdinalIgnoreCase, False) | 481.5 ns | 11.01 ns | 12.68 ns | 477.1 ns | 467.9 ns | 527.8 ns | 0.89 | 0.03 | - | NA |
So part of the issue here is lost profile data from Tier-1 instr (in particular we lose dynamic pgo for inlinees). The "fix" for this is to instrument the inlinees as well.
Seems like the key missing info is from System.Globalization.OrdinalCasing:ToUpper(ushort):ushort
and when we synthesize we add weight to paths that aren't reached in the benchmark:
I don't think there is an easy fix here. Because of our strategy for instrumenting R2R'd code we will miss some important details about behavior of inlinees.
Run Information
Regressions in System.Globalization.Tests.StringSearch
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md