performanceautofiler[bot] commented 1 year ago

Run Information

Name	Value
Architecture	x64
OS	Windows 10.0.19042
Queue	OwlWindows
Baseline	538cd03b39afb57eb17ea77a86b3c94c039b8822
Compare	c7d36b48a493789c9267069645e2690e804a2067
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Globalization.Tests.StringSearch

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[LastIndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.19042/amd/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20OrdinalIgnoreCase%2c%20False)).html>) 📝 - Benchmark Source 📈 - ADX Test Multi Config Graph	397.12 ns	516.40 ns	1.30	0.03	False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'

### Payloads [Baseline]() [Compare]() ### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False)) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 516.3989242627822 > 416.48485541506614. IsChangePoint: Marked as a change because one of 6/22/2023 7:55:14 AM, 8/27/2023 2:42:29 PM, 9/4/2023 9:28:57 AM falls between 8/26/2023 7:53:22 PM and 9/4/2023 9:28:57 AM. IsRegressionStdDev: Marked as regression because -523.7204883102902 (T) = (0 -516.5964530091254) / Math.Sqrt((0.39932907677050516 / (9)) + (0.19792037708236598 / (24))) is less than -2.0395134463927462 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (9) + (24) - 2, .025) and -0.30301021705518705 = (396.4638544252073 - 516.5964530091254) / 396.4638544252073 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

cincuranet commented 1 year ago

Windows x64 Intel and AMD both are stable-regressed. Transfering.

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

[LastIndexOf_Word_NotFound - Duration of single invocation]()
📝 - [Benchmark Source]()
[📈 - ADX Test Multi Config Graph]()

| 397.12 ns | 516.40 ns | 1.30 | 0.03 | False | | | ![graph]() [Test Report]() ### Repro General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*' ```

### Payloads [Baseline]() [Compare]() ### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False)) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 516.3989242627822 > 416.48485541506614. IsChangePoint: Marked as a change because one of 6/22/2023 7:55:14 AM, 8/27/2023 2:42:29 PM, 9/4/2023 9:28:57 AM falls between 8/26/2023 7:53:22 PM and 9/4/2023 9:28:57 AM. IsRegressionStdDev: Marked as regression because -523.7204883102902 (T) = (0 -516.5964530091254) / Math.Sqrt((0.39932907677050516 / (9)) + (0.19792037708236598 / (24))) is less than -2.0395134463927462 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (9) + (24) - 2, .025) and -0.30301021705518705 = (396.4638544252073 - 516.5964530091254) / 396.4638544252073 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `PGO`, `needs-area-label`
Milestone:	-

BruceForstall commented 1 year ago

@AndyAyersMS Can you investigate?

AndyAyersMS commented 1 year ago

Looks like a fairly broad regression (Aug-28)

AndyAyersMS commented 2 months ago

Still seems regressed overall... lots of fluctuations but higher than it was last year

AndyAyersMS commented 2 months ago

Commit range is https://github.com/dotnet/runtime/compare/f5881e59f1e625b2f27d1553dc572c5e9b66dbe4...a7d4b4b4761b68c85e27d9c4c8dacb86c285ddc6

Seems like a static PGO update lead to this one.

AndyAyersMS commented 2 months ago

At least locally this improved and then re-regressed

Method	Runtime	Options	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Allocated	Alloc Ratio
LastIndexOf_Word_NotFound	.NET 7.0	(en-US, OrdinalIgnoreCase, False)	624.0 ns	20.22 ns	23.28 ns	616.5 ns	599.5 ns	691.3 ns	1.00	0.05	-	NA
LastIndexOf_Word_NotFound	.NET 8.0	(en-US, OrdinalIgnoreCase, False)	781.5 ns	33.94 ns	39.08 ns	774.1 ns	732.8 ns	894.0 ns	1.25	0.08	-	NA
LastIndexOf_Word_NotFound	.NET 9.0p5	(en-US, OrdinalIgnoreCase, False)	616.5 ns	37.83 ns	43.56 ns	596.0 ns	581.7 ns	720.6 ns	0.99	0.08	-	NA
LastIndexOf_Word_NotFound	.NET 9.0p6	(en-US, OrdinalIgnoreCase, False)	725.0 ns	5.01 ns	4.69 ns	725.0 ns	717.9 ns	735.7 ns	1.23	0.02	-	NA

7.0
Benchmark: found 20 intervals; mean interval 220.592ms

00.02%   1E+04       ?        Unknown
95.48%   4.25E+07    Tier-1   [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
01.03%   4.6E+05     Tier-1   [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)
00.85%   3.8E+05     Tier-1   [System.Private.CoreLib]Ordinal.LastIndexOfOrdinalIgnoreCase(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.65%   2.9E+05     Tier-1   [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.61%   2.7E+05     Tier-1   [5c35a557-7919-4c35-a009-c8e8c4d39c9c]Runnable_0.WorkloadActionUnroll(int64)

9.0 preview 6
Benchmark: found 20 intervals; mean interval 296.440ms

96.75%   5.811E+07   Tier-1   [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.80%   4.8E+05     Tier-1   [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.73%   4.4E+05     native   clrjit.dll
00.70%   4.2E+05     native   coreclr.dll
00.35%   2.1E+05     Tier-1   [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)

AndyAyersMS commented 2 months ago

Need to do more digging but certainly looks like a layout issue. Raw sample attribution

NET 7
Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8ACAFF460 -- 0x00007FF8ACAFF783 (length 0x0323)
0x0000 : 1
0x0008 : 3
0x000C : 1
0x0017 : 2
0x0025 : 3
0x002F : 1
0x0037 : 2
0x003C : 1
0x0043 : 1
0x004C : 417
0x005B : 11
0x005F : 387
0x0062 : 1
0x0068 : 2
0x0079 : 8
0x007D : 432
0x0086 : 4
0x0089 : 1
0x008C : 5
0x008E : 395
0x0098 : 7
0x009B : 12
0x00A5 : 761
0x00A8 : 42
0x00AE : 1
0x0107 : 1
0x010B : 367
0x010E : 47
0x0111 : 3
0x0113 : 2
0x011D : 390
0x0120 : 50
0x012A : 352
0x0133 : 4
0x0188 : 43
0x0191 : 23
0x0195 : 17
0x02BC : 3
0x02C5 : 407
0x02D7 : 7
0x02DC : 15
0x02E1 : 15
0x02E6 : 2
0x02E8 : 1

NET 9

Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8CB941640 -- 0x00007FF8CB941BA1 (length 0x0561)
0x0009 : 9
0x000A : 1
0x000C : 3
0x0016 : 3
0x001E : 1
0x0022 : 1
0x004A : 1
0x005E : 1
0x0061 : 1
0x006C : 1
0x008B : 399
0x0097 : 5
0x009C : 8
0x00A2 : 409
0x00AE : 3
0x00B3 : 1
0x00BB : 2
0x00BE : 368
0x00C1 : 1
0x00C9 : 4
0x01DC : 2
0x01E1 : 14
0x01E5 : 22
0x01E8 : 390
0x01EE : 10
0x0305 : 3
0x030E : 418
0x0329 : 10
0x0331 : 14
0x0333 : 1
0x0335 : 2
0x033B : 5
0x034C : 419
0x0359 : 12
0x035C : 1
0x0367 : 387
0x036F : 36
0x0372 : 6
0x0375 : 446
0x0380 : 2
0x038A : 397
0x038D : 1
0x0391 : 31
0x039E : 385
0x03A1 : 29
0x03A7 : 1529
0x03AC : 7
0x03B0 : 10

Note in particular the heavy sample count at the end of the .NET 9 code -- seems like a hot loop has gotten widely spread out.

This is an R2R'd method so went through Tier-1 instr and some some of its callees either will lack PGO or will have static PGO.

If we disable R2R we see 9.0 do pretty well..

OutlierMode=DontRemove EnvironmentVariables=DOTNET_ReadyToRun=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method	Job	Toolchain	Options	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Allocated	Alloc Ratio
LastIndexOf_Word_NotFound	Job-GPYIFC	.NET 7	(en-US, OrdinalIgnoreCase, False)	539.9 ns	9.25 ns	8.65 ns	539.2 ns	527.4 ns	555.1 ns	1.00	0.02	-	NA
LastIndexOf_Word_NotFound	Job-VPXAWQ	.NET 9 (main)	(en-US, OrdinalIgnoreCase, False)	481.5 ns	11.01 ns	12.68 ns	477.1 ns	467.9 ns	527.8 ns	0.89	0.03	-	NA

So part of the issue here is lost profile data from Tier-1 instr (in particular we lose dynamic pgo for inlinees). The "fix" for this is to instrument the inlinees as well.

AndyAyersMS commented 2 months ago

Seems like the key missing info is from System.Globalization.OrdinalCasing:ToUpper(ushort):ushort and when we synthesize we add weight to paths that aren't reached in the benchmark:

AndyAyersMS commented 1 month ago

I don't think there is an easy fix here. Because of our strategy for instrumenting R2R'd code we will miss some important details about behavior of inlinees.

44372 tracks gathering profile data for inlinees when we are profiling and optimizing.

dotnet / runtime

Regressions in System.Globalization.Tests.StringSearch #91938

Run Information

Regressions in System.Globalization.Tests.StringSearch

Repro

44372 tracks gathering profile data for inlinees when we are profiling and optimizing.