dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.97k stars 4.66k forks source link

Regressions in System.Globalization.Tests.StringSearch #91938

Open performanceautofiler[bot] opened 1 year ago

performanceautofiler[bot] commented 1 year ago

Run Information

Name Value
Architecture x64
OS Windows 10.0.19042
Queue OwlWindows
Baseline 538cd03b39afb57eb17ea77a86b3c94c039b8822
Compare c7d36b48a493789c9267069645e2690e804a2067
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Globalization.Tests.StringSearch

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
397.12 ns 516.40 ns 1.30 0.03 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'
### Payloads [Baseline]() [Compare]() ### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False)) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 516.3989242627822 > 416.48485541506614. IsChangePoint: Marked as a change because one of 6/22/2023 7:55:14 AM, 8/27/2023 2:42:29 PM, 9/4/2023 9:28:57 AM falls between 8/26/2023 7:53:22 PM and 9/4/2023 9:28:57 AM. IsRegressionStdDev: Marked as regression because -523.7204883102902 (T) = (0 -516.5964530091254) / Math.Sqrt((0.39932907677050516 / (9)) + (0.19792037708236598 / (24))) is less than -2.0395134463927462 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (9) + (24) - 2, .025) and -0.30301021705518705 = (396.4638544252073 - 516.5964530091254) / 396.4638544252073 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
cincuranet commented 1 year ago

Windows x64 Intel and AMD both are stable-regressed. Transfering.

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Name | Value -- | -- Architecture | x64 OS | Windows 10.0.19042 Queue | OwlWindows Baseline | [538cd03b39afb57eb17ea77a86b3c94c039b8822](https://github.com/dotnet/runtime/commit/538cd03b39afb57eb17ea77a86b3c94c039b8822) Compare | [c7d36b48a493789c9267069645e2690e804a2067](https://github.com/dotnet/runtime/commit/c7d36b48a493789c9267069645e2690e804a2067) Diff | [Diff](https://github.com/dotnet/runtime/compare/538cd03b39afb57eb17ea77a86b3c94c039b8822...c7d36b48a493789c9267069645e2690e804a2067) Configs | CompilationMode:tiered, RunKind:micro ### Regressions in System.Globalization.Tests.StringSearch Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio -- | -- | -- | -- | -- | -- | -- | -- | -- |
  • [LastIndexOf_Word_NotFound - Duration of single invocation]()
  • 📝 - [Benchmark Source]()
  • [📈 - ADX Test Multi Config Graph]()
| 397.12 ns | 516.40 ns | 1.30 | 0.03 | False | | | ![graph]() [Test Report]() ### Repro General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*' ```
### Payloads [Baseline]() [Compare]() ### System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False)) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 516.3989242627822 > 416.48485541506614. IsChangePoint: Marked as a change because one of 6/22/2023 7:55:14 AM, 8/27/2023 2:42:29 PM, 9/4/2023 9:28:57 AM falls between 8/26/2023 7:53:22 PM and 9/4/2023 9:28:57 AM. IsRegressionStdDev: Marked as regression because -523.7204883102902 (T) = (0 -516.5964530091254) / Math.Sqrt((0.39932907677050516 / (9)) + (0.19792037708236598 / (24))) is less than -2.0395134463927462 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (9) + (24) - 2, .025) and -0.30301021705518705 = (396.4638544252073 - 516.5964530091254) / 396.4638544252073 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: -
Labels: `os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `PGO`, `needs-area-label`
Milestone: -
BruceForstall commented 1 year ago

@AndyAyersMS Can you investigate?

AndyAyersMS commented 1 year ago

Looks like a fairly broad regression (Aug-28)

image
AndyAyersMS commented 2 months ago

Still seems regressed overall... lots of fluctuations but higher than it was last year

image

AndyAyersMS commented 2 months ago

Commit range is https://github.com/dotnet/runtime/compare/f5881e59f1e625b2f27d1553dc572c5e9b66dbe4...a7d4b4b4761b68c85e27d9c4c8dacb86c285ddc6

Seems like a static PGO update lead to this one.

AndyAyersMS commented 2 months ago

At least locally this improved and then re-regressed

Method Runtime Options Mean Error StdDev Median Min Max Ratio RatioSD Allocated Alloc Ratio
LastIndexOf_Word_NotFound .NET 7.0 (en-US, OrdinalIgnoreCase, False) 624.0 ns 20.22 ns 23.28 ns 616.5 ns 599.5 ns 691.3 ns 1.00 0.05 - NA
LastIndexOf_Word_NotFound .NET 8.0 (en-US, OrdinalIgnoreCase, False) 781.5 ns 33.94 ns 39.08 ns 774.1 ns 732.8 ns 894.0 ns 1.25 0.08 - NA
LastIndexOf_Word_NotFound .NET 9.0p5 (en-US, OrdinalIgnoreCase, False) 616.5 ns 37.83 ns 43.56 ns 596.0 ns 581.7 ns 720.6 ns 0.99 0.08 - NA
LastIndexOf_Word_NotFound .NET 9.0p6 (en-US, OrdinalIgnoreCase, False) 725.0 ns 5.01 ns 4.69 ns 725.0 ns 717.9 ns 735.7 ns 1.23 0.02 - NA
7.0
Benchmark: found 20 intervals; mean interval 220.592ms

00.02%   1E+04       ?        Unknown
95.48%   4.25E+07    Tier-1   [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
01.03%   4.6E+05     Tier-1   [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)
00.85%   3.8E+05     Tier-1   [System.Private.CoreLib]Ordinal.LastIndexOfOrdinalIgnoreCase(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.65%   2.9E+05     Tier-1   [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.61%   2.7E+05     Tier-1   [5c35a557-7919-4c35-a009-c8e8c4d39c9c]Runnable_0.WorkloadActionUnroll(int64)

9.0 preview 6
Benchmark: found 20 intervals; mean interval 296.440ms

96.75%   5.811E+07   Tier-1   [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.80%   4.8E+05     Tier-1   [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.73%   4.4E+05     native   clrjit.dll
00.70%   4.2E+05     native   coreclr.dll
00.35%   2.1E+05     Tier-1   [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)
AndyAyersMS commented 2 months ago

Need to do more digging but certainly looks like a layout issue. Raw sample attribution

NET 7
Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8ACAFF460 -- 0x00007FF8ACAFF783 (length 0x0323)
0x0000 : 1
0x0008 : 3
0x000C : 1
0x0017 : 2
0x0025 : 3
0x002F : 1
0x0037 : 2
0x003C : 1
0x0043 : 1
0x004C : 417
0x005B : 11
0x005F : 387
0x0062 : 1
0x0068 : 2
0x0079 : 8
0x007D : 432
0x0086 : 4
0x0089 : 1
0x008C : 5
0x008E : 395
0x0098 : 7
0x009B : 12
0x00A5 : 761
0x00A8 : 42
0x00AE : 1
0x0107 : 1
0x010B : 367
0x010E : 47
0x0111 : 3
0x0113 : 2
0x011D : 390
0x0120 : 50
0x012A : 352
0x0133 : 4
0x0188 : 43
0x0191 : 23
0x0195 : 17
0x02BC : 3
0x02C5 : 407
0x02D7 : 7
0x02DC : 15
0x02E1 : 15
0x02E6 : 2
0x02E8 : 1

NET 9

Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8CB941640 -- 0x00007FF8CB941BA1 (length 0x0561)
0x0009 : 9
0x000A : 1
0x000C : 3
0x0016 : 3
0x001E : 1
0x0022 : 1
0x004A : 1
0x005E : 1
0x0061 : 1
0x006C : 1
0x008B : 399
0x0097 : 5
0x009C : 8
0x00A2 : 409
0x00AE : 3
0x00B3 : 1
0x00BB : 2
0x00BE : 368
0x00C1 : 1
0x00C9 : 4
0x01DC : 2
0x01E1 : 14
0x01E5 : 22
0x01E8 : 390
0x01EE : 10
0x0305 : 3
0x030E : 418
0x0329 : 10
0x0331 : 14
0x0333 : 1
0x0335 : 2
0x033B : 5
0x034C : 419
0x0359 : 12
0x035C : 1
0x0367 : 387
0x036F : 36
0x0372 : 6
0x0375 : 446
0x0380 : 2
0x038A : 397
0x038D : 1
0x0391 : 31
0x039E : 385
0x03A1 : 29
0x03A7 : 1529
0x03AC : 7
0x03B0 : 10

Note in particular the heavy sample count at the end of the .NET 9 code -- seems like a hot loop has gotten widely spread out.

This is an R2R'd method so went through Tier-1 instr and some some of its callees either will lack PGO or will have static PGO.

If we disable R2R we see 9.0 do pretty well..

OutlierMode=DontRemove EnvironmentVariables=DOTNET_ReadyToRun=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method Job Toolchain Options Mean Error StdDev Median Min Max Ratio RatioSD Allocated Alloc Ratio
LastIndexOf_Word_NotFound Job-GPYIFC .NET 7 (en-US, OrdinalIgnoreCase, False) 539.9 ns 9.25 ns 8.65 ns 539.2 ns 527.4 ns 555.1 ns 1.00 0.02 - NA
LastIndexOf_Word_NotFound Job-VPXAWQ .NET 9 (main) (en-US, OrdinalIgnoreCase, False) 481.5 ns 11.01 ns 12.68 ns 477.1 ns 467.9 ns 527.8 ns 0.89 0.03 - NA

So part of the issue here is lost profile data from Tier-1 instr (in particular we lose dynamic pgo for inlinees). The "fix" for this is to instrument the inlinees as well.

AndyAyersMS commented 2 months ago

Seems like the key missing info is from System.Globalization.OrdinalCasing:ToUpper(ushort):ushort and when we synthesize we add weight to paths that aren't reached in the benchmark: image

AndyAyersMS commented 1 month ago

I don't think there is an easy fix here. Because of our strategy for instrumenting R2R'd code we will miss some important details about behavior of inlinees.

44372 tracks gathering profile data for inlinees when we are profiling and optimizing.