dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.17k stars 4.72k forks source link

[Perf] Windows/x64: 1 Regressions in System.Collections.Sort<IntStruct> #89106

Closed performanceautofiler[bot] closed 1 year ago

performanceautofiler[bot] commented 1 year ago

Run Information

Name Value
Architecture x64
OS Windows 10.0.19042
Queue OwlWindows
Baseline d59af2cf097acb100ad6a9cba652be34be6f4a2e
Compare d9f6b98de38081b71fa9b5a463f302288546435d
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Sort<IntStruct>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
[Array - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.19042/amd/System.Collections.Sort(IntStruct).Array(Size%3a%20512).html>)
📝 - Benchmark Source
5.03 μs 6.10 μs 1.21 0.39 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Sort&lt;IntStruct&gt;*'
### Payloads [Baseline]() [Compare]() ### System.Collections.Sort<IntStruct>.Array(Size: 512) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 6.098761428571428 > 5.2929065. IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 7/12/2023 4:12:15 PM, 7/17/2023 10:23:26 PM falls between 7/8/2023 10:33:35 PM and 7/17/2023 10:23:26 PM. IsRegressionStdDev: Marked as regression because -4.540498423685632 (T) = (0 -5812.810964285714) / Math.Sqrt((22347.782534576407 / (21)) + (172378.95924323285 / (8))) is less than -2.051830516474954 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (21) + (8) - 2, .025) and -0.1330905133483964 = (5130.049979068551 - 5812.810964285714) / 5130.049979068551 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
EgorBo commented 1 year ago

Seems to be https://github.com/dotnet/runtime/pull/88749 cc @AndyAyersMS

EgorBo commented 1 year ago

Same on Intel: https://github.com/dotnet/perf-autofiling-issues/issues/19871

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Name | Value -- | -- Architecture | x64 OS | Windows 10.0.19042 Queue | OwlWindows Baseline | [d59af2cf097acb100ad6a9cba652be34be6f4a2e](https://github.com/dotnet/runtime/commit/d59af2cf097acb100ad6a9cba652be34be6f4a2e) Compare | [d9f6b98de38081b71fa9b5a463f302288546435d](https://github.com/dotnet/runtime/commit/d9f6b98de38081b71fa9b5a463f302288546435d) Diff | [Diff](https://github.com/dotnet/runtime/compare/d59af2cf097acb100ad6a9cba652be34be6f4a2e...d9f6b98de38081b71fa9b5a463f302288546435d) Configs | CompilationMode:tiered, RunKind:micro ### Regressions in System.Collections.Sort<IntStruct> Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio -- | -- | -- | -- | -- | -- | -- | -- | -- [Array - Duration of single invocation]()
📝 - [Benchmark Source]() | 5.03 μs | 6.10 μs | 1.21 | 0.39 | False | | | ![graph]() [Test Report]() ### Repro General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Sort<IntStruct>*' ```
### Payloads [Baseline]() [Compare]() ### System.Collections.Sort<IntStruct>.Array(Size: 512) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 6.098761428571428 > 5.2929065. IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 7/12/2023 4:12:15 PM, 7/17/2023 10:23:26 PM falls between 7/8/2023 10:33:35 PM and 7/17/2023 10:23:26 PM. IsRegressionStdDev: Marked as regression because -4.540498423685632 (T) = (0 -5812.810964285714) / Math.Sqrt((22347.782534576407 / (21)) + (172378.95924323285 / (8))) is less than -2.051830516474954 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (21) + (8) - 2, .025) and -0.1330905133483964 = (5130.049979068551 - 5812.810964285714) / 5130.049979068551 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: AndyAyersMS
Labels: `os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `PGO`, `needs-area-label`
Milestone: -
AndyAyersMS commented 1 year ago

IntroSort is recursive, and we now inline it into itself. Not yet clear why this hurts performance.

PGO data thinks the recursion is pretty frequent:

   call site count 75664 callee entry count 78504 scale 0.9638235

That kind of likelihood would normally suggest fairly deep recursion, like 1/(1-0.96...) $\approx$ 30 levels deep, so one level of inlining should cut that depth in half and actually seems plausible. But evidently it has other bad effects. Let me see if I can dig them out.

AndyAyersMS commented 1 year ago

Seems like we should just block inlining of IntroSort, there isn't any simplification that can happen with inlining, and despite the apparently high recursion likelihood, the call overhead here is not a dominating factor.

AndyAyersMS commented 1 year ago

If we block inlining of IntroSort

Method Job Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Allocated Alloc Ratio
Array Job-GINFRD Pre-89106 512 4.969 us 0.5558 us 0.6177 us 4.640 us 4.253 us 6.081 us 1.00 0.00 - NA
Array Job-LXATAE Post-89106 512 5.224 us 0.6870 us 0.7911 us 4.966 us 4.235 us 6.730 us 1.07 0.24 - NA
Array Job-LWDPQF Noinline 512 4.667 us 0.6068 us 0.6745 us 4.440 us 3.856 us 6.299 us 0.96 0.21 - NA

Note times are highly variable from run to run (per lab data above you can see the inherent ~20% noise level) but based on a number of runs this seems to be a decent option.