[Perf] Windows/x64: 1 Regressions in System.Collections.Sort<IntStruct>

performanceautofiler[bot] commented 1 year ago

Run Information

Name	Value
Architecture	x64
OS	Windows 10.0.19042
Queue	OwlWindows
Baseline	d59af2cf097acb100ad6a9cba652be34be6f4a2e
Compare	d9f6b98de38081b71fa9b5a463f302288546435d
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Sort<IntStruct>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[Array - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.19042/amd/System.Collections.Sort(IntStruct).Array(Size%3a%20512).html>) 📝 - Benchmark Source	5.03 μs	6.10 μs	1.21	0.39	False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Sort&lt;IntStruct&gt;*'

### Payloads [Baseline]() [Compare]() ### System.Collections.Sort<IntStruct>.Array(Size: 512) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 6.098761428571428 > 5.2929065. IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 7/12/2023 4:12:15 PM, 7/17/2023 10:23:26 PM falls between 7/8/2023 10:33:35 PM and 7/17/2023 10:23:26 PM. IsRegressionStdDev: Marked as regression because -4.540498423685632 (T) = (0 -5812.810964285714) / Math.Sqrt((22347.782534576407 / (21)) + (172378.95924323285 / (8))) is less than -2.051830516474954 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (21) + (8) - 2, .025) and -0.1330905133483964 = (5130.049979068551 - 5812.810964285714) / 5130.049979068551 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

EgorBo commented 1 year ago

Seems to be https://github.com/dotnet/runtime/pull/88749 cc @AndyAyersMS

EgorBo commented 1 year ago

Same on Intel: https://github.com/dotnet/perf-autofiling-issues/issues/19871

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

### Payloads [Baseline]() [Compare]() ### System.Collections.Sort<IntStruct>.Array(Size: 512) #### ETL Files #### Histogram #### Description of detection logic ``` IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 6.098761428571428 > 5.2929065. IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 7/12/2023 4:12:15 PM, 7/17/2023 10:23:26 PM falls between 7/8/2023 10:33:35 PM and 7/17/2023 10:23:26 PM. IsRegressionStdDev: Marked as regression because -4.540498423685632 (T) = (0 -5812.810964285714) / Math.Sqrt((22347.782534576407 / (21)) + (172378.95924323285 / (8))) is less than -2.051830516474954 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (21) + (8) - 2, .025) and -0.1330905133483964 = (5130.049979068551 - 5812.810964285714) / 5130.049979068551 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Author:	performanceautofiler[bot]
Assignees:	AndyAyersMS
Labels:	`os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `PGO`, `needs-area-label`
Milestone:	-

AndyAyersMS commented 1 year ago

IntroSort is recursive, and we now inline it into itself. Not yet clear why this hurts performance.

PGO data thinks the recursion is pretty frequent:

   call site count 75664 callee entry count 78504 scale 0.9638235

That kind of likelihood would normally suggest fairly deep recursion, like 1/(1-0.96...) $\approx$ 30 levels deep, so one level of inlining should cut that depth in half and actually seems plausible. But evidently it has other bad effects. Let me see if I can dig them out.

AndyAyersMS commented 1 year ago

Seems like we should just block inlining of IntroSort, there isn't any simplification that can happen with inlining, and despite the apparently high recursion likelihood, the call overhead here is not a dominating factor.

AndyAyersMS commented 1 year ago

If we block inlining of IntroSort

Method	Job	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Allocated	Alloc Ratio
Array	Job-GINFRD	Pre-89106	512	4.969 us	0.5558 us	0.6177 us	4.640 us	4.253 us	6.081 us	1.00	0.00	-	NA
Array	Job-LXATAE	Post-89106	512	5.224 us	0.6870 us	0.7911 us	4.966 us	4.235 us	6.730 us	1.07	0.24	-	NA
Array	Job-LWDPQF	Noinline	512	4.667 us	0.6068 us	0.6745 us	4.440 us	3.856 us	6.299 us	0.96	0.21	-	NA

Note times are highly variable from run to run (per lab data above you can see the inherent ~20% noise level) but based on a number of runs this seems to be a decent option.

dotnet / runtime