dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.14k stars 4.71k forks source link

Regressions in System.Collections.ContainsTrue<String> #65852

Closed performanceautofiler[bot] closed 2 years ago

performanceautofiler[bot] commented 2 years ago

Run Information

Architecture arm64
OS ubuntu 18.04
Baseline 4fd5132c1fb8292e2d94ad4a1b2fe62357205402
Compare 94c0a7c13d158eb79d27223f474ec8f8db747f11
Diff Diff

Regressions in System.Collections.ContainsTrue<String>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
[Span - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 18.04/System.Collections.ContainsTrue(String).Span(Size%3a%20512).html>) 759.66 μs 1.11 ms 1.46 0.41 False

graph Test Report

Repro

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsTrue&lt;String&gt;*'
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Collections.ContainsTrue<String>.Span(Size: 512) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 1.1103403372222225 > 794.5004166666666. IsChangePoint: Marked as a change because one of 2/22/2022 2:43:53 PM, 2/24/2022 1:08:23 AM falls between 2/15/2022 8:18:05 AM and 2/24/2022 1:08:23 AM. IsRegressionStdDev: Marked as regression because -20.74385787056367 (T) = (0 -1127622.1266319444) / Math.Sqrt((1029992105.6369385 / (35)) + (1345133291.4700491 / (4))) is less than -2.026192463026769 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (35) + (4) - 2, .025) and -0.5427327845413638 = (730925.107660283 - 1127622.1266319444) / 730925.107660283 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

kunalspathak commented 2 years ago

Most likely caused by https://github.com/dotnet/runtime/pull/65709. @SingleAccretion

SingleAccretion commented 2 years ago

That was an almost regression-less change, I was hoping for some improvements :).

In any case, will take a look.

@kunalspathak This was only seen on ARM64, correct?

kunalspathak commented 2 years ago

That's correct. https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Collections.ContainsTrue(String).Span(Size%3a%20512).html and https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Collections.ContainsTrue(String).Span(Size%3a%20512).html are charts from x64 and don't see regression from your change.

EgorBo commented 2 years ago

Alternative candidate is https://github.com/dotnet/runtime/pull/65561 (maybe it affected code layout in the jitted heap somehow - no idea)

SingleAccretion commented 2 years ago

Initial results:

1) Diffing the codegen that the altjit has for the call stack in the benchmark does not yield anything. Notably, the altjit doesn't use vector instructions for the SpanHelpers.SequenceEquals workhorse method. 2) Diffing the R2Red SpanHelpers.SequenceEquals from the provided baseline/diff payloads does not yield anything either. In fact, diffing the R2Red code for CoreLib (or at least the portion of it that R2RDump managed to write out before crashing...) doesn't yield anything, which is quite suspicious.

Further investigation will require an ARM64 device (I do not have one) to benchmark on, so will take some time...

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | arm64 -- | -- OS | ubuntu 18.04 Baseline | [4fd5132c1fb8292e2d94ad4a1b2fe62357205402](https://github.com/dotnet/runtime/commit/4fd5132c1fb8292e2d94ad4a1b2fe62357205402) Compare | [94c0a7c13d158eb79d27223f474ec8f8db747f11](https://github.com/dotnet/runtime/commit/94c0a7c13d158eb79d27223f474ec8f8db747f11) Diff | [Diff](https://github.com/dotnet/runtime/compare/4fd5132c1fb8292e2d94ad4a1b2fe62357205402...94c0a7c13d158eb79d27223f474ec8f8db747f11) ### Regressions in System.Collections.ContainsTrue<String> Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [Span - Duration of single invocation]() | 759.66 μs | 1.11 ms | 1.46 | 0.41 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsTrue<String>*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Collections.ContainsTrue<String>.Span(Size: 512) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 1.1103403372222225 > 794.5004166666666. IsChangePoint: Marked as a change because one of 2/22/2022 2:43:53 PM, 2/24/2022 1:08:23 AM falls between 2/15/2022 8:18:05 AM and 2/24/2022 1:08:23 AM. IsRegressionStdDev: Marked as regression because -20.74385787056367 (T) = (0 -1127622.1266319444) / Math.Sqrt((1029992105.6369385 / (35)) + (1345133291.4700491 / (4))) is less than -2.026192463026769 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (35) + (4) - 2, .025) and -0.5427327845413638 = (730925.107660283 - 1127622.1266319444) / 730925.107660283 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: SingleAccretion
Labels: `area-System.Collections`, `tenet-performance`, `tenet-performance-benchmarks`, `untriaged`, `refs/heads/main`, `ubuntu 18.04`, `Regression`, `RunKind=micro`, `CoreClr`, `arm64`
Milestone: -
ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | arm64 -- | -- OS | ubuntu 18.04 Baseline | [4fd5132c1fb8292e2d94ad4a1b2fe62357205402](https://github.com/dotnet/runtime/commit/4fd5132c1fb8292e2d94ad4a1b2fe62357205402) Compare | [94c0a7c13d158eb79d27223f474ec8f8db747f11](https://github.com/dotnet/runtime/commit/94c0a7c13d158eb79d27223f474ec8f8db747f11) Diff | [Diff](https://github.com/dotnet/runtime/compare/4fd5132c1fb8292e2d94ad4a1b2fe62357205402...94c0a7c13d158eb79d27223f474ec8f8db747f11) ### Regressions in System.Collections.ContainsTrue<String> Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [Span - Duration of single invocation]() | 759.66 μs | 1.11 ms | 1.46 | 0.41 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsTrue<String>*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Collections.ContainsTrue<String>.Span(Size: 512) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 1.1103403372222225 > 794.5004166666666. IsChangePoint: Marked as a change because one of 2/22/2022 2:43:53 PM, 2/24/2022 1:08:23 AM falls between 2/15/2022 8:18:05 AM and 2/24/2022 1:08:23 AM. IsRegressionStdDev: Marked as regression because -20.74385787056367 (T) = (0 -1127622.1266319444) / Math.Sqrt((1029992105.6369385 / (35)) + (1345133291.4700491 / (4))) is less than -2.026192463026769 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (35) + (4) - 2, .025) and -0.5427327845413638 = (730925.107660283 - 1127622.1266319444) / 730925.107660283 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: SingleAccretion
Labels: `tenet-performance`, `tenet-performance-benchmarks`, `area-CodeGen-coreclr`, `untriaged`, `refs/heads/main`, `ubuntu 18.04`, `Regression`, `RunKind=micro`, `CoreClr`, `arm64`
Milestone: -
SingleAccretion commented 2 years ago

So, in the time since since the regression, the benchmark, it seems, has become bimodal:

image

Given that, and the fact that I did not see invariant indirections in the call stacks of the benchmark that could have impacted the results, I think it is unlikely that #65709 has affected the performance here in an actionable (for me) way.

EgorBo commented 2 years ago

Right, I think we can close this one image

@SingleAccretion sorry for wasting your time