dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.14k stars 4.71k forks source link

Regressions in System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer and Hashset.ContainsTrue<Int32> #67101

Closed performanceautofiler[bot] closed 2 years ago

performanceautofiler[bot] commented 2 years ago

`### Run Information

Architecture arm64
OS Windows 10.0.19041
Baseline ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d
Compare 0b12d37843e7165fb4c8b794186f19ef43af6c73
Diff Diff
tighter diff https://github.com/dotnet/runtime/compare/a1bc0f34fc8ad77c31a1682841d92dbb20e39dd8...5371203d5820a21922357e954e8c43eb4b76fd1d

Regressions in System.Text.Json.Document.Tests.Perf_EnumerateArray

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
[EnumerateUsingIndexer - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_Windows 10.0.19041/System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase%3a%20ArrayOfStrings).html>) 1.74 μs 1.85 μs 1.06 0.01 False
[EnumerateArray- Duration of single invocation moved to https://github.com/dotnet/runtime/issues/67176](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_Windows 10.0.19041/System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateArray(TestCase%3a%20Json400KB).html>) 2.29 μs 2.96 μs 1.29 0.01 False

graph

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.Json.Document.Tests.Perf_EnumerateArray*'
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: ArrayOfStrings) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 1.847930218805441 > 1.826793768692464. IsChangePoint: Marked as a change because one of 12/2/2021 4:22:41 PM, 1/28/2022 4:34:00 PM, 3/17/2022 6:54:53 PM, 3/22/2022 12:26:23 PM falls between 3/13/2022 11:50:20 AM and 3/22/2022 12:26:23 PM. IsRegressionStdDev: Marked as regression because -37.567265158486116 (T) = (0 -1844.6392731408614) / Math.Sqrt((179.10677767316412 / (27)) + (4.721594038302771 / (5))) is less than -2.0422724562973107 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (27) + (5) - 2, .025) and -0.05939220196144539 = (1741.2241375059637 - 1844.6392731408614) / 1741.2241375059637 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ```#### System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateArray(TestCase: Json400KB) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 2.9598147370059142 > 2.4018696254624405. IsChangePoint: Marked as a change because one of 1/28/2022 4:34:00 PM, 3/15/2022 12:53:39 AM, 3/22/2022 12:26:23 PM falls between 3/13/2022 11:50:20 AM and 3/22/2022 12:26:23 PM. IsRegressionStdDev: Marked as regression because -120.6428218804647 (T) = (0 -2935.2304895714733) / Math.Sqrt((19.852319641720772 / (16)) + (437.8173014793286 / (16))) is less than -2.0422724562973107 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (16) + (16) - 2, .025) and -0.2817622489671951 = (2289.996051870456 - 2935.2304895714733) / 2289.996051870456 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Architecture arm64
OS Windows 10.0.19041
Baseline ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d
Compare 0b12d37843e7165fb4c8b794186f19ef43af6c73
Diff Diff

Regressions in System.Collections.ContainsTrue<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
[HashSet - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_Windows 10.0.19041/System.Collections.ContainsTrue(Int32).HashSet(Size%3a%20512).html>) 4.99 μs 6.01 μs 1.20 0.05 False

graph Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsTrue&lt;Int32&gt;*'
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Collections.ContainsTrue<Int32>.HashSet(Size: 512) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 6.013932003309152 > 5.256438464261174. IsChangePoint: Marked as a change because one of 12/7/2021 3:27:58 PM, 12/8/2021 11:18:41 PM, 12/27/2021 6:47:12 AM, 3/17/2022 6:54:53 PM, 3/22/2022 12:26:23 PM falls between 3/13/2022 11:50:20 AM and 3/22/2022 12:26:23 PM. IsRegressionStdDev: Marked as regression because -19.80322266903542 (T) = (0 -6053.083872496654) / Math.Sqrt((1148.5014152204074 / (27)) + (13486.517429206551 / (5))) is less than -2.0422724562973107 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (27) + (5) - 2, .025) and -0.2066314747169855 = (5016.51415475997 - 6053.083872496654) / 5016.51415475997 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
DrewScoggins commented 2 years ago

LIkely https://github.com/dotnet/runtime/pull/65738

DrewScoggins commented 2 years ago

Related https://github.com/dotnet/perf-autofiling-issues/issues/4255

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-text-json See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | arm64 -- | -- OS | Windows 10.0.19041 Baseline | [ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d](https://github.com/dotnet/runtime/commit/ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d) Compare | [0b12d37843e7165fb4c8b794186f19ef43af6c73](https://github.com/dotnet/runtime/commit/0b12d37843e7165fb4c8b794186f19ef43af6c73) Diff | [Diff](https://github.com/dotnet/runtime/compare/ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d...0b12d37843e7165fb4c8b794186f19ef43af6c73) ### Regressions in System.Text.Json.Document.Tests.Perf_EnumerateArray Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [EnumerateUsingIndexer - Duration of single invocation]() | 1.74 μs | 1.85 μs | 1.06 | 0.01 | False | | | [EnumerateArray - Duration of single invocation]() | 2.29 μs | 2.96 μs | 1.29 | 0.01 | False | | | ![graph]() ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.Json.Document.Tests.Perf_EnumerateArray*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: ArrayOfStrings) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 1.847930218805441 > 1.826793768692464. IsChangePoint: Marked as a change because one of 12/2/2021 4:22:41 PM, 1/28/2022 4:34:00 PM, 3/17/2022 6:54:53 PM, 3/22/2022 12:26:23 PM falls between 3/13/2022 11:50:20 AM and 3/22/2022 12:26:23 PM. IsRegressionStdDev: Marked as regression because -37.567265158486116 (T) = (0 -1844.6392731408614) / Math.Sqrt((179.10677767316412 / (27)) + (4.721594038302771 / (5))) is less than -2.0422724562973107 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (27) + (5) - 2, .025) and -0.05939220196144539 = (1741.2241375059637 - 1844.6392731408614) / 1741.2241375059637 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ```#### System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateArray(TestCase: Json400KB) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 2.9598147370059142 > 2.4018696254624405. IsChangePoint: Marked as a change because one of 1/28/2022 4:34:00 PM, 3/15/2022 12:53:39 AM, 3/22/2022 12:26:23 PM falls between 3/13/2022 11:50:20 AM and 3/22/2022 12:26:23 PM. IsRegressionStdDev: Marked as regression because -120.6428218804647 (T) = (0 -2935.2304895714733) / Math.Sqrt((19.852319641720772 / (16)) + (437.8173014793286 / (16))) is less than -2.0422724562973107 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (16) + (16) - 2, .025) and -0.2817622489671951 = (2289.996051870456 - 2935.2304895714733) / 2289.996051870456 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
### Run Information Architecture | arm64 -- | -- OS | Windows 10.0.19041 Baseline | [ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d](https://github.com/dotnet/runtime/commit/ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d) Compare | [0b12d37843e7165fb4c8b794186f19ef43af6c73](https://github.com/dotnet/runtime/commit/0b12d37843e7165fb4c8b794186f19ef43af6c73) Diff | [Diff](https://github.com/dotnet/runtime/compare/ac84ea6e241ad2d2cde346144b2c4b3a5d64fa1d...0b12d37843e7165fb4c8b794186f19ef43af6c73) ### Regressions in System.Collections.ContainsTrue<Int32> Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [HashSet - Duration of single invocation]() | 4.99 μs | 6.01 μs | 1.20 | 0.05 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsTrue<Int32>*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Collections.ContainsTrue<Int32>.HashSet(Size: 512) ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 6.013932003309152 > 5.256438464261174. IsChangePoint: Marked as a change because one of 12/7/2021 3:27:58 PM, 12/8/2021 11:18:41 PM, 12/27/2021 6:47:12 AM, 3/17/2022 6:54:53 PM, 3/22/2022 12:26:23 PM falls between 3/13/2022 11:50:20 AM and 3/22/2022 12:26:23 PM. IsRegressionStdDev: Marked as regression because -19.80322266903542 (T) = (0 -6053.083872496654) / Math.Sqrt((1148.5014152204074 / (27)) + (13486.517429206551 / (5))) is less than -2.0422724562973107 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (27) + (5) - 2, .025) and -0.2066314747169855 = (5016.51415475997 - 6053.083872496654) / 5016.51415475997 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: -
Labels: `area-System.Text.Json`, `tenet-performance`, `tenet-performance-benchmarks`, `refs/heads/main`, `RunKind=micro`, `Windows 10.0.19041`, `Regression`, `CoreClr`
Milestone: -
danmoseley commented 2 years ago

I see no plausible libraries changes in this interval. Moving to VM per suggestion above, but it could be something else.

janvorli commented 2 years ago

The System.Text.Json.Document.Tests.Perf_EnumerateArray regression isn't from the https://github.com/dotnet/runtime/pull/65738, it has occurred two days before that change went in.

danmoseley commented 2 years ago

Ah yes, Hashset and Enumerate using indexer commit range -- https://github.com/dotnet/runtime/compare/a1bc0f34fc8ad77c31a1682841d92dbb20e39dd8...5371203d5820a21922357e954e8c43eb4b76fd1d (includes #66456, which might be totally innocent still)

enumerate array commit range: https://github.com/dotnet/runtime/compare/6bf873a991bcae3f80f5de155a594cefc8824eea...bc5e386676ec5c806eef78d5fa754f1eadfe28c2. The plausible candidate there is (https://github.com/dotnet/runtime/pull/66618 (cc @AndyAyersMS ?)

@DrewScoggins it seems the 3 regressions in this issue occurred across 2 different intervals so ought to have 2 issues. analysis script glitch maybe?

danmoseley commented 2 years ago

I also notice that the commit range above (eg., the first one) is 17 commits, but from the graph, I think it should span only 8 commits.

DrewScoggins commented 2 years ago

Yeah, in general the auto-filer can do a good job of putting changes together that happened at the same time. Sometimes however, the actual point that the changepoint algorithm picks is several points off from where the actual changepoint is. This is an underlying weakness inherent in the algorithm. Normally, when we do the triage we try and make sure that only issues that match together end up in the same place, but this one slipped through.

The second piece is that because of this underlying weakness in the algorithm I make the diff link a commit earlier and later then the tool discovers to make sure that we increase the odds of the offending commit being in that range.

danmoseley commented 2 years ago

That's interesting. I'm curious what algorithm you use, etc-- is there a repo somewhere? This one on the face of it seems like an "easy case". (I know it's a hard problem and don't pretend to be able to do better) image

OK I'll break the 2nd one out into its own issue.

DrewScoggins commented 2 years ago

It is checked into an internal repo, but I will put the relevant code here. You will also want to take a look at this package, ruptures, as it is the one we use for the changepoint analysis. You will also find linked there a good paper on changepoint analysis as a field of study. I used that one pretty extensively when I was first building this.

import numpy as np
import ruptures as rpt
import sys

data = open(sys.argv[1], ""r"").read()
points = data.split(',')

for i in range(0, len(points)):
    points[i] = float (points[i])

points = np.concatenate([points])

algo = rpt.Pelt(model=""mahalanobis"", jump=1, min_size=3).fit(points)
results = algo.predict(pen=4*np.log(len(points)))
print(results)
EgorBo commented 2 years ago

Perf_Version.TryFormat2 regression on windows-x64

mangod9 commented 2 years ago

is this still an issue? from @janvorli comment this is not related to VM changes.

mangod9 commented 2 years ago

@DrewScoggins is this issue actionable? Based on the charts this looks like its within the range of what things were in 6?

mangod9 commented 2 years ago

I am closing this as not-actionable at the moment.