dotnet / perf-autofiling-issues

A landing place for auto-filed performance issues before they receive triage
MIT License

[Perf] Linux/x64: 5 Regressions on 3/22/2023 9:02:35 AM #14535

Open performanceautofiler[bot] opened 1 year ago

performanceautofiler[bot] commented 1 year ago

Run Information

| Name | Value |
| --- | --- |
| Architecture | x64 |
| OS | ubuntu 18.04 |
| Queue | TigerUbuntu |
| Baseline | 696c35cfed2ad87b59294a2ee548ea45e7da06ce |
| Compare | 71b0ecd28591a13898e2afa3f8f4ee077b47bec2 |
| Diff | Diff |
| Configs | AOT:true, CompilationMode:wasm, RunKind:micro |

Regressions in System.Collections.TryGetValueFalse<Int32, Int32>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [IDictionary - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompilationMode=wasm_RunKind=micro/System.Collections.TryGetValueFalse(Int32%2c%20Int32).IDictionary(Size%3a%20512).html>) | 18.89 μs | 32.93 μs | 1.74 | 0.58 | True | | | | | |
| [ConcurrentDictionary - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompilationMode=wasm_RunKind=micro/System.Collections.TryGetValueFalse(Int32%2c%20Int32).ConcurrentDictionary(Size%3a%20512).html>) | 7.21 μs | 17.42 μs | 2.42 | 0.41 | True | | | | | |
| [Dictionary - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompilationMode=wasm_RunKind=micro/System.Collections.TryGetValueFalse(Int32%2c%20Int32).Dictionary(Size%3a%20512).html>) | 15.83 μs | 23.87 μs | 1.51 | 0.58 | False | | | | | |

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Payloads

Baseline Compare

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.TryGetValueFalse<Int32, Int32>*'
### Payloads

[Baseline]() [Compare]()

### Histogram

#### System.Collections.TryGetValueFalse<Int32, Int32>.IDictionary(Size: 512)

```log
```

### Description of detection logic

```
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 32.9316341416707 > 20.928036024259377.
IsChangePoint: Marked as a change because one of 3/22/2023 6:15:16 AM, 3/27/2023 3:21:55 PM falls between 3/18/2023 7:37:20 PM and 3/27/2023 3:21:55 PM.
IsRegressionStdDev: Marked as regression because -32.59254015999887 (T) = (0 -34115.604101342935) / Math.Sqrt((3677684.0480631124 / (33)) + (1564505.0023849902 / (28))) is less than -2.000995378087428 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (33) + (28) - 2, .025) and -0.6414599755111985 = (20783.69537503852 - 34115.604101342935) / 20783.69537503852 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked as regression because Edge Detector said so.
```

#### System.Collections.TryGetValueFalse<Int32, Int32>.ConcurrentDictionary(Size: 512)

```log
```

### Description of detection logic

```
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 17.41913949621705 > 7.1600678707614405.
IsChangePoint: Marked as a change because one of 2/9/2023 12:40:40 PM, 3/22/2023 6:15:16 AM, 3/27/2023 3:21:55 PM falls between 3/18/2023 7:37:20 PM and 3/27/2023 3:21:55 PM.
IsRegressionStdDev: Marked as regression because -90.24744227805942 (T) = (0 -17800.383381337622) / Math.Sqrt((141689.31125606145 / (32)) + (273308.3694273779 / (28))) is less than -2.001717484144427 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (32) + (28) - 2, .025) and -1.524733888603562 = (7050.399830923595 - 17800.383381337622) / 7050.399830923595 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked as regression because Edge Detector said so.
```

#### System.Collections.TryGetValueFalse<Int32, Int32>.Dictionary(Size: 512)

```log
```

### Description of detection logic

```
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 23.872032760372115 > 16.552815652767322.
IsChangePoint: Marked as a change because one of 3/22/2023 6:15:16 AM, 3/27/2023 3:21:55 PM falls between 3/18/2023 7:37:20 PM and 3/27/2023 3:21:55 PM.
IsRegressionStdDev: Marked as regression because -29.168380833131863 (T) = (0 -23864.08234948271) / Math.Sqrt((1145775.478443365 / (33)) + (1179186.6686997903 / (27))) is less than -2.001717484144427 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (33) + (27) - 2, .025) and -0.5202719542301327 = (15697.245669158914 - 23864.08234948271) / 15697.245669158914 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.
```

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
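The IsRegressionStdDev line above combines two checks: a two-sample t statistic against a critical value, and a relative change against -0.05. The sketch below redoes that arithmetic for the IDictionary(Size: 512) case using only numbers copied from the log. It rests on some assumptions: the two large values divided by the sample counts are treated as variances, the function names are hypothetical, and the critical value is copied from the log rather than recomputed. The log prints `0` as the first term of the numerator, but the logged T value seems to be reproduced only when the baseline mean is used there, so the sketch uses the baseline mean.

```python
import math

# Values copied from the IsRegressionStdDev line for
# TryGetValueFalse<Int32, Int32>.IDictionary(Size: 512).
# Assumption: the two large numbers divided by (33) and (28) are variances.
BASE_MEAN, BASE_VAR, BASE_N = 20783.69537503852, 3677684.0480631124, 33
CMP_MEAN, CMP_VAR, CMP_N = 34115.604101342935, 1564505.0023849902, 28

# Critical value as printed in the log:
# StudentT.InvCDF(0, 1, (33) + (28) - 2, .025). Not recomputed here.
T_CRITICAL = -2.000995378087428
# Relative-change threshold as printed in the log.
REL_THRESHOLD = -0.05


def t_statistic(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Two-sample t statistic with an unpooled standard error (hypothetical helper)."""
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error


def relative_change(mean_a, mean_b):
    """Change of b relative to a; negative when b is larger, i.e. slower."""
    return (mean_a - mean_b) / mean_a


t_value = t_statistic(BASE_MEAN, BASE_VAR, BASE_N, CMP_MEAN, CMP_VAR, CMP_N)
rel_value = relative_change(BASE_MEAN, CMP_MEAN)

# Both conditions from the log line must hold for the regression flag.
is_regression = t_value < T_CRITICAL and rel_value < REL_THRESHOLD

print(f"T = {t_value:.6f}, relative change = {rel_value:.6f}, regression = {is_regression}")
```

Swapping in the means, variances, sample counts and critical value from any of the other IsRegressionStdDev lines should allow the same cross-check for those benchmarks.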

Run Information

| Name | Value |
| --- | --- |
| Architecture | x64 |
| OS | ubuntu 18.04 |
| Queue | TigerUbuntu |
| Baseline | 696c35cfed2ad87b59294a2ee548ea45e7da06ce |
| Compare | 71b0ecd28591a13898e2afa3f8f4ee077b47bec2 |
| Diff | Diff |
| Configs | AOT:true, CompilationMode:wasm, RunKind:micro |

Regressions in Benchstone.BenchI.Ackermann

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [Test - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompilationMode=wasm_RunKind=micro/Benchstone.BenchI.Ackermann.Test.html>) | 5.00 μs | 5.33 μs | 1.07 | 0.08 | False | | | | | |

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Payloads

Baseline Compare

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Benchstone.BenchI.Ackermann*'
### Payloads

[Baseline]() [Compare]()

### Histogram

#### Benchstone.BenchI.Ackermann.Test

```log
```

### Description of detection logic

```
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 5.3336339880952375 > 5.242824575179718.
IsChangePoint: Marked as a change because one of 1/18/2023 9:06:44 AM, 2/9/2023 4:27:10 AM, 3/2/2023 11:02:42 PM, 3/22/2023 6:15:16 AM, 3/27/2023 3:21:55 PM falls between 3/18/2023 7:37:20 PM and 3/27/2023 3:21:55 PM.
IsRegressionStdDev: Marked as regression because -14.703126760271225 (T) = (0 -5335.636508933462) / Math.Sqrt((10837.501164625995 / (33)) + (2194.8534342977105 / (28))) is less than -2.000995378087428 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (33) + (28) - 2, .025) and -0.05885002530298755 = (5039.086160862755 - 5335.636508933462) / 5039.086160862755 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.
```

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

| Name | Value |
| --- | --- |
| Architecture | x64 |
| OS | ubuntu 18.04 |
| Queue | TigerUbuntu |
| Baseline | 696c35cfed2ad87b59294a2ee548ea45e7da06ce |
| Compare | 71b0ecd28591a13898e2afa3f8f4ee077b47bec2 |
| Diff | Diff |
| Configs | AOT:true, CompilationMode:wasm, RunKind:micro |

Regressions in System.Collections.TryGetValueFalse<String, String>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [IDictionary - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompilationMode=wasm_RunKind=micro/System.Collections.TryGetValueFalse(String%2c%20String).IDictionary(Size%3a%20512).html>) | 29.58 μs | 49.78 μs | 1.68 | 0.41 | True | | | | | |

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Payloads

Baseline Compare

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.TryGetValueFalse<String, String>*'
### Payloads

[Baseline]() [Compare]()

### Histogram

#### System.Collections.TryGetValueFalse<String, String>.IDictionary(Size: 512)

```log
```

### Description of detection logic

```
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 49.77797415384615 > 31.99073353383092.
IsChangePoint: Marked as a change because one of 3/22/2023 6:15:16 AM, 3/27/2023 3:21:55 PM falls between 3/18/2023 7:37:20 PM and 3/27/2023 3:21:55 PM.
IsRegressionStdDev: Marked as regression because -36.134996306743126 (T) = (0 -46680.28848673679) / Math.Sqrt((3047894.9434036617 / (33)) + (2451100.6477795923 / (28))) is less than -2.000995378087428 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (33) + (28) - 2, .025) and -0.48882503542478417 = (31353.777224345373 - 46680.28848673679) / 31353.777224345373 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked as regression because Edge Detector said so.
```

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

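As a quick cross-check of the Test/Base column across the five reported benchmarks, the sketch below recomputes each ratio from the rounded Baseline and Test values in the tables and compares it with the 5% rule quoted in the IsRegressionBase text. This is an illustration only: the helper name and the `1.05` constant are assumptions derived from the wording "5% greater than the baseline", and the "value was not too small" condition is not modeled because the report does not state its cutoff.

```python
# (baseline μs, test μs) copied from the benchmark tables in this report.
RESULTS = {
    "TryGetValueFalse<Int32, Int32>.IDictionary": (18.89, 32.93),
    "TryGetValueFalse<Int32, Int32>.ConcurrentDictionary": (7.21, 17.42),
    "TryGetValueFalse<Int32, Int32>.Dictionary": (15.83, 23.87),
    "Benchstone.BenchI.Ackermann.Test": (5.00, 5.33),
    "TryGetValueFalse<String, String>.IDictionary": (29.58, 49.78),
}

# Assumption: "5% greater than the baseline" is read as ratio > 1.05.
REGRESSION_RATIO = 1.05


def ratio_to_baseline(baseline, test):
    """Test/Base ratio; values above 1 mean the compare build was slower."""
    return test / baseline


# Ratios rounded to two decimals, matching the precision shown in the tables.
ratios = {
    name: round(ratio_to_baseline(base, test), 2)
    for name, (base, test) in RESULTS.items()
}

# Benchmarks whose unrounded ratio exceeds the assumed 5% threshold.
flagged = [
    name
    for name, (base, test) in RESULTS.items()
    if ratio_to_baseline(base, test) > REGRESSION_RATIO
]

for name, ratio in ratios.items():
    marker = "regression" if name in flagged else "ok"
    print(f"{name}: {ratio:.2f} ({marker})")
```

Ackermann is the borderline case: its ratio is only slightly above the threshold, which fits the Edge Detector not flagging it.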
radekdoulik commented 1 year ago

The range is https://github.com/dotnet/runtime/compare/7500625bd9303a48d0500b6123cec604e49039ae...b69b359b30e0c62099b919024d099a3cb013540d

I don't see what might cause that. Maybe an infra change? @lewing @SamMonoRT any idea?
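For anyone wanting to rebuild a range link like the one above from two commit hashes, a minimal sketch follows. It relies on GitHub's three-dot compare URL form; the function name is hypothetical, and the hashes are the ones quoted in this comment.

```python
def compare_url(repo, base_sha, head_sha):
    """Build a GitHub compare link for the commits between base_sha and head_sha."""
    return f"https://github.com/{repo}/compare/{base_sha}...{head_sha}"


# Hashes copied from the range quoted in the comment above.
url = compare_url(
    "dotnet/runtime",
    "7500625bd9303a48d0500b6123cec604e49039ae",
    "b69b359b30e0c62099b919024d099a3cb013540d",
)
print(url)
```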

SamMonoRT commented 1 year ago

The commit range seems a little off to me. An easy way to check is to click the "Test Report" link above: it opens the graph, where you can select a point and see both the runtime repo commit and the performance repo commit. Doing that for System.Collections.TryGetValueFalse<String, String> above, I see https://github.com/dotnet/performance/commit/7c7f326c72d340b13d10c1eab4bbe5c94c7746b4#diff-c0af079ca7caf36302f2fcf6334640c1bcbccb430b8619f9aefe70a4127073f6 updating the actual benchmark test. That may explain the change. @stephentoub may be in a better position to explain the change.

stephentoub commented 1 year ago

> I do see https://github.com/dotnet/performance/commit/7c7f326c72d340b13d10c1eab4bbe5c94c7746b4#diff-c0af079ca7caf36302f2fcf6334640c1bcbccb430b8619f9aefe70a4127073f6 updating the actual benchmark test. That may explain the change. @stephentoub may be in a better position to explain the change.

That added new tests (extending the existing tests to also run for FrozenDictionary)... it didn't change the existing ones.

SamMonoRT commented 1 year ago

This may be a learning opportunity for me. When you add a new benchmark, is there additional work needed for reporting those? I don't see any FrozenDictionary benchmark results in https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu%2020.04_LLVM%3Dfalse_MonoAOT%3Dtrue_MonoInterpreter%3Dfalse_RunKind%3Dmicro_mono/AllTestindex.html, and I'm not certain whether the total time of AddValueType is increasing because of the new addition.

stephentoub commented 1 year ago

> is there additional work needed for reporting those?

There shouldn't be. @cincuranet, is this just lag, or is something broken that the new tests don't show up?

cincuranet commented 1 year ago

@stephentoub @SamMonoRT The individual pages are being generated (i.e. here), but the "list" pages need a gentle kick. I'll do it.

radekdoulik commented 1 year ago

> The commit range seems a little off to me. The easy way to determine that is if you click on "Test Report" link above, it opens up the graph, where you can select a point and see both the runtime repo commit, and also the performance repo commit. While doing the same on System.Collections.TryGetValueFalse<String, String> above, I do see dotnet/performance@7c7f326#diff-c0af079ca7caf36302f2fcf6334640c1bcbccb430b8619f9aefe70a4127073f6 updating the actual benchmark test. That may explain the change. @stephentoub may be in a better position to explain the change.

Interesting, my comparison only has links to the runtime repo. I never saw a link to performance repo changes.