dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.91k stars 4.63k forks source link

Regressions in SciMark2.kernel #77841

Open performanceautofiler[bot] opened 1 year ago

performanceautofiler[bot] commented 1 year ago

Run Information

Architecture arm64
OS ubuntu 20.04
Baseline 3dbc850af3e8bfd6d529ed90cf00247dc9a24512
Compare 98984ccad55362a66b7fd7a8f198bab13c2496bc
Diff Diff

Regressions in SciMark2.kernel

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
[benchmarkLU - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 20.04/SciMark2.kernel.benchmarkLU.html>) 613.19 ms 718.89 ms 1.17 0.00 False

graph Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'SciMark2.kernel*'

Related Issues

Regressions

Improvements

### Payloads [Baseline]() [Compare]() ### Histogram ### Edge Detector Info [Collection Data]() #### SciMark2.kernel.benchmarkLU ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionWindowed: Marked as regression because 718.8945194285715 > 643.9885426399999. IsChangePoint: Marked as a change because one of 10/31/2022 1:09:40 PM, 11/2/2022 9:17:08 PM falls between 10/25/2022 8:54:59 AM and 11/2/2022 9:17:08 PM. IsRegressionStdDev: Marked as regression because -825.1818855985395 (T) = (0 -718997492.7443956) / Math.Sqrt((171242071203.19473 / (35)) + (114231475869.8522 / (10))) is less than -2.016692199226234 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (35) + (10) - 2, .025) and -0.17177985581390257 = (613594344.6860073 - 718997492.7443956) / 613594344.6860073 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
EgorBo commented 1 year ago

Suspect: https://github.com/dotnet/runtime/pull/62689 cc @pedrobsaila

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | arm64 -- | -- OS | ubuntu 20.04 Baseline | [3dbc850af3e8bfd6d529ed90cf00247dc9a24512](https://github.com/dotnet/runtime/commit/3dbc850af3e8bfd6d529ed90cf00247dc9a24512) Compare | [98984ccad55362a66b7fd7a8f198bab13c2496bc](https://github.com/dotnet/runtime/commit/98984ccad55362a66b7fd7a8f198bab13c2496bc) Diff | [Diff](https://github.com/dotnet/runtime/compare/3dbc850af3e8bfd6d529ed90cf00247dc9a24512...98984ccad55362a66b7fd7a8f198bab13c2496bc) ### Regressions in SciMark2.kernel Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [benchmarkLU - Duration of single invocation]() | 613.19 ms | 718.89 ms | 1.17 | 0.00 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'SciMark2.kernel*' ``` ### Related Issues #### Regressions #### Improvements
### Payloads [Baseline]() [Compare]() ### Histogram ### Edge Detector Info [Collection Data]() #### SciMark2.kernel.benchmarkLU ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionWindowed: Marked as regression because 718.8945194285715 > 643.9885426399999. IsChangePoint: Marked as a change because one of 10/31/2022 1:09:40 PM, 11/2/2022 9:17:08 PM falls between 10/25/2022 8:54:59 AM and 11/2/2022 9:17:08 PM. IsRegressionStdDev: Marked as regression because -825.1818855985395 (T) = (0 -718997492.7443956) / Math.Sqrt((171242071203.19473 / (35)) + (114231475869.8522 / (10))) is less than -2.016692199226234 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (35) + (10) - 2, .025) and -0.17177985581390257 = (613594344.6860073 - 718997492.7443956) / 613594344.6860073 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: -
Labels: `tenet-performance`, `tenet-performance-benchmarks`, `area-CodeGen-coreclr`, `untriaged`
Milestone: -
pedrobsaila commented 1 year ago

My PR generated some regressions on SciMark2 :

My change is probably the one causing the regression, let me see if I can fix them. Just one question about these comments on SciMark2 :

/// This software is likely to burn your processor, bitflip your memory chips
/// anihilate your screen and corrupt all your disks, so you it at your
/// own risk.

Do I really risk something if I debug these tests locally ?

EgorBo commented 1 year ago

Do I really risk something if I debug these tests locally ?

We run them on daily basis so hopefully not? 🙂 No idea who put that comment there

AndyAyersMS commented 1 year ago

Do I really risk something if I debug these tests locally ?

We run them on daily basis so hopefully not? 🙂 No idea who put that comment there

I think it's a joke -- best I can tell it was added when the benchmark code was ported from Java to C#, many years ago.

AndyAyersMS commented 1 year ago

Let me know if you need any help digging into this. The issues here might be similar to the problems we have with if conversion. For example, by changing something like

if ((x op const) && (y op const)) { ... }

to

if ((x op y) op const) { .... }

we run the risk that if (x op const) is usually false and y is the result of some long-latency computation or dependence chain then evaluating y eagerly can make things slower (and similarly when the combiner is || and (x op const) is usually true).

One way to avoid the potential downside is to not do this optimization when the predicates are in a loop, or when profile data indicates evaluation of y is unlikely.

pedrobsaila commented 1 year ago

Let me know if you need any help digging into this

I ran the benchmark on x64/x86 Windows/Linux and I could not reproduce the 100 ms regression : I have light regressions/improvements (less than 30 ms) depending on the environment. It would be of help to see assembly diffs if you have an arm64 machine just to be sure there's no bad assembly generated

we run the risk that if (x op const) is usually false and y is the result of some long-latency computation or dependence chain then evaluating y eagerly can make things slower (and similarly when the combiner is || and (x op const) is usually true).

Thanks for the hint. I'll see if I can improve regressions based on this. I see something similar in

https://github.com/dotnet/runtime/blob/c04cf0ca8068c6306c135abb3ce98a33de13579c/src/coreclr/jit/optimizer.cpp#L9693-L9700

pedrobsaila commented 1 year ago

I've been playing for a while with SciMark2.kernel perf tests and I don't think that regression is caused by the fact that we might evaluate an expensive if ((x op y) op const) { .... } where x is false and y is an expensive operation. The optimisation is applied only to array range checks indexer >= 0 so the operation is not expensive and is always true (no IndexOutOfRangeException thrown in code)

Here are assembly diffs in Windows X64 (same diffs for Linux x64) for the main methods of SciMark2.kernel :

Did not find any suspicious diffs in assembly. The diffs are not that significant to generate important improvements/regressions. @AndyAyersMS I think I need a little help to dig this.

pedrobsaila commented 1 year ago

It seems like things are back the way they were :

image
AndyAyersMS commented 1 year ago

Likely from #77728