dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.07k stars 4.69k forks source link

[Perf] Linux/arm64: 15 Regressions on 5/4/2024 5:29:12 AM #102047

Closed performanceautofiler[bot] closed 2 months ago

performanceautofiler[bot] commented 5 months ago

Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e965312582a33c0acf2020648b54a152a80c139a
Compare 5962fd511e3eacf7fe91520392c041e94e5d31cc
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
53.07 ns 150.70 ns 2.84 0.04 True
1.62 μs 3.21 μs 1.99 0.03 True
658.28 ns 2.85 μs 4.33 0.02 True
2.08 μs 2.56 μs 1.23 0.01 True
48.40 μs 60.44 μs 1.25 0.01 True
48.94 μs 61.04 μs 1.25 0.01 True
872.01 ns 3.18 μs 3.65 0.04 True
68.76 ns 140.01 ns 2.04 0.02 True
2.05 μs 2.51 μs 1.22 0.01 True
817.70 ns 3.20 μs 3.91 0.14 True
2.07 μs 2.52 μs 1.22 0.01 True
48.04 μs 60.23 μs 1.25 0.01 True
54.65 ns 151.26 ns 2.77 0.03 True

graph graph graph graph graph graph graph graph graph graph graph graph graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Double&gt;*'
### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarAddend(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_Vectors(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Truncate(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarBase(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarMultiplier(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_Vectors(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarAddend(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_Vectors(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Pow_ScalarExponent(BufferLength: 3079) #### ETL Files #### Histogram #### JIT Disasms ### System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.FusedMultiplyAdd_ScalarMultiplier(BufferLength: 128) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e965312582a33c0acf2020648b54a152a80c139a
Compare 5962fd511e3eacf7fe91520392c041e94e5d31cc
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Memory.Span<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
3.46 ns 5.32 ns 1.54 0.24 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span&lt;Int32&gt;*'
### System.Memory.Span<Int32>.SequenceEqual(Size: 4) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e965312582a33c0acf2020648b54a152a80c139a
Compare 5962fd511e3eacf7fe91520392c041e94e5d31cc
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.TryGetValueFalse<Int32, Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
4.13 μs 5.96 μs 1.44 0.08 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.TryGetValueFalse&lt;Int32, Int32&gt;*'
### System.Collections.TryGetValueFalse<Int32, Int32>.Dictionary(Size: 512) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
dotnet-policy-service[bot] commented 4 months ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

DrewScoggins commented 4 months ago

Looks related to https://github.com/dotnet/runtime/pull/101800.

We also saw many improvements, and those are linked in the above PR.

stephentoub commented 2 months ago

Many of the benchmarks fully recovered, but it looks like those related to Pow did not.

tannergooding commented 2 months ago

@stephentoub, pow is because of https://github.com/dotnet/runtime/commit/aab8803eac4111a711b6eb65242ee8a5b14fa411: Disable TensorPrimitives vectorization of Log, Cbrt, Pow, and RootN

The diff range displayed for some of these is very off and is different if you manually go into https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu%2022.04/System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives(Double).Pow_Vectors(BufferLength:%203079).html and set the compare range in the graph (which is https://github.com/dotnet/runtime/compare/f88ab88f4ce02569f84922cda802317e69682d8a...f9207e6c85bb6aebd9e7af7504fd3db28b32d058)

stephentoub commented 2 months ago

Ah, thanks, in that case it's expected, and I believe we can close this.