dotnet / runtime


[Perf] Linux/arm64: 7 Regressions on 2/23/2024 10:12:07 PM #99121

Open performanceautofiler[bot] opened 7 months ago

performanceautofiler[bot] commented 7 months ago

Run Information

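| Name | Value |
| -- | -- |
| Architecture | arm64 |
| OS | ubuntu 22.04 |
| Queue | AmpereUbuntu |
| Baseline | d98af689a245bbc983ea71c52e15ff9cdf376ec7 |
| Compare | 7b54246a7bd6b4ea09895b22ba30e45059fbedb4 |
| Diff | https://github.com/dotnet/runtime/compare/d98af689a245bbc983ea71c52e15ff9cdf376ec7...7b54246a7bd6b4ea09895b22ba30e45059fbedb4 |
| Configs | CompilationMode:tiered, RunKind:micro |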

Regressions in System.Memory.Span<Char>

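| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| SequenceEqual(Size: 512) | 58.63 ns | 66.41 ns | 1.13 | 0.01 | True | | | |
| EndsWith(Size: 4) | 3.42 ns | 5.25 ns | 1.53 | 0.57 | True | | | |
| EndsWith(Size: 512) | 31.62 ns | 34.48 ns | 1.09 | 0.16 | False | | | |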
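
The Test/Base column is the Test time divided by the Baseline time. The following Python sketch recomputes it from the rounded values in the table above. The benchmark names are matched to rows using the order of the Payloads headings in this report, and `slowdown_ratio` is a hypothetical helper, not part of the dotnet/performance tooling. Because the inputs are already rounded, a recomputed ratio can differ from the reported one in the last digit.

```python
# (benchmark, baseline_ns, test_ns, reported_ratio) copied from the table above.
ROWS = [
    ("SequenceEqual(Size: 512)", 58.63, 66.41, 1.13),
    ("EndsWith(Size: 4)", 3.42, 5.25, 1.53),
    ("EndsWith(Size: 512)", 31.62, 34.48, 1.09),
]


def slowdown_ratio(baseline_ns: float, test_ns: float) -> float:
    """Return Test/Base; values above 1.0 mean the compare build is slower."""
    return test_ns / baseline_ns


if __name__ == "__main__":
    for name, base, test, reported in ROWS:
        ratio = slowdown_ratio(base, test)
        percent = (ratio - 1.0) * 100.0
        print(f"{name}: {ratio:.3f} (reported {reported:.2f}, {percent:+.1f}%)")
```

Small absolute changes dominate the short benchmarks: the Size: 4 row moves by under 2 ns yet shows the largest ratio.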


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Char>*'
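
The generated command uses a Windows-style path (`.\performance\scripts\...`), while this report is for Linux/arm64. A minimal Python sketch of building the same invocation with forward slashes is shown here. It assumes the repository was cloned into `./performance`; `build_repro_command` is a hypothetical helper, and only the `-f` and `--filter` flags from the command above are used.

```python
import shlex
from pathlib import PurePosixPath


def build_repro_command(type_arg, framework="net8.0", repo_dir="performance"):
    """Return the benchmarks_ci.py invocation as an argument list.

    repo_dir is an assumption: the directory created by `git clone`.
    """
    script = PurePosixPath(repo_dir) / "scripts" / "benchmarks_ci.py"
    # Keeping the filter as one list element avoids shell issues with '<', '>' and '*'.
    benchmark_filter = f"System.Memory.Span<{type_arg}>*"
    return ["python3", str(script), "-f", framework, "--filter", benchmark_filter]


if __name__ == "__main__":
    for type_arg in ("Char", "Int32"):
        # shlex.join quotes the filter so the line can be pasted into a POSIX shell.
        print(shlex.join(build_repro_command(type_arg)))
```

The argument list can be passed to `subprocess.run` directly, which sidesteps shell quoting altogether.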

Payloads

Baseline, Compare

Benchmarks (the ETL Files, Histogram, and JIT Disasms sections are empty):
- System.Memory.Span<Char>.SequenceEqual(Size: 512)
- System.Memory.Span<Char>.EndsWith(Size: 4)
- System.Memory.Span<Char>.EndsWith(Size: 512)

Docs

Profiling workflow for dotnet/runtime repository: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md
Benchmarking workflow for dotnet/runtime repository: https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md

Regressions in System.Memory.Span<Int32>

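| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| EndsWith(Size: 512) | 57.88 ns | 64.75 ns | 1.12 | 0.03 | True | | | |
| StartsWith(Size: 512) | 56.57 ns | 64.96 ns | 1.15 | 0.01 | True | | | |
| SequenceEqual(Size: 512) | 108.09 ns | 127.47 ns | 1.18 | 0.01 | True | | | |
| EndsWith(Size: 4) | 3.62 ns | 5.11 ns | 1.41 | 0.55 | True | | | |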


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'

Payloads

Baseline, Compare

Benchmarks (the ETL Files, Histogram, and JIT Disasms sections are empty):
- System.Memory.Span<Int32>.EndsWith(Size: 512)
- System.Memory.Span<Int32>.StartsWith(Size: 512)
- System.Memory.Span<Int32>.SequenceEqual(Size: 512)
- System.Memory.Span<Int32>.EndsWith(Size: 4)

Docs

Profiling workflow for dotnet/runtime repository: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md
Benchmarking workflow for dotnet/runtime repository: https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md

ghost commented 7 months ago

Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.

Issue Details
Author: performanceautofiler[bot]
Assignees: -
Labels: `arch-arm64`, `area-System.Memory`, `os-linux`, `untriaged`, `runtime-coreclr`
Milestone: -
EgorBo commented 7 months ago

Looks to be caused by https://github.com/dotnet/runtime/pull/98700

EgorBo commented 2 months ago

@EgorBot -arm64 -commit 973ceee vs previous --disasm

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

[GenericTypeArguments(typeof(int))]
public class Span<T>
    where T : struct, IComparable<T>, IEquatable<T>
{
    [Params(4,512)]
    public int Size;

    private T[] _array, _same, _emptyWithSingleValue;
    private T[] _fourValues, _fiveValues;
    private T _notDefaultValue;

    [GlobalSetup]
    public void Setup()
    {
        T[] array = new T[Size * 2];
        _array = array.Take(Size).ToArray();
        _same = _array.ToArray();
    }

    [Benchmark]
    public bool SequenceEqual() => new System.Span<T>(_array)
        .SequenceEqual(new ReadOnlySpan<T>(_same));

    [Benchmark]
    public bool StartsWith() => new System.Span<T>(_array)
        .StartsWith(new ReadOnlySpan<T>(_same).Slice(start: 0, length: Size / 2));

    [Benchmark]
    public bool EndsWith() => new System.Span<T>(_array)
        .EndsWith(new ReadOnlySpan<T>(_same).Slice(start: Size / 2));
}
EgorBot commented 2 months ago
Benchmark results on Arm64

```
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-VJRAQH : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-EZBVNO : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
```

| Method | Toolchain | Size | Mean | Error | Ratio | Code Size |
|-------------- |------------------------ |----- |-----------:|----------:|------:|----------:|
| **SequenceEqual** | **Main** | **4** | **3.516 ns** | **0.0007 ns** | **1.00** | **-** |
| SequenceEqual | PR | 4 | 3.547 ns | 0.0008 ns | 1.01 | 484 B |
| StartsWith | Main | 4 | 3.536 ns | 0.0003 ns | 1.00 | 512 B |
| StartsWith | PR | 4 | 3.436 ns | 0.0006 ns | 0.97 | 512 B |
| EndsWith | Main | 4 | 5.264 ns | 0.0009 ns | 1.00 | 512 B |
| EndsWith | PR | 4 | 5.370 ns | 0.0032 ns | 1.02 | 512 B |
| **SequenceEqual** | **Main** | **512** | **127.736 ns** | **0.0428 ns** | **1.00** | **432 B** |
| SequenceEqual | PR | 512 | 109.769 ns | 0.0121 ns | 0.86 | 444 B |
| StartsWith | Main | 512 | 66.018 ns | 0.0099 ns | 1.00 | 472 B |
| StartsWith | PR | 512 | 56.800 ns | 0.0048 ns | 0.86 | 484 B |
| EndsWith | Main | 512 | 67.015 ns | 0.0287 ns | 1.00 | 500 B |
| EndsWith | PR | 512 | 58.790 ns | 0.0198 ns | 0.88 | 512 B |

[BDN_Artifacts.zip](https://telegafiles.blob.core.windows.net/telega/BDN_Artifacts_b0e01449.zip)

EgorBo commented 2 months ago

@EgorBot -arm64 -commit 973ceee vs previous --disasm --envars "DOTNET_JitDisasm:SequenceEqual"

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

[GenericTypeArguments(typeof(int))]
public class Span<T>
    where T : struct, IComparable<T>, IEquatable<T>
{
    [Params(512)]
    public int Size;

    private T[] _array, _same, _emptyWithSingleValue;
    private T[] _fourValues, _fiveValues;
    private T _notDefaultValue;

    [GlobalSetup]
    public void Setup()
    {
        T[] array = new T[Size * 2];
        _array = array.Take(Size).ToArray();
        _same = _array.ToArray();
    }

    [Benchmark]
    public bool SequenceEqual() => new System.Span<T>(_array)
        .SequenceEqual(new ReadOnlySpan<T>(_same));
}
EgorBot commented 2 months ago
❌ Benchmark failed on Arm64

```
publishing results failed
```

EgorBo commented 2 months ago

@EgorBot -arm64 -commit 973ceee vs previous --disasm --envvars "DOTNET_JitDisasm:SequenceEqual"

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

[GenericTypeArguments(typeof(int))]
public class Span<T>
    where T : struct, IComparable<T>, IEquatable<T>
{
    [Params(512)]
    public int Size;

    private T[] _array, _same, _emptyWithSingleValue;
    private T[] _fourValues, _fiveValues;
    private T _notDefaultValue;

    [GlobalSetup]
    public void Setup()
    {
        T[] array = new T[Size * 2];
        _array = array.Take(Size).ToArray();
        _same = _array.ToArray();
    }

    [Benchmark]
    public bool SequenceEqual() => new System.Span<T>(_array)
        .SequenceEqual(new ReadOnlySpan<T>(_same));
}
EgorBot commented 2 months ago
Benchmark results on Arm64

```
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-HMJFUN : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-IWZNQY : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

EnvironmentVariables=DOTNET_JitDisasm=SequenceEqual
```

| Method | Toolchain | Size | Mean | Error | Ratio | Code Size |
|-------------- |------------------------ |----- |---------:|--------:|------:|----------:|
| SequenceEqual | Main | 512 | 127.5 ns | 0.02 ns | 1.00 | 72 B |
| SequenceEqual | PR | 512 | 109.2 ns | 0.00 ns | 0.86 | 444 B |

[BDN_Artifacts.zip](https://telegafiles.blob.core.windows.net/telega/BDN_Artifacts_8c83f45f.zip)

EgorBo commented 2 months ago

It turns out to be a tail-call + special intrinsic problem. We have an issue for it somewhere; I'll take a look in 10.0.