dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[Perf] Linux/arm64: 4 Regressions on 4/8/2024 7:16:22 PM #100922

Open performanceautofiler[bot] opened 7 months ago

performanceautofiler[bot] commented 7 months ago

Run Information

| Name | Value |
| --- | --- |
| Architecture | arm64 |
| OS | ubuntu 22.04 |
| Queue | AmpereUbuntu |
| Baseline | 230dc86e9d92fbf191bf3b45b3f1b656f83d4426 |
| Compare | 404b286b23093cd93a985791934756f64a33483e |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Text.Tests.Perf_Encoding

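| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 109.71 ns | 128.17 ns | 1.17 | 0.03 | False | | | |
| | 112.29 ns | 121.25 ns | 1.08 | 0.04 | False | | | |
| | 156.63 ns | 170.52 ns | 1.09 | 0.01 | False | | | |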
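
As a rough illustration of how the Test/Base column relates to the two timing columns, the sketch below divides each Test value by its Baseline value and also expresses the result as a percentage slowdown. The tuple array and variable names are illustrative only and are not part of the autofiler tooling; the numbers are copied from the table above.

```csharp
using System;

// (baseline ns, test ns) pairs copied from the Perf_Encoding table above.
// The array and its field names are hypothetical, used only for this sketch.
(double Baseline, double Test)[] rows =
{
    (109.71, 128.17),
    (112.29, 121.25),
    (156.63, 170.52),
};

foreach (var (baseline, test) in rows)
{
    // Test/Base: values above 1.0 mean the compare commit is slower.
    double ratio = test / baseline;

    // The same figure expressed as a percentage slowdown.
    double slowdownPercent = (ratio - 1.0) * 100.0;

    Console.WriteLine($"{baseline} ns -> {test} ns: ratio {ratio:F2}, slowdown {slowdownPercent:F1}%");
}
```

Rounded to two decimals, the three ratios come out as 1.17, 1.08, and 1.09, which agrees with the Test/Base column.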

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Tests.Perf_Encoding*'

### System.Text.Tests.Perf_Encoding.GetString(size: 512, encName: "ascii")
#### ETL Files
#### Histogram
#### JIT Disasms

### System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii")
#### ETL Files
#### Histogram
#### JIT Disasms

### System.Text.Tests.Perf_Encoding.GetString(size: 512, encName: "utf-8")
#### ETL Files
#### Histogram
#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Text.Perf_Ascii

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 15.27 ns | 17.98 ns | 1.18 | 0.32 | False | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Perf_Ascii*'

### System.Text.Perf_Ascii.ToUtf16(Size: 128)
#### ETL Files
#### Histogram
#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

dotnet-policy-service[bot] commented 7 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

LoopedBard3 commented 7 months ago

Related regressions: Windows x64: https://github.com/dotnet/perf-autofiling-issues/issues/32733 Linux x64: https://github.com/dotnet/perf-autofiling-issues/issues/32721

EgorBo commented 5 months ago

@EgorBot -arm64 -perf -commit bc2bd2bd77ecd7a7979a3ef815e3ff36881a1b94 vs 8e2655b932f5f5e184d289982bdc516576df0981 --disasm

using BenchmarkDotNet.Attributes;
using System.Buffers;
using System.Linq;
using System.Text;

BenchmarkDotNet.Running.BenchmarkRunner.Run<Perf_Ascii>(args: args);

public class Perf_Ascii
{
    [Params(
        128)] // vectorized code path
    public int Size;

    private byte[] _bytes, _sameBytes, _bytesDifferentCase;
    private char[] _characters, _sameCharacters, _charactersDifferentCase;

    [GlobalSetup]
    public void Setup()
    {
        _bytes = new byte[Size];
        _bytesDifferentCase = new byte[Size];

        for (int i = 0; i < Size; i++)
        {
            // let ToLower and ToUpper perform the same amount of work
            _bytes[i] = i % 2 == 0 ? (byte)'a' : (byte)'A';
            _bytesDifferentCase[i] = i % 2 == 0 ? (byte)'A' : (byte)'a';
        }
        _sameBytes = _bytes.ToArray();
        _characters = _bytes.Select(b => (char)b).ToArray();
        _sameCharacters = _characters.ToArray();
        _charactersDifferentCase = _bytesDifferentCase.Select(b => (char)b).ToArray();
    }

    [Benchmark]
    [MemoryRandomization]
    public OperationStatus ToUtf16() => Ascii.ToUtf16(_bytes, _characters, out _);
}
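
For readers unfamiliar with the call being measured, here is a minimal standalone sketch of what the `ToUtf16` benchmark exercises, assuming .NET 8 or later where `System.Text.Ascii` is available. It rebuilds the same 128-byte input shape as `Setup()` and performs a single conversion; the local variable names are illustrative and it is not a substitute for the benchmark itself.

```csharp
using System;
using System.Buffers;
using System.Text;

// Same input shape as the benchmark's Setup(): 128 bytes alternating 'a' and 'A'.
byte[] bytes = new byte[128];
for (int i = 0; i < bytes.Length; i++)
{
    bytes[i] = i % 2 == 0 ? (byte)'a' : (byte)'A';
}

// Destination buffer with room for one UTF-16 char per ASCII byte.
char[] chars = new char[bytes.Length];

// Ascii.ToUtf16 widens each ASCII byte to a UTF-16 char and reports
// how many chars were written through the out parameter.
OperationStatus status = Ascii.ToUtf16(bytes, chars, out int charsWritten);

Console.WriteLine($"{status}, {charsWritten} chars, starts with \"{new string(chars, 0, 4)}\"");
```

With this input the call should report `OperationStatus.Done` and 128 chars written, since every byte is valid ASCII and the destination is large enough.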

EgorBot commented 5 months ago

Results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-TGKDHM : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-DZBMEA : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
OutlierMode=DontRemove  MemoryRandomization=True

| Method | Toolchain | Size | Mean | Error | Ratio | Code Size |
| --- | --- | --- | --- | --- | --- | --- |
| ToUtf16 | Main | 128 | 15.32 ns | 0.337 ns | 1.00 | 220 B |
| ToUtf16 | PR | 128 | 18.28 ns | 0.117 ns | 1.20 | 220 B |

See BDN_Artifacts.zip for details.

🔥 Profiler

Flame graphs: Main vs PR (interactive!) Hot asm: Main vs PR Hot functions: Main vs PR

Notes _For clean `perf` results, make sure you have just one `[Benchmark]` in your app._

EgorBo commented 3 months ago

Will be fixed by https://github.com/dotnet/runtime/pull/102705

JulieLeeMSFT commented 3 months ago

Pushing out to .NET10 since it is a minor perf regression.

stephentoub commented 3 months ago

> Pushing out to .NET10 since it is a minor perf regression.

Some of these are upwards of 20%, and in the linked issues that appear to be deduped against this one, they're measured in µs rather than ns. Am I reading this incorrectly, or is there other context I'm missing?

@JulieLeeMSFT ?