[Perf] Linux/arm64: 4 Regressions on 4/8/2024 7:16:22 PM

performanceautofiler[bot] commented 7 months ago

Run Information

| Name | Value |
| --- | --- |
| Architecture | arm64 |
| OS | ubuntu 22.04 |
| Queue | AmpereUbuntu |
| Baseline | 230dc86e9d92fbf191bf3b45b3f1b656f83d4426 |
| Compare | 404b286b23093cd93a985791934756f64a33483e |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Text.Tests.Perf_Encoding

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- | --- |
| [GetString - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Text.Tests.Perf_Encoding.GetString(size%3a%20512%2c%20encName%3a%20%22ascii%22).html>) | 109.71 ns | 128.17 ns | 1.17 | 0.03 | False |
| [GetChars - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Text.Tests.Perf_Encoding.GetChars(size%3a%20512%2c%20encName%3a%20%22ascii%22).html>) | 112.29 ns | 121.25 ns | 1.08 | 0.04 | False |
| [GetString - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Text.Tests.Perf_Encoding.GetString(size%3a%20512%2c%20encName%3a%20%22utf-8%22).html>) | 156.63 ns | 170.52 ns | 1.09 | 0.01 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Tests.Perf_Encoding*'

### System.Text.Tests.Perf_Encoding.GetString(size: 512, encName: "ascii")

#### ETL Files

#### Histogram

#### JIT Disasms

### System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii")

#### ETL Files

#### Histogram

#### JIT Disasms

### System.Text.Tests.Perf_Encoding.GetString(size: 512, encName: "utf-8")

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Text.Perf_Ascii

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [ToUtf16 - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Text.Perf_Ascii.ToUtf16(Size%3a%20128).html>) | 15.27 ns | 17.98 ns | 1.18 | 0.32 | False | | | |
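
The Test/Base values reported above match the test duration divided by the baseline duration, rounded to two decimals. A minimal sketch that reproduces them and expresses each as a percentage slowdown is shown here; `RegressionMath` is a hypothetical helper written for illustration and is not part of dotnet/performance or the autofiler.

```csharp
using System;

// Durations copied from the regression tables in this report (nanoseconds).
var rows = new (string Name, double BaselineNs, double TestNs)[]
{
    ("Perf_Encoding.GetString(size: 512, ascii)", 109.71, 128.17),
    ("Perf_Encoding.GetChars(size: 512, ascii)", 112.29, 121.25),
    ("Perf_Encoding.GetString(size: 512, utf-8)", 156.63, 170.52),
    ("Perf_Ascii.ToUtf16(Size: 128)", 15.27, 17.98),
};

foreach (var (name, baselineNs, testNs) in rows)
{
    double ratio = RegressionMath.Ratio(baselineNs, testNs);
    double slowdown = RegressionMath.SlowdownPercent(baselineNs, testNs);
    Console.WriteLine($"{name}: Test/Base = {ratio:F2}, slowdown = {slowdown:F1}%");
}

// Hypothetical helper (an assumption for this sketch, not an existing API).
static class RegressionMath
{
    // Ratio above 1.0 means the compare build is slower than the baseline.
    public static double Ratio(double baseline, double test) => test / baseline;

    // Same ratio expressed as a percentage increase over the baseline.
    public static double SlowdownPercent(double baseline, double test) =>
        (test / baseline - 1.0) * 100.0;
}
```

Read this way, a ratio of 1.18 corresponds to a slowdown of roughly 18%, which is the scale the later comments in this thread are discussing.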

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Text.Perf_Ascii*'

### System.Text.Perf_Ascii.ToUtf16(Size: 128)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

dotnet-policy-service[bot] commented 7 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

LoopedBard3 commented 7 months ago

EgorBo commented 5 months ago

@EgorBot -arm64 -perf -commit bc2bd2bd77ecd7a7979a3ef815e3ff36881a1b94 vs 8e2655b932f5f5e184d289982bdc516576df0981 --disasm

using BenchmarkDotNet.Attributes;
using System.Buffers;
using System.Linq;
using System.Text;

BenchmarkDotNet.Running.BenchmarkRunner.Run<Perf_Ascii>(args: args);

public class Perf_Ascii
{
    [Params(
        128)] // vectorized code path
    public int Size;

    private byte[] _bytes, _sameBytes, _bytesDifferentCase;
    private char[] _characters, _sameCharacters, _charactersDifferentCase;

    [GlobalSetup]
    public void Setup()
    {
        _bytes = new byte[Size];
        _bytesDifferentCase = new byte[Size];

        for (int i = 0; i < Size; i++)
        {
            // let ToLower and ToUpper perform the same amount of work
            _bytes[i] = i % 2 == 0 ? (byte)'a' : (byte)'A';
            _bytesDifferentCase[i] = i % 2 == 0 ? (byte)'A' : (byte)'a';
        }
        _sameBytes = _bytes.ToArray();
        _characters = _bytes.Select(b => (char)b).ToArray();
        _sameCharacters = _characters.ToArray();
        _charactersDifferentCase = _bytesDifferentCase.Select(b => (char)b).ToArray();
    }

    [Benchmark]
    [MemoryRandomization]
    public OperationStatus ToUtf16() => Ascii.ToUtf16(_bytes, _characters, out _);
}

EgorBot commented 5 months ago

Results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-TGKDHM : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-DZBMEA : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
OutlierMode=DontRemove  MemoryRandomization=True

| Method | Toolchain | Size | Mean | Error | Ratio | Code Size |
| --- | --- | --- | --- | --- | --- | --- |
| ToUtf16 | Main | 128 | 15.32 ns | 0.337 ns | 1.00 | 220 B |
| ToUtf16 | PR | 128 | 18.28 ns | 0.117 ns | 1.20 | 220 B |

See BDN_Artifacts.zip for details.

🔥Profiler

Flame graphs: Main vs PR (interactive!)
Hot asm: Main vs PR
Hot functions: Main vs PR

Notes

_For clean `perf` results, make sure you have just one `[Benchmark]` in your app._

EgorBo commented 3 months ago

Will be fixed by https://github.com/dotnet/runtime/pull/102705

JulieLeeMSFT commented 3 months ago

Pushing out to .NET10 since it is a minor perf regression.

stephentoub commented 3 months ago

> Pushing out to .NET10 since it is a minor perf regression.

Some of these are upwards of 20%, and in the linked issues that appear to be deduped against this one, they're measured in µs rather than ns. Am I reading this incorrectly, or is there other context I'm missing?

@JulieLeeMSFT ?
