dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.87k stars 4.63k forks source link

[Perf] Linux/arm64: 1 Regression in System.Tests.Perf_Int128 #104739

Closed performanceautofiler[bot] closed 2 weeks ago

performanceautofiler[bot] commented 1 month ago

Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline 9882188fbe16adee3538b65a3926465dfd6052c1
Compare 6c553628dfaec02e665a4ade3deabe3a91281024
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Tests.Perf_Int128

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
133.94 ns 150.39 ns 1.12 0.04 False

graph Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Int128*'
### System.Tests.Perf_Int128.TryFormat(value: 170141183460469231731687303715884105727) #### ETL Files #### Histogram #### JIT Disasms ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
LoopedBard3 commented 1 month ago

Likely caused by https://github.com/dotnet/runtime/pull/104506.

LoopedBard3 commented 1 month ago

@matouskozak @tannergooding FYI

EgorBo commented 1 month ago

I think we should just revert that change then since there were no improvemens from it, only regressions

stephentoub commented 1 month ago

Do we know why it regressed?

EgorBo commented 1 month ago

I bet it has something to do with struct promotion, it's nice to see safe code to be faster than unnecessary unsafe

@EgorBot -arm64 -perf -commit 8d231b4e0f3b9459ac51cf6c7d801e56cb780f08 vs 01dbf511d03f17bff6ac3effc4c5aabaa597cf74 --disasm

using BenchmarkDotNet.Attributes;

public class Perf_Int128
{
    private char[] _destination = new char[Int128.MinValue.ToString().Length];

    public static IEnumerable<object> StringValues => Values.Select(value => value.ToString()).ToArray();

    public static IEnumerable<object> Values => [Int128.MaxValue];

    [Benchmark]
    [ArgumentsSource(nameof(Values))]
    public bool TryFormat(Int128 value) => value.TryFormat(new Span<char>(_destination), out _);
}
EgorBot commented 1 month ago
Benchmark results on Arm64 ``` BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish) Unknown processor Job-LWPCIP : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD Job-KOCEMQ : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD ``` | Method | Toolchain | value | Mean | Error | Ratio | Code Size | |---------- |------------------------ |--------------------- |---------:|--------:|------:|----------:| | TryFormat | Main | 17014(...)05727 [39] | 133.6 ns | 0.02 ns | 1.00 | 2,172 B | | TryFormat | PR | 17014(...)05727 [39] | 150.7 ns | 0.03 ns | 1.13 | 2,656 B | [BDN_Artifacts.zip](https://telegafiles.blob.core.windows.net/telega/BDN_Artifacts_d4b98260.zip) Flame graphs: [Main](https://telegafiles.blob.core.windows.net/telega/base_flamegraph_d4b98260.svg) vs [PR](https://telegafiles.blob.core.windows.net/telega/diff_flamegraph_d4b98260.svg) 🔥 Hot asm: [Main](https://gist.github.com/EgorBot/5233c91f452743583fc15d7c1cfdf9c4) vs [PR](https://gist.github.com/EgorBot/ee4491bce2f4c0fd0037e4945e2df039) Hot functions: [Main](https://gist.github.com/EgorBot/f08b6c9983f63c9fbd90086d2d8bfe0f) vs [PR](https://gist.github.com/EgorBot/5ed2415c6c86cf7b505d4d76751f66d8) _For clean `perf` results, make sure you have just one `[Benchmark]` in your app._
EgorBo commented 1 month ago

Looks like UInt64ToDecChars was inlined in PR and ate some inlining budget so WriteTwoDigits wasn't inlined. I think it's better to revert it than trying to fix the "unsafe" variant.

stephentoub commented 1 month ago

ate some inlining budget so WriteTwoDigits wasn't inlined

Regardless of what we do here, can we please re-investigate having AggressiveInlining mean "for the love of all that is good in this world always inline this if it's physically possible" so we stop having these cases where inlining budget means tiny helpers stop being inlined? I get that means some folks (possibly including us in some cases) that do insane things with AggressiveInlining might start to get worse performance, but they'll be getting what they asked for and can fix it.

EgorBo commented 1 month ago

Regardless of what we do here, can we please re-investigate having AggressiveInlining mean "for the love of all that is good in this world always inline this if it's physically possible" so we stop having these cases where inlining budget means tiny helpers stop being inlined?

Unfortunately, it's not a solution, we may inline all AggressiveInlining and then stop inlining various Span.op_Implicit inside them and other APIs which are currently don't have AggressiveInlining, instead, we should've given up on a large method so everything inside of it had more chances to inline, etc.

stephentoub commented 1 month ago

Regardless of what we do here, can we please re-investigate having AggressiveInlining mean "for the love of all that is good in this world always inline this if it's physically possible" so we stop having these cases where inlining budget means tiny helpers stop being inlined?

Unfortunately, it's not a solution, we may inline all AggressiveInlining and then stop inlining various Span.op_Implicit inside them and other APIs which are currently don't have AggressiveInlining, instead, we should've given up on a large method so everything inside of it had more chances to inline, etc.

It's not a solution even if things marked AggressiveInlining don't consume any inlining budget?

EgorBo commented 1 month ago

It's not a solution even if things marked AggressiveInlining don't consume any inlining budget?

Correct, imagine you have method A (root) and you inline stuff into it. You already inlined a lot and consumed 95% of your buget, then, you encounter this big method B with AggressiveInlining on it and inline it without reducing the budget - boom - you have no budget to inline any calls inside of it, even fairly trivial ones since you only have 5% (maybe for just one)

cc @AndyAyersMS

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

jeffhandley commented 2 weeks ago

I think we should just revert that change then since there were no improvemens from it, only regressions

@stephentoub -- Any objections to reverting Use Unsafe.BitCast for Int128UInt128 operators (#104506)?

stephentoub commented 2 weeks ago

I think it's better to revert it than trying to fix the "unsafe" variant.

Let's do both. We can revert for now to undo the regression, but we shouldn't need to second guess ourselves when using Bitcast.