dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.16k stars 4.72k forks source link

[Analyzer] Suggest more optimal SIMD patterns where applicable #82488

Open tannergooding opened 1 year ago

tannergooding commented 1 year ago

Writing SIMD code can be complex and the "optimal" pattern can vary per platform/architecture. This can get more complicated when newer ISAs are also available for use.

As such, we should provide a code fixer that identifies such patterns and suggests simple transformations to alternative methods that are likely to be better performing.

Category: Performance Severity = suggestion

Example

A simple example is given below. The exact scenarios supported/recognized will likely grow over time based on input from SIMD experts:

- return value * 2;
+ return value + value;

-or-

- return Sse.Shuffle(value, value, 0b11_11_10_10);
+ return Sse.UnpackHigh(value, value);

Some of these are patterns that the JIT could recognize and optimize implicitly, but that can be equally complex. Recognizing and suggesting the fixes in C# is often simpler/faster.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics See info in area-owners.md if you want to be subscribed.

Issue Details
Writing SIMD code can be complex and the "optimal" pattern can vary per platform/architecture. This can get more complicated when newer ISAs are also available for use. As such, we should provide a code fixer that identifies such patterns and suggests simple transformations to alternative methods that are likely to be better performing. Category: [Performance](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/performance-warnings) [Severity = suggestion](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/configuration-options#severity-level) ### Example A simple example is given below. The exact scenarios supported/recognized will likely grow over time based on input from SIMD experts: ```diff - return value * 2; + return value + value; ``` -or- ```diff - return Sse.Shuffle(left, right, 0b11_11_10_10); + return Sse.UnpackHigh(left, right); ``` Some of these are patterns that the JIT could recognize and optimize implicitly, but that can be equally complex. Recognizing and suggesting the fixes in C# is often simpler/faster.
Author: tannergooding
Assignees: -
Labels: `area-System.Runtime.Intrinsics`, `code-analyzer`, `code-fixer`, `api-ready-for-review`
Milestone: -
terrajobst commented 1 year ago

Video

Category: Performance

For code that replaces calls into intrinsics this seems fine, such as:

- Sse.Shuffle(value, value, 0b11_11_10_10);
+ Sse.UnpackHigh(value, value);

For more general code that doesn't involve calls to intrinsics we should consider recognizing those in the JIT instead, such as:

- value * 2;
+ value + value;
JimBobSquarePants commented 8 months ago

Analyzers like this really cannot come soon enough. Even a cheat sheet would be useful just now.

huoyaoyuan commented 5 months ago

The recommended patterns is likely to grow in the future. Should these patterns use a same diagnostic ID, or separated ones? Does them require a category or range then? Is there any available cheat sheet, or should us brainstorm for it?

tannergooding commented 5 months ago

Should these patterns use a same diagnostic ID, or separated ones?

I think it really depends on the operation in question and how likely it is for users to want to separate a given pattern out from the rest.

Is there any available cheat sheet, or should us brainstorm for it?

I do think https://github.com/dotnet/runtime/issues/82486 is the more prominent/important one to handle first, while its targeted around maintainability/portability, it also has a big impact on JIT codegen and factors into perf as well.

The analyzer represented by this issue (82488) is then more about recommending alternative patterns that the underlying runtime may not support yet or which may be available on newer ISAs that the user hasn't considered yet that should be smaller or more performant. In some cases, the two analyzers may overlap; for example Sse.Shuffle(value, value, 0b11_11_10_10) could become Sse.UnpackHigh(value, value) or could become Vector128.Shuffle(value, Vector128.Create(2, 2, 3, 3)). The underlying runtime may support many of these cases itself, especially when using the xplat APIs, but given there is RyuJIT, NAOT, MonoJIT, MonoInterpreter, MonoLLVM, and potential future platforms like Unity; it will be impossible to definitively know where the SIMD support exists and what optimizations are available. Given that SIMD code is often "perf critical" giving users recommendations to be more explicit themselves can often be beneficial.

It will ultimately come down to some level of brainstorming, but there's plenty of patterns to be handled many of which are called out in the respective architecture optimization manuals or which other compilers like MSVC/Clang or even RyuJIT itself is already handling. Many of them may even be simplifications that are recommended so that we can keep the amount of runtime pattern recognition simpler. If the analyzer exists to tell users to call Vector128.RotateLeft over doing (x << y) | (x >>> (32 - y)) then the JIT doesn't need to spend cycles handling that pattern itself and can spend its time handling more meaningful cases instead.

tannergooding commented 4 months ago

https://github.com/dotnet/runtime/issues/103557 is an example of a place where this analyzer applies. Simply in removing redundant AsT() casts.