dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.15k stars 4.71k forks source link

CA1857: hundred or thousands of build warnings from System.Runtime.Intrinsics.X86 on updating from .NET 7.0 to 8.0 #95289

Open twest820 opened 10 months ago

twest820 commented 10 months ago

Description

Mostly I'm hitting

byte mask = (byte)Avx.MoveMask(Avx.Compare<some flavor>(arg1, arg2));
Avx.Blend(arg3, arg4, mask) // CA1857 in .NET 8.0, no warning in .NET 7.0

which is odd as there's 1) there's no a priori way of knowing the mask value, so mask cannot be declared const, 2) mask cannot be marked [ConstExpected], and 3) mask is a value type consumed directly by blendps and similar instructions, so there doesn't appear to be an optimization for the compiler to take in response to the mask being constant.

There's also at least one similar case trivially addressed at MSIL gen time by loop unrolling (depending on data marshalling requirements for serialization, actual use cases may have 50+ Extract()s, meaning manual loop unrolling is unrealistic, IMO ).

for (byte element = 0; element < 4; ++element) // or < 8 if using 256 bit SIMD
{
    Avx.Extract(arg1, element); // CA1857 even though element's values are well known
}

Configuration

Visual Studio 17.8.1.

Regression?

The Blend() disassembly I've been getting looks fine, so I'm inclined to say yes. Perhaps there are corner cases where poor codegen can occur, though.

Data

This class may be a conveniently simple example for reference.

Analysis

One possibility is [ConstExpected] was added to response to const int imm8 in the intrinsics' C signatures. From what I know, this doesn't appear to be a correct mapping between C/C++ notions of const, C# notions of const, the implications of ConstExpectedAttribute, and how (E)VEX SIMD instructions access ymm (or zmm or mask) registers. So possibly the CA1857s here artifacts of the tool used for #80192.

If it is a correct warning, it's unclear to me how SIMD intrinsic code for x64 processors can be modified in .NET 8.0 to clear it. Googling gives me plenty of hits on SIMD [ConstExpected] being added as an 8.0 feature but nothing on code migration paths. Class-level CA1857 suppressions are therefore tempting. I'd prefer a more graceful approach if one's available, though.

ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Issue Details
### Description Mostly I'm hitting ``` byte mask = (byte)Avx.MoveMask(Avx.Compare(arg1, arg2)); Avx.Blend(arg3, arg4, mask) // CA1857 in .NET 8.0, no warning in .NET 7.0 ``` which is odd as there's 1) there's no _a priori_ way of knowing the mask value, so `mask` cannot be declared `const`, 2) mask cannot be marked `[ConstExpected]`, and 3) `mask` is a value type consumed directly by `blendps` and similar instructions, so there doesn't appear to be an optimization for the compiler to take in response to the mask being constant. There's also at least one similar case trivially addressed at MSIL gen time by loop unrolling (depending on data marshalling requirements for serialization, actual use cases may have 50+ `Extract()`s, meaning manual loop unrolling is unrealistic, IMO ). ``` for (byte element = 0; element < 4; ++element) // or < 8 if using 256 bit SIMD { Avx.Extract(arg1, element); // CA1857 even though element's values are well known } ``` ### Configuration Visual Studio 17.8.1. ### Regression? The `Blend()` disassembly I've been getting looks fine, so I'm inclined to say yes. Perhaps there are corner cases where poor codegen can occur, though. ### Data [This class](https://github.com/OSU-MARS/BayesianPG/blob/develop/BayesianPG/ThreePG/ThreePGSimd128.cs) may be a conveniently simple example for reference. ### Analysis One possibility is `[ConstExpected]` was added to response to `const int imm8` in the intrinsics' C signatures. From what I know, this doesn't appear to be a correct mapping between C/C++ notions of const, C# notions of const, the implications of `ConstExpectedAttribute`, and how (E)VEX SIMD instructions access ymm (or zmm or mask) registers. So possibly the CA1857s here artifacts of the tool used for #80192. If it is a correct warning, it's unclear to me how SIMD intrinsic code for x64 processors can be modified in .NET 8.0 to clear it. Googling gives me plenty of hits on SIMD `[ConstExpected]` being added as an 8.0 feature but nothing on code migration paths. Class-level CA1857 suppressions are therefore tempting. I'd prefer a more graceful approach if one's available, though.
Author: twest820
Assignees: -
Labels: `area-System.Numerics`
Milestone: -
tannergooding commented 10 months ago

1) there's no a priori way of knowing the mask value, so mask cannot be declared const, 2) mask cannot be marked [ConstExpected]

The warning is because you're using Avx.Blend which emits VBLENDPS (or VBLENDPD). The encoding of that is VBLENDPS xmm1, xmm2/m128, imm8 and so the actual hardware requires you to use a constant. The runtime is currently working around this limitation by emitting a call to a fallback function that contains a 256 case switch table encoding each of the possible encodable constants.

3) mask is a value type consumed directly by blendps and similar instructions

It is not. APIs like Blend require a constant and it is not expected for users to pass in the result of movemask or similar. In some C/C++ compilers, this is straight up unsupported and won't even compile.

There is an alternative API, BlendVariable, which doesn't require a constant. It emit VBLENDVPS (or VBLENDVPD) of which the encoding is VBLENDVPS xmm1, xmm2, xmm3/m128 (or BLENDVPS xmm1, xmm2/m128, <XMM0> on pre AVX hardware).

The expected usage of this is:

Vector128<float> mask = Avx.CompareEqual(arg1, arg2));
Avx.BlendVariable(arg3, arg4, mask);

Doing this should give you a measurable performance increase and behave more as expected.

Class-level CA1857 suppressions are therefore tempting. I'd prefer a more graceful approach if one's available, though.

You should not suppress these. They are warning you of what is functionally "incorrect" code and where you should be using some other API or alternative mechanism in its place.

There are some places where the JIT will try to recognize and optimize particular patterns when the user doesn't pass in a constant, but these are not guaranteed and may not always be supported by other runtimes. There are likewise many places the JIT doesn't try to handle today and where the developer doing the "right thing" themselves is the best option.

ghost commented 10 months ago

This issue has been marked needs-author-action and may be missing some important information.

tannergooding commented 10 months ago

There are notable approved, but not yet implemented, analyzers to help direct users away from problematic patterns and towards more correct alternatives: https://github.com/dotnet/runtime/issues/82488 and https://github.com/dotnet/runtime/issues/82486

Cases like users calling Blend when they should have called BlendVariable would be covered by such an analyzer.

Until the analyzer can be implemented and supported, I'm happy to help direct how different [ConstExpected] warnings should be handled. Either by pointing to an equivalent API or an alternative sequence that will be better performing if there is no "variable" alternative.

twest820 commented 10 months ago

D'oh, thanks Tanner! That all sounds great; sorry I missed the extra v.

Do you have a suggestion for the Extract() case? Since Permute() and Shuffle() are both also [ConstExpected] (and thus routes through Avx.MoveScalar() also trigger CA1857) the best I'm seeing at the moment is approaches like

    static class AvxExtensions
    {
        public static float Extract(Vector128<float> value, byte element)
        {
            return element switch
            {
                0 => Avx.Extract(value, 0),
                1 => Avx.Extract(value, 1),
                2 => Avx.Extract(value, 2),
                3 => Avx.Extract(value, 3),
                _ => throw new ArgumentOutOfRangeException(nameof(element))
            };
        }
    }

This clears the CA1857 but, since it feels like trying to second guess JIT despite the disclaimers above, I'm unsure if it's a better bet codegen-wise than

        public static float Extract(Vector128<float> value, byte element)
        {
            return Avx.Extract(value, element); // suppress CA1857 in suppression file
        }
tannergooding commented 10 months ago

For cases like extract you should probably be using value[index] or value.GetElement(index). Those do not prescribe a particular instruction being used and will do whatever is deemed "most efficient" for the underlying CPU.

In general, using the "cross platform" equivalent API (like x + y instead of Sse.Add(x, y)) is preferred where possible.

twest820 commented 10 months ago

From a general "write portable code once" standpoint, I agree. Most of the SIMD code I work with goes like class FooAvx256 or Foo.BarAvx256(), though, meaning typically it's not portable. Or, alternatively, a GetElement() call's within CPU dispatched code paths where's probably simpler overall to stay consistent with the non-portable code layout—it's rare I write an inner loop body implementable with a shared ARM-x64 subset.

Another reason I have calls like Avx.Extract() rather than Vector128<T>.[] or Vector128<T>.GetElement() is it's a way of getting VEX instructions into debug and release builds without the overhead of doing publishing and risk of relying on <PublishReadyToRunCrossgen2ExtraArgs>-instruction-set:avx,avx2,fma</PublishReadyToRunCrossgen2ExtraArgs>, which is a build toolchain feature that took years to deliver and still seems undocumented outside of GitHub issues. It's also nice to elide the checking the CLR has to do in order to provide robust [] or GetElement() implementations—not a big deal but, if deployed as a general pattern, leaning on application specific context and higher level Debug.Assert()s does instead yield helpful reduction in runtimes.

I've had a chance in the past couple days to flip several SIMD-bearing repos to .NET 8.0 and am not seeing interestingly different variations from what's already been discussed. I think this could therefore be closed since it looks like all the extract options are awkward somewhere.

tannergooding commented 10 months ago
-instruction-set:avx,avx2,fma

Notably x86-x64-v3 is probably "better" as it targets the entire set of ISAs required by the spec'd v3 x64 machine.

still seems undocumented outside of GitHub issues

It's a fairly esoteric scenario that largely only impacts startup performance. It's typical that regular R2R performance is fine and any hot functions will be rejitted such that you get stable throughput within the first few seconds regardless.

You have a similar option, IlcInstructionSet, that works for AOT scenarios.

I think this could therefore be closed since it looks like all the extract options are awkward somewhere.

Many hardware instructions can be fairly awkward to use or come with limitations. That's why we expose the xplat APIs to help cover core patterns where the important thing is to do that as fast as possible and not with very specific instructions.

twest820 commented 10 months ago

It's a fairly esoteric scenario that largely only impacts startup performance.

Mmm, when I've looked at telling crossgen2 not to worry about legacy CPUs there've been meaningful runtime effects. Even with most of the runtime in compute kernels explicitly coded to System.Runtime.Intrinsics.X86.Avx. Nothing huge but worth it you're going to kick off a two week run such. Since PublishReadyToRunCrossgen2ExtraArgs seems to be .NET's only sort of documented equivalent of C++'s /arch:AVX2 IMO it's a mainline build case rather than anything esoteric.

Thanks for mentioning x86-x64-v3. I can't get any search hits on it, though, so it's unclear if it's something which can be used externally at this point. Microsoft also seems to be pushing AOT for ASP.NET, which isn't among our use cases, and it's not an option in VS 17.8.2's publish settings. While IlcInstructionSet has recently gotten low-level GitHub documentation it isn't in the official documentation and also lacks any obvious Visual Studio support. Maybe if one of those gets enough attention to become a feature?

What does work great is your System.Runtime.Intrinsics implementation—thank you for all the time saved in coding C#-C++/CLI-C++ stacks—and then doing our own CPU dispatch. There's some overhead there, sure, but it's been lower cost than trying to coax this aspect of .NET builds into happiness.