dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.22k stars 4.72k forks source link

JIT SVE: Assertion failed 'targetReg != embMaskOp2Reg' during 'Generate code' #106864

Closed jakobbotsch closed 1 month ago

jakobbotsch commented 2 months ago
// Generated by Fuzzlyn v2.3 on 2024-08-23 09:12:06
// Run on Arm64 Windows
// Seed: 9639718980642677114-vectort,vector64,vector128,armsve
// Reduced from 52.6 KiB to 0.4 KiB in 00:00:26
// Hits JIT assert in Release:
// Assertion failed 'targetReg != embMaskOp2Reg' in 'Program:Main(Fuzzlyn.ExecutionServer.IRuntime)' during 'Generate code' (IL size 32; hash 0xade6b36b; FullOpts)
//
//     File: C:\dev\dotnet\runtime2\src\coreclr\jit\hwintrinsiccodegenarm64.cpp Line: 818
//
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public class C1
{
    public Vector<short> F1;
}

public class Program
{
    public static C1 s_2;
    public static void Main()
    {
        C1 vr2 = s_2;
        var vr3 = vr2.F1;
        var vr4 = vr2.F1;
        vr2.F1 = Sve.Max(vr3, vr4);
    }
}

cc @dotnet/arm64-contrib @dotnet/jit-contrib

dotnet-policy-service[bot] commented 2 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

a74nh commented 2 months ago

What's happening here is that for Sve.Max(vr3, vr4) both vr3 and vr4 are the same.

                                                            /--*  t26    simd16
N005 (  4,  3) [000052] DA-XG------                         *  STORE_LCL_VAR simd16 V07 cse0         d:1 $VN.Void
N006 (  1,  1) [000053] -----------                   t53 =    LCL_VAR   simd16 V07 cse0         u:1 <l:$2c1, c:$2c2>
N008 (  1,  1) [000055] -----------                   t55 =    LCL_VAR   simd16 V07 cse0         u:1 (last use) <l:$2c0, c:$300>
               [000073] -----------                   t73 =    HWINTRINSIC mask   short CreateTrueMaskAll
               [000074] -c---------                   t74 =    CNS_VEC   simd16<0x00000000, 0x00000000, 0x00000000, 0x00000000>
                                                            /--*  t53    simd16
                                                            +--*  t55    simd16
N009 (  7,  6) [000034] -A-XG+-----                   t34 = *  HWINTRINSIC simd16 short Max <l:$2c4, c:$2c5>
                                                            /--*  t73    mask
                                                            +--*  t34    simd16
                                                            +--*  t74    simd16
               [000075] -A-XG------                   t75 = *  HWINTRINSIC simd16 short ConditionalSelect
N010 (  1,  1) [000023] -----+-----                   t23 =    LCL_VAR   ref    V03 tmp2         u:2 (last use) <l:$181, c:$141>

At codegen we combine the cndsel and max to try to create:

IN000a: 000034      ldr     q16, [x0, #0x08]
IN000b: 000038      ptrue   p0.h
IN000c: 00003C      movprfx z17.h, p0/z, z17.h
IN000d: 000040      smax    z17.h, p0/m, z17.h, z17.h

This is generated by simply removing the assert.

However, there is still an issue .... the instruction after a movprfx cannot use the same destination register as an input. (see here - The prefixed instruction must not use the destination register in any other operand position)

What we need to do is delay free all the inputs to SMAX, such that it now creates:

IN000a: 000034      ldr     q16, [x0, #0x08]
IN000b: 000038      ptrue   p0.h
IN000c: 00003C      movprfx z17.h, p0/z, z16.h
IN000d: 000040      smax    z17.h, p0/m, z17.h, z16.h

Which is good.

Problem I'm having right now is how, during lsra, do I detect that these two nodes refer to the same value?

N006 (  1,  1) [000053] -----------                   t53 =    LCL_VAR   simd16 V07 cse0         u:1 <l:$2c1, c:$2c2>
N008 (  1,  1) [000055] -----------                   t55 =    LCL_VAR   simd16 V07 cse0         u:1 (last use) <l:$2c0, c:$300>

Can I check the cse somehow?? Looking at the node, gtCSEnum is 0.

jakobbotsch commented 2 months ago

Problem I'm having right now is how, during lsra, do I detect that these two nodes refer to the same value?

N006 (  1,  1) [000053] -----------                   t53 =    LCL_VAR   simd16 V07 cse0         u:1 <l:$2c1, c:$2c2>
N008 (  1,  1) [000055] -----------                   t55 =    LCL_VAR   simd16 V07 cse0         u:1 (last use) <l:$2c0, c:$300>

Can I check the cse somehow?? Looking at the node, gtCSEnum is 0.

Why do we need to detect that? Can't we just delay free the inputs unconditionally?

a74nh commented 2 months ago

Why do we need to detect that? Can't we just delay free the inputs unconditionally?

That would work. It's not going to hit performance. It will increase the register usage a little, but we can fix that at a later stage if we find we're spilling too much.

jakobbotsch commented 2 months ago

Hmm, not sure that I understand -- why does it increase register pressure? Since those registers cannot be used as the destination, is there any choice but to unconditionally mark them as delay freed?

I probably don't fully understand how the conditional select comes into play here.

a74nh commented 2 months ago

Hmm, not sure that I understand -- why does it increase register pressure? Since those registers cannot be used as the destination, is there any choice but to unconditionally mark them as delay freed?

I was thinking it would delayfreeing when the inputs are different would do that. But, no, they will have different registers anyway.