dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.28k stars 4.73k forks source link

Fix type mismatches in VN #58312

Open SingleAccretion opened 3 years ago

SingleAccretion commented 3 years ago

Value numbering can see through some forms of aliasing, but does not always account for type mismatches properly. One example of an incorrect behavior can be seen here: https://github.com/dotnet/runtime/pull/58309#issuecomment-907614688, there may be others.

Problematic places need to be investigated and fixed. A short-term solution could be to give up and give a "new, unique" VN upon seeing such casts, long-term we should consider using GT_BITCAST as a VNFunc for them, with support for folding (and delete assertion propagation code that expects these type mismatches), since, for better or worse, we have a strongly typed model in VN.

category:correctness theme:value-numbering skill-level:expert cost:small impact:small

dotnet-issue-labeler[bot] commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 3 years ago

Tagging subscribers to this area: @JulieLeeMSFT See info in area-owners.md if you want to be subscribed.

Issue Details
Value numbering can see through some forms of aliasing, but does not always account for type mismatches properly. One example of an incorrect behavior can be seen here: https://github.com/dotnet/runtime/pull/58309#issuecomment-907614688, there may be others. Problematic places need to be investigated and fixed. A short-term solution could be to give up and give a "new, unique" VN upon seeing such casts, long-term we should consider using `GT_BITCAST` as a `VNFunc` for them, with support for folding (and delete assertion propagation code that [expects](https://github.com/dotnet/runtime/blob/cffaa78235ea93d5e3eeb56956579df503e11250/src/coreclr/jit/assertionprop.cpp#L2542) these type mismatches), since, for better or worse, we have a strongly typed model in VN.
Author: SingleAccretion
Assignees: -
Labels: `area-CodeGen-coreclr`, `untriaged`
Milestone: -
SingleAccretion commented 3 years ago

So, one would think this is a rather simple fix: just update VNApplySelectorsTypeCheck and VNApplySelectorsAssignTypeCoerce to cover for the type mismatches.

However, things get complicated by the fact that the VNF_MapSelect and VNF_MapStore machinery can give struct and ref types to VNs that actually represent proper field values (i. e. not just to "field map" VNs, though those also tend to appear in unexpected places and we end up with Cast(ref -> int)). Need to spend some time to review the code and decide what should be done about this (noting as well that this is small CQ issue too, as the additional casts block the recursive traversal inside VNApplySelectors).

More type mismatches: VNApplySelectors normalized SIMD types, VNApplySelectorsAssign does not. Edit: fixed in #61370.

And more:

***** BB01, STMT00000(before)
N006 (  6,  6) [000005] -A-XG-------              *  ASG       float 
N004 (  4,  4) [000004] *--XG--N----              +--*  IND       float 
N003 (  2,  2) [000018] -------N----              |  \--*  ADD       byref 
N001 (  1,  1) [000000] ------------              |     +--*  LCL_VAR   ref    V01 arg1         u:1
N002 (  1,  1) [000017] ------------              |     \--*  CNS_INT   long   24 field offset Fseq[SimdField, X]
N005 (  1,  1) [000003] ------------              \--*  CNS_DBL   float  1.0000000000000000

N001 [000000]   LCL_VAR   V01 arg1         u:1 => $80 {InitVal($40)}
N002 [000017]   CNS_INT   24 field offset Fseq[SimdField, X] => $100 {LngCns:  24}
N003 [000018]   ADD       => $140 {ADD($80, $100)}
N005 [000003]   CNS_DBL   1.0000000000000000 => $180 {FltCns[1.000000]}
  Known type Vector4
  VNApplySelectors:
    VNForHandle(SimdField) is $1c0, fieldType is simd16, size = 16
    VNForMapSelect($81, $1c0):simd16 returns $200 {$81[$1c0]}
    VNForMapSelect($200, $80):simd16 returns $201 {$200[$80]}
  VNApplySelectorsAssign:
    VNForHandle(X) is $1c1, fieldType is float
    VNForMapStore($201, $1c1, $180):float in BB01 returns $240 {$201[$1c1 := $180]}
    VNForMapStore($200, $80, $240):simd16 in BB01 returns $280 {$200[$80 := $240]}
  VNApplySelectorsAssign:
    VNForHandle(SimdField) is $1c0, fieldType is struct
    VNForCastOper(float) is $43
    VNForCast($280, $43) returns $2c0 {Cast($280, $43)}
    Cast to float inserted in VNApplySelectorsAssignTypeCoerce (elemTyp is simd16)
    VNForMapStore($81, $1c0, $2c0):struct in BB01 returns $300 {$81[$1c0 := $2c0]}
  fgCurMemoryVN[GcHeap] assigned for StoreField at [000005] to VN: $300.
N006 [000005]   ASG       => $VN.Void

The above example is quite puzzling in general.

One more example, this time things are, arguably, actually "broken":

private static int Problem(ClassWithTwoFields cwtf)
{
    cwtf.Second = 2;

    IL.Emit.Ldarg(0);
    IL.Emit.Ldflda(new(typeof(ClassWithTwoFields), "First"));
    IL.Emit.Ldc_I8(1);
    IL.Emit.Stind_I8();

    return cwtf.Second;
}

class ClassWithTwoFields
{
    public int First;
    public int Second;
}
***** BB01, STMT00002(before)
N005 (  5,  5) [000012] ---XG-------              *  RETURN    int   
N004 (  4,  4) [000011] ---XG-------              \--*  IND       int   
N003 (  2,  2) [000018] -------N----                 \--*  ADD       byref 
N001 (  1,  1) [000010] ------------                    +--*  LCL_VAR   ref    V02 arg2         u:1 (last use)
N002 (  1,  1) [000017] ------------                    \--*  CNS_INT   long   12 field offset Fseq[Second]

N001 [000010]   LCL_VAR   V02 arg2         u:1 (last use) => $80 {InitVal($40)}
N002 [000017]   CNS_INT   12 field offset Fseq[Second] => $100 {LngCns:  12}
N003 [000018]   ADD       => $140 {ADD($80, $100)}
  VNApplySelectors:
    VNForHandle(Second) is $180, fieldType is int
      AX2: $180 != $181 ==> select([$203]store($201, $181, $240), $180) ==> select($201, $180) remaining budget is 99.
      AX1: select([$81]store($201, $180, $200), $180) ==> $200.
    VNForMapSelect($203, $180):int returns $200 {$1c0[$80 := $40]}
      AX1: select([$1c0]store($200, $80, $40), $80) ==> $40.
    VNForMapSelect($200, $80):int returns $40 {IntCns 2}
N004 [000011]   IND       => <l:$1c5 {norm=$40 {IntCns 2}, exc=$280 {NullPtrExc($80)}}, c:$1c4 {norm=$c1 {MemOpaque:NotInLoop}, exc=$280 {NullPtrExc($80)}}>
N005 [000012]   RETURN    => $c2 {MemOpaque:NotInLoop}

When we "give up" in VNApplySelectorsAssign, that is only for the field being stored, but in all likelihood we should propagate the "impropriety" of the store up the VN tree and end up with an opaque VN for the heap as a whole.

SingleAccretion commented 3 years ago

Another case of type mismatch, the infamous "promoted struct with one field":

N011 ( 22, 16) [000008] -ACXG---R---              *  ASG       struct (copy) $VN.Void
N010 (  3,  2) [000006] D------N----              +--*  LCL_VAR   struct<System.Collections.Immutable.ImmutableArray`1[[Microsoft.CodeAnalysis.CSharp.Symbol, Microsoft.CodeAnalysis.CSharp, Version=2.10.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], 8>(P) V08 loc0         d:2
                                                  +--*    ref    V08.array (offs=0x00) -> V37 tmp25
N009 ( 18, 13) [000004] --CXG-------              \--*  CALL      struct Microsoft.CodeAnalysis.CSharp.Binder.GetCandidateMembers $1c0
N005 (  1,  1) [000000] ------------ arg0 in rcx     +--*  LCL_VAR   ref    V01 arg1         u:1 (last use) $81
N006 (  1,  1) [000001] ------------ arg1 in rdx     +--*  LCL_VAR   ref    V02 arg2         u:1 (last use) $82
N007 (  1,  1) [000002] ------------ arg2 in r8      +--*  LCL_VAR   int    V04 arg4         u:1 $c1
N008 (  1,  1) [000003] ------------ arg3 in r9      \--*  LCL_VAR   ref    V05 arg5         u:1 $83

Here the field, ref V08, gets a VN of TYP_STRUCT...

Edit: fixed in #64898.

SingleAccretion commented 3 years ago

Yet another example of a mismatch - reading from static struct fields:

***** BB01, STMT00015(before)
N009 ( 26, 19) [000096] -ACXG---R---              *  ASG       ushort
N008 (  4,  3) [000094] D------N----              +--*  LCL_VAR   ushort V12 tmp8         d:1
N007 ( 21, 15) [000095] --CXG-------              \--*  IND       ushort // This used to be OBJ<struct>
N006 ( 18, 12) [000026] --CXG--N----                 \--*  ADD       byref
N004 ( 17, 11) [000024] --CXG-------                    +--*  IND       ref
N003 ( 15,  9) [000023] --CXG--N----                    |  \--*  ADD       byref
N001 ( 14,  5) [000021] H-CXG-------                    |     +--*  CALL help r2r_ind byref  HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE
N002 (  1,  4) [000022] ------------                    |     \--*  CNS_INT   int    656 Fseq[hackishFieldName]
N005 (  1,  1) [000025] ------------                    \--*  CNS_INT   long   8 Fseq[#FirstElem, _value]

Here the IND(ref) is a box that holds the struct, but we recognize it as the static field itself, leading to wasteful maps being generated and, of course, more casts:

    VNForHandle(hackishFieldName) is $201, fieldType is struct, size = 2
    VNForMapSelect($c0, $201):<UNDEF> returns $2c0 {$c0[$201]}
    VNForMapSelect($2c0, $240):struct returns $300 {$2c0[$240]}
    *** Mismatched types in VNApplySelectorsTypeCheck (reading beyond the end)
N004 [000024]   IND       => <l:$302 {norm=$340 {MemOpaque:NotInLoop}, exc=$1c3( {HelperMultipleExc()},  {NullPtrExc($281)})}, c:$1c4 {norm=$380 {MemOpaque:NotInLoop}, exc=$1c3( {HelperMultipleExc()},  {NullPtrExc($281)})}>

Not sure we can do much about this one, the pattern for the box is but identical to a pattern for a legitimate object static field and it is legal to read fields though any kind of indirection in our model...

SingleAccretion commented 3 years ago

Another unfortunate case of type mismatch:

N003 (  7,  5) [000625] -A------R--- >>>          *  ASG       struct (copy)
N002 (  3,  2) [000623] D------N----              +--*  LCL_VAR   struct<VoidTaskResult, 1> V54 tmp41        d:2
N001 (  3,  2) [000545] ------------              \--*  LCL_VAR   struct<VoidTaskResult, 1> V49 tmp36        u:2 (last use) $VN.ZeroMap

Here we would like to assign V54/2 the "zero" VN, but too get a cast because TypeOfVN(VNForZeroMap()) is TYP_REF. It looks like the "zero map" idea was never plumbed through, which is unfortunate, it seems we could gain some CQ with it or an equivalent construct.

Edit: fixed in #61285.

SingleAccretion commented 3 years ago

More type mismatches, this time for multireg args:

private static int Problem(LargeStruct valueArg, int c)
{
    var value = valueArg;

    if (c is 1)
    {
        return 1;
    }

    CallForMultiRegStruct(value.SmallerStruct1.MultiRegStruct1);

    return 0;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void CallForMultiRegStruct(MultiRegStruct value) { }

struct MultiRegStruct
{
    public long FirstValue;
    public long SecondValue;
}

struct LargeStruct
{
    public SmallerStruct SmallerStruct1;
    public SmallerStruct SmallerStruct2;
}

struct SmallerStruct
{
    public MultiRegStruct MultiRegStruct1;
    public MultiRegStruct MultiRegStruct2;
    public MultiRegStruct MultiRegStruct3;
}
***** BB03, STMT00002(before)
N011 ( 26, 17) [000013] --CXG-------              *  CALL      void   RyuJitReproduction.Program.CallForMultiRegStruct
N010 ( 12, 14) [000022] -c--G------- arg0 x0,x1   \--*  FIELD_LIST struct
N004 (  6,  7) [000023] n---G------- ofs 0           +--*  IND       long  
N003 (  3,  5) [000014] ----G-------                 |  \--*  ADDR      byref 
N002 (  3,  4) [000008] -------N----                 |     \--*  LCL_FLD   struct V02 loc0         u:2[+0] Fseq[SmallerStruct1, MultiRegStruct1]
N009 (  6,  7) [000028] n---G------- ofs 8           \--*  IND       long  
N008 (  5,  8) [000027] ----G--N----                    \--*  ADD       byref 
N006 (  3,  5) [000024] ----G-------                       +--*  ADDR      byref 
N005 (  3,  4) [000025] -------N----                       |  \--*  LCL_FLD   struct V02 loc0         u:2[+0] Fseq[SmallerStruct1, MultiRegStruct1] (last use)
N007 (  1,  2) [000026] ------------                       \--*  CNS_INT   long   8

N001 [000021]   ARGPLACE  => $145 {MemOpaque:NotInLoop}
  VNApplySelectors:
    VNForHandle(SmallerStruct1) is $101, fieldType is struct, size = 48
    VNForMapSelect($144, $101):struct returns $200 {$144[$101]}
  VNApplySelectors:
    VNForHandle(MultiRegStruct1) is $102, fieldType is struct, size = 16
    VNForMapSelect($200, $102):struct returns $201 {$200[$102]}
  VNApplySelectors:
    VNForHandle(SmallerStruct1) is $101, fieldType is struct, size = 48
    VNForMapSelect($144, $101):struct returns $200 {$144[$101]}
  VNApplySelectors:
    VNForHandle(MultiRegStruct1) is $102, fieldType is struct, size = 16
    VNForMapSelect($200, $102):struct returns $201 {$200[$102]}
N002 [000008]   LCL_FLD   V02 loc0         u:2[+0] Fseq[SmallerStruct1, MultiRegStruct1] => $201 {$200[$102]}
    FieldSeq {MultiRegStruct1} is $240
    FieldSeq {(SmallerStruct1, MultiRegStruct1)} is $241
N003 [000014]   ADDR      => $280 {PtrToLoc($43, $241)}
  VNApplySelectors:
    VNForHandle(SmallerStruct1) is $101, fieldType is struct, size = 48
    VNForMapSelect($144, $101):struct returns $200 {$144[$101]}
  VNApplySelectors:
    VNForHandle(MultiRegStruct1) is $102, fieldType is struct, size = 16
    VNForMapSelect($200, $102):struct returns $201 {$200[$102]}
  VNApplySelectors:
    VNForHandle(SmallerStruct1) is $101, fieldType is struct, size = 48
    VNForMapSelect($144, $101):struct returns $200 {$144[$101]}
  VNApplySelectors:
    VNForHandle(MultiRegStruct1) is $102, fieldType is struct, size = 16
    VNForMapSelect($200, $102):struct returns $201 {$200[$102]}
N004 [000023]   IND       => $2c0 {$201, long <- struct}
SingleAccretion commented 2 years ago

Some thoughts on how to address the problems with static fields.

There are many inter-related issues, they fall into 3 categories:

  1. The are a lot of representations for statics in IR:

    • IND(CNS_INT addr) - for something like *(ref s_field).
    • CLS_VAR - for things that get to be relocatable (always on 32 bit).
    • IND(HELPER + OFFSET).
    • All the above, but as a base address for a boxed static.
    • Windows x86 TLS statics.
    • RVA statics: IND(CNS_INT addr), works for structs as well. These can be overlapping.
    • Opaque helpers - uninteresting for our purposes as we do not attach sequences to them.
  2. Boxed statics do not fit well into the overall picture as the boxes themselves are treated by IsFieldAddr as fields.

  3. The selections resulting from this are confusing and too long.

From VN's perspective, however, all this can be collapsed into just two cases:

  1. Non-shared statics:
    heap[field][struct fields...]
  2. Shared statics:
    heap[field][instantiation][struct fields...]

Notably, the "shared" case is but identical to the case of instance fields:

heap[field][obj][struct fields...]

(Including the IR shape of FIELD_ADDR - BASE_ADDR [Zero FldSeq] / ADD(BASE_ADDR, [FldSeq])

Notably, today's representation does not carry information about a static's shared-ness, all statics that use helpers as well as boxed ones are counted as shared. That will need to change. Shared-ness can be derived, via a new Jit-EE interface method, from the field handle, though for prototyping purposes it will be much easier to have the Jit save it during importation instead (allowing for testing with SPMI).

Now, onto the fields sequences, since they cause problems for boxed statics. Today, we have the field sequence for the field attached to the inner address tree (one for the box), and add a pseudo-field in the form of #FirstElem to the outer address tree. That does not seem helpful, since the inner address is usually not all that useful to numbering (it is a simple invariant indirection that is only ever read from). For the simple struct static case, we can, in fact, just ignore the inner address. For shared statics, however, we must take it into account...

The outer address will canonically look like this: ADD(IND(HELPER + OFFSET), CNS_INT TARGET_POINTER_SIZE). We would like to use it "as a whole", instead of searching for the inner helper like we do today.

HELPER is a function of the instantiation. HELPER + OFFSET is a function of the instantiation and the field. For a given field, the offset will not vary, so, for a given field, this once again only depends the instantiation. The indirection in question (will be) invariant, so it too only depends the instantiation. Finally, the TARGET_POINTER_SIZE offset is a constant one.

What this all gives us in the end is a very simple rule: for a shared static, we can use the base address' VN as the instantiation argument - just like we use the object reference for instance fields, and this holds for both the simple and boxed cases. This has one more virtue: we will be "protected" by design from CSE (in JitOptRepeat compilations) reshaping the inner address trees and messing up the pattern matching.

Accordingly, we will attach the field sequence for the static (a singleton) only to the outer address, and only look at that in IsFieldAddr.

Edit: will be fixed in #66558.

SingleAccretion commented 2 years ago

One of the more obscure type mismatches arise with ByReference<T> local fields:

N004 (  3,  4) [000029] ------------                 |        +--*  LCL_FLD   byref  V01 loc0         u:2[+0] Fseq[_pointer]

N004 [000029]   LCL_FLD   V01 loc0         u:2[+0] Fseq[_pointer] => $2c0 {$280, byref <- struct}
SingleAccretion commented 2 years ago

Block morphing attaches wrong field sequences when assigning a promoted struct to an indirection:

1) Each address gets a zero-offset sequence associated with the first field, which is not correct:

[000265] -A-X-+------              *  COMMA     void
[000258] -A-X--------              +--*  ASG       byref
[000256] *--X---N----              |  +--*  IND       byref
[000117] -----+------              |  |  \--*  LCL_VAR   byref  V01 RetBuf        Zero Fseq[_pointer]
[000257] ------------              |  \--*  LCL_VAR   byref  V23 tmp21
[000264] -A-X--------              \--*  ASG       int
[000262] *--X---N----                 +--*  IND       int
[000261] ------------                 |  \--*  ADD       byref
[000259] -----+------                 |     +--*  LCL_VAR   byref  V01 RetBuf        Zero Fseq[_pointer] // <- should not be there
[000260] ------------                 |     \--*  CNS_INT   long   8 Fseq[_length]
[000263] ------------                 \--*  LCL_VAR   int    V24 tmp22

The cause is that for the first address, the original expression is used, and a zero-offset sequence it attached to it, correctly, but then subsequent gtCloneExpr(originalAddress) calls also pick up this sequence.

Edit: fixed in #62687.

2) The sequences of the source are used, while mismatched struct types are allowed as long as the layout is the same. This means that for the code like below:

[MethodImpl(MethodImplOptions.NoInlining)]
private static long Problem(AnotherMultiRegStruct arg0, MultiRegStruct[] arg1)
{
    IL.Emit.Ldarg(1);
    IL.Emit.Ldc_I4(0);
    IL.Emit.Ldelema<MultiRegStruct>();
    IL.Emit.Ldarga(0);
    IL.Emit.Ldobj<MultiRegStruct>();
    IL.Emit.Stobj<MultiRegStruct>();

    IL.Emit.Ldarg(1);
    IL.Emit.Ldc_I4(0);
    IL.Emit.Ldelema<MultiRegStruct>();
    IL.Emit.Ldfld(new(typeof(MultiRegStruct), "FirstLngValue"));
    return IL.Return<long>();
}

struct MultiRegStruct
{
    public long FirstLngValue;
    public long SecondLngValue;
}

struct AnotherMultiRegStruct
{
    public long AnotherFirstLngValue;
    public long AnotherSecondLngValue;
}

VN will not see the mutations to the MultiRegStruct[] made through AnotherMultiRegStruct's fields. Block morphing, in all likelihood, should be using the destination's field handles. Or just give up if it sees mismatched handles.

SingleAccretion commented 2 years ago

One more kind of mismatch, this time we lose a zero-offset sequence in morph:

private static void Problem(StructWithByte[] a)
{
    var s = MemoryMarshal.CreateSpan(ref a[0], 1);

    s[0].Byte = 1;
}
***** BB01, STMT00002(before)
N008 ( 12, 14) [000022] -A-XG-------              *  ASG       ubyte
N006 ( 10, 12) [000019] ---XG--N----              +--*  COMMA     ubyte
N003 (  6,  9) [000016] ---X--------              |  +--*  ARR_BOUNDS_CHECK_Rng void
N001 (  1,  1) [000011] ------------              |  |  +--*  CNS_INT   int    0
N002 (  1,  1) [000015] ------------              |  |  \--*  CNS_INT   int    1
N005 (  4,  3) [000082] *--X---N----              |  \--*  IND       ubyte
N004 (  1,  1) [000017] ------------              |     \--*  LCL_VAR   byref  V03 tmp1         u:2 (last use)
N007 (  1,  1) [000020] ------------              \--*  CNS_INT   int    1

N001 [000011]   CNS_INT   0 => $40 {IntCns 0}
N002 [000015]   CNS_INT   1 => $42 {IntCns 1}
N003 [000016]   ARR_BOUNDS_CHECK_Rng => $188 {norm=$2 {2}, exc=$187 {IndexOutOfRangeExc($40, $42)}}
N004 [000017]   LCL_VAR   V03 tmp1         u:2 (last use) => $2c0 {PtrToArrElem($280, $80, $201, $0)}
N006 [000019]   COMMA     => $188 {norm=$2 {2}, exc=$187 {IndexOutOfRangeExc($40, $42)}}
N007 [000020]   CNS_INT   1 => $42 {IntCns 1}
Tree [000022] assigns to an array element:
    VNForMapSelect($100, $280):mem returns $300 {$100[$280]}
    VNForMapSelect($300, $80):mem returns $301 {$300[$80]}
    VNForMapSelect($301, $201):struct returns $340 {$301[$201]}
    *** Mismatched types in fgValueNumberArrIndexAssign
  hAtArrType $300 is MapSelect(curGcHeap($100), StructWithByte[]).
  hAtArrTypeAtArr $301 is MapSelect(hAtArrType($300), arr=$80)
  hAtArrTypeAtArrAtInx $340 is MapSelect(hAtArrTypeAtArr($301), inx=$201):struct
  newValAtArrType $440 is  {MemOpaque:NotInLoop}
    VNForMapStore($100, $280, $440):heap in BB01 returns $480 {$100[$280 := $440]}
  fgCurMemoryVN[GcHeap] assigned for ArrIndexAssign (case 1) at [000022] to VN: $480.
N008 [000022]   ASG       => $VN.Void

Edit: this not actually a bug - it is by design that we will lose field sequences in this situation as we have an ADD of zero that has no sequence attached to it (coming from span index import). Span indexing does not end up "labeling" its constituent parts like its address counterpart. In other words, span indexing is equivalent to using raw ref arithmetic. A little unfortunate the compiler creates this case itself, but there's not a lot that can be done about it.

SingleAccretion commented 2 years ago

Some more (a little bit random) notes on ByReference<T> LCL_FLDs and related things:

1) Adding zero-offset sequences to them is broken, as they will always be added to the LCL_FLD nodes itself, while logically representing an extension of a pointer that the node evaluates to. Not reproducible (may be reproducible under stress with selective promotion) right now because of a very fragile pessimization that exists when assigning a struct field in another struct.

private static void Problem(StructWithByte[] a)
{
    var s = MemoryMarshal.CreateSpan(ref a[0], 1);

    s[0].Byte = 1;
}
***** BB01, STMT00009(before)
N005 ( 13, 12) [000056] -A------R---              *  ASG       struct (copy)
N004 (  9,  9) [000055] n------N----              +--*  OBJ       struct<System.ByReference`1[StructWithByte], 8>
N003 (  3,  5) [000054] ------------              |  \--*  ADDR      byref 
N002 (  3,  4) [000047] U------N----              |     \--*  LCL_FLD   struct V04 tmp2         ud:2->3[+0] Fseq[_pointer]
N001 (  3,  2) [000052] -------N----              \--*  LCL_VAR   struct<System.ByReference`1[StructWithByte], 8> V05 tmp3         u:2 (last use)

N001 [000052]   LCL_VAR   V05 tmp3         u:2 (last use) => $300 {PtrToArrElem($c2, $80, $281, $0)}
  VNApplySelectors:
    VNForHandle(_pointer) is $c3, fieldType is struct, size = 8
    VNForMapSelect($100, $c3):struct returns $101 {ZeroObj($c1: System.ByReference`1[StructWithByte])}
  VNApplySelectors:
    VNForHandle(_pointer) is $c3, fieldType is struct, size = 8
    VNForMapSelect($100, $c3):struct returns $101 {ZeroObj($c1: System.ByReference`1[StructWithByte])}
N002 [000047]   LCL_FLD   V04 tmp2         ud:2->3[+0] Fseq[_pointer] => $101 {ZeroObj($c1: System.ByReference`1[StructWithByte])}
    FieldSeq {_pointer} is $206
N003 [000054]   ADDR      => $2c2 {PtrToLoc($42, $206)}
    *** Mismatched types in VNApplySelectorsTypeCheck (indType is TYP_STRUCT)
    *** Mismatched types in VNApplySelectorsTypeCheck (indType is TYP_STRUCT)
Tree [000056] assigned VN to local var V04/3: new uniq $104 {MemOpaque:NotInLoop}
N005 [000056]   ASG       => $VN.Void

(JitNoStructPromotion=1 is of course required)

Edit: fixed in #64501.

2) The usage of Unsafe.As, the object overload in the Span's .ctor for an array makes it impossible to add a VNF_PtrToClassField-like VNFunc (very similar to the existing VNF_PtrToLoc, VNF_PtrToArrElem and VNF_PtrToStatic mechanisms) because we would then be able to deduce a VNF_PtrToClassField for the _pointer span field in below code:

private static void Problem(StructWithByte[] a)
{
    var s = a.AsSpan();

    s[0].Byte = 1;
}
***** BB03, STMT00008(before)
N005 (  7,  6) [000048] -A--G---R---              *  ASG       byref 
N004 (  3,  2) [000047] D------N----              +--*  LCL_VAR   byref  V09 tmp7         d:2
N003 (  3,  3) [000086] ------------              \--*  ADD       byref 
N001 (  1,  1) [000084] ------------                 +--*  LCL_VAR   ref    V00 arg0         u:1
N002 (  1,  1) [000085] ------------                 \--*  CNS_INT   long   16 field offset Fseq[Data]

Then accessing it both as an array and as a span will be modeled as non-aliasing, which is of course incorrect.

Notably this bug already exists in a more restricted form today:

private static int Problem(StructWithByte[] a)
{
    MemoryMarshal.GetArrayDataReference(a).Byte = 1;

    a[0].Byte = 2;

    return MemoryMarshal.GetArrayDataReference(a).Byte;
}
***** BB01, STMT00004(before)
N005 (  6,  6) [000017] ---XG-------              *  RETURN    int   
N004 (  5,  5) [000016] *--XG-------              \--*  IND       ubyte 
N003 (  2,  2) [000037] -------N----                 \--*  ADD       byref 
N001 (  1,  1) [000013] ------------                    +--*  LCL_VAR   ref    V00 arg0         u:1 (last use)
N002 (  1,  1) [000036] ------------                    \--*  CNS_INT   long   16 field offset Fseq[Data, Byte]

N001 [000013]   LCL_VAR   V00 arg0         u:1 (last use) => $80 {InitVal($40)}
N002 [000036]   CNS_INT   16 field offset Fseq[Data, Byte] => $100 {LngCns:  16}
N003 [000037]   ADD       => $140 {ADD($80, $100)}
    VNForHandle(Data) is $180, fieldType is ubyte
      AX2: $180 != $182 ==> select([$2c1]store($2c0, $182, $282), $180) ==> select($2c0, $180) remaining budget is 99.
      AX1: select([$c0]store($2c0, $180, $280), $180) ==> $280.
    VNForMapSelect($2c1, $180):mem returns $280 {$1c0[$80 := $240]}
      AX1: select([$1c0]store($280, $80, $240), $80) ==> $240.
    VNForMapSelect($280, $80):ubyte returns $240 {$200[$181 := $42]}
  VNApplySelectors:
    VNForHandle(Byte) is $181, fieldType is ubyte
      AX1: select([$200]store($240, $181, $42), $181) ==> $42.
    VNForMapSelect($240, $181):ubyte returns $42 {IntCns 1}
N004 [000016]   IND       => <l:$381 {norm=$42 {IntCns 1}, exc=$340 {NullPtrExc($80)}}, c:$201 {norm=$4c0 {MemOpaque:NotInLoop}, exc=$340 {NullPtrExc($80)}}>
N005 [000017]   RETURN    => $301 {MemOpaque:NotInLoop}

(Notice how we're not even aliasing anything here - we turn StructWithByte[] into a ref StructWithByte which is perfectly ok)

Full bad codegen reproduction ```cs Console.WriteLine(Problem(new byte[] { 1 })); static int Problem(byte[] a) { if (MemoryMarshal.GetArrayDataReference(a) == 1) { a[0] = 2; if (MemoryMarshal.GetArrayDataReference(a) == 1) { return -1; } } return 0; } ```
SingleAccretion commented 2 years ago

Next steps for this issue are:

1) Completely delete CLS_VAR - #66298. 2) Store whether the field needs "the base address" in the static handle itself (at import time). Of course, will require abstracting away the handle - #66558.

SingleAccretion commented 2 years ago

With physical VN, much of this issue has been obsoleted.

Still, some work remains to enable the debug checker and delete assertion propagation code, as well as fix GetArrayDataReference.

Don't think it is 7.0 worthy at this point though.

MichalPetryka commented 1 year ago

GetArrayDataReference is fixed now.