dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.
MIT License
1.43k stars 200 forks source link

[NativeAOT-LLVM] Optimisation on compiling problem #1859

Closed yowl closed 11 months ago

yowl commented 2 years ago

Looking at what is wrong with Release config builds and why Optimisation was removed in #1801, this is the first problem.
We have

bool Llvm::isLlvmFrameLocal(LclVarDsc* varDsc)
{
    assert(canStoreLocalOnLlvmStack(varDsc) && (_compiler->fgSsaPassesCompleted >= 1));
    return !varDsc->lvInSsa;
}

But when compiling with optimisation, you can get methods with LIR:

------------ BB01 [000..008) (return), preds={} succs={}
N001 (  0,  0) [000002] ------------                 RETURN    void   $c0

In this case the shadow stack arg is not tracked as there are no references, lvInSsa is then false, and isLlvmFrameLocal returns true. Should this method include varDsc->lvTracked or lvRefCnt() != 0 or lclNum != _shadowStackLclNum ?

cc @SingleAccretion

SingleAccretion commented 2 years ago

Depends on what the actual problem is.

If it's that we're creating unused allocas (which I would guess is what is happening), the place that is creating them should be checking lvRefCnt, yes (it is what ordinary codegen does).

yowl commented 2 years ago

In createAllocasForLocalsWithAddrOp it then goes on to hit the assert :

            if (varDsc->lvIsParam)
            {
                LlvmArgInfo argInfo = getLlvmArgInfoForArgIx(lclNum);
                assert(argInfo.IsLlvmArg())
SingleAccretion commented 2 years ago

It does not seem right that getLlvmArgInfoForArgIx effectively fails if the shadow stack local is not referenced.

yowl commented 2 years ago

The next assert after this one is https://github.com/dotnet/runtimelab/blob/1960cfa58f3aa2ab46355947c19b7cfd33dbe60b/src/coreclr/jit/importer.cpp#L17454 Is this related to tail calls? Tail calls are at phase 3 for WebAssembly, so would be good not to do anything that makes enabling them more difficult.

SingleAccretion commented 2 years ago

That looks like some importer code being all too aware of ABI details to me, yikes...

LLVM should be taking the "no return buffer path" there. I think you can add TARGET_WASM to the #elif defined(TARGET_X86) (in general our "virtual" (before lower) ABI is very similar to x86).

^ Above not true, let me look again.

Edit: well, we could add the #ifdef to x86, or any other cases where the struct is treated as by-value. But we should make sure that retTypeDesc.GetReturnRegCount() is 1.

yowl commented 2 years ago

It does not seem right that getLlvmArgInfoForArgIx effectively fails if the shadow stack local is not referenced.

True, it could contain:

    if (lclNum == _shadowStackLclNum)
    {
        llvmArgInfo.m_argIx = 0;
        llvmArgInfo.m_shadowStackOffset = 0;
        return llvmArgInfo;
    }

Which would stop it asserting, but it would then create an alloca. We never want an alloca for the shadow stack do we?

SingleAccretion commented 2 years ago

Yes. So the right fix is to add both the ref count check and the shadow stack code to getLlvmArgInfoForArgIx.

yowl commented 2 years ago

That looks like some importer code being all too aware of ABI details to me, yikes...

~LLVM should be taking the "no return buffer path" there. I think you can add TARGET_WASM to the #elif defined(TARGET_X86) (in general our "virtual" (before lower) ABI is very similar to x86).~

^ Above not true, let me look again.

Edit: well, we could add the #ifdef to x86, or any other cases where the struct is treated as by-value. But we should make sure that retTypeDesc.GetReturnRegCount() is 1.

Wasm has multivalue return values, https://github.com/WebAssembly/multi-value, is that relevant here?

SingleAccretion commented 2 years ago

I don't think, no. We're constrained by .NET semantics here, and that only allows one return value.

Perhaps we could use it in the future for some intrinsic-like functions/types, but even then, we'd still need to represent it as one "tuple" in the front-end of the compiler.

(Does LLVM expose this multi-value functionality at all?)

yowl commented 2 years ago

(Does LLVM expose this multi-value functionality at all?)

I think so , but apparently we just need to return structs and clang does the rest https://twitter.com/tlively52/status/1229944658354032640

yowl commented 2 years ago

Edit: well, we could add the #ifdef to x86, or any other cases where the struct is treated as by-value. But we should make sure that retTypeDesc.GetReturnRegCount() is 1.

The struct on the method I'm looking at

Invoking compiler for the inlinee method System.Span`1:Slice(int):System.Span`1[System.Char]:this :
IL to import:
IL_0000  00                nop
IL_0001  03                ldarg.1
IL_0002  02                ldarg.0
IL_0003  7b 87 01 00 0a    ldfld        0xA000187
IL_0008  fe 03             cgt.un
IL_000a  0a                stloc.0
IL_000b  06                ldloc.0
IL_000c  2c 06             brfalse.s    6 (IL_0014)
IL_000e  28 a7 15 00 06    call         0x60015A7
IL_0013  00                nop
IL_0014  02                ldarg.0
IL_0015  7c 86 01 00 0a    ldflda       0xA000186
IL_001a  28 81 01 00 0a    call         0xA000181
IL_001f  03                ldarg.1
IL_0020  e0                conv.u
IL_0021  28 3b 01 00 2b    call         0x2B00013B
IL_0026  02                ldarg.0
IL_0027  7b 87 01 00 0a    ldfld        0xA000187
IL_002c  03                ldarg.1
IL_002d  59                sub
IL_002e  73 2e 01 00 0a    newobj       0xA00012E
IL_0033  0b                stloc.1
IL_0034  2b 00             br.s         0 (IL_0036)
IL_0036  07                ldloc.1
IL_0037  2a                ret

Is a Span<Char>. The problem I'm having is that because the type appears as TYP_UNKNOWN here I have retTypeDesc.GetReturnRegCount() == 0 so it goes to the else anyway. Maybe I should be investigate why its TYP_UNKNOWN

SingleAccretion commented 2 years ago

Yes, that does not seem right. It should be TYP_STRUCT.

yowl commented 2 years ago

Well, that's not too complicated. InitializeStructReturnType is basically a nop when FEATURE_MULTIREG_RET is not defined.

yowl commented 2 years ago

Maybe I can turn on FEATURE_MULTIREG_RET and for LLVM return SPK_ByValue in a single "register" or getReturnTypeForStruct is SPK_PrimitiveType where for other targest it is SPK_ByValueAsHfa or SPK_ByValue. We already set it to SPK_ByValue so I'll go with that.

yowl commented 2 years ago

What is V04 in here?

------------ BB01 [000..010) (return), preds={} succs={}
N004 (  3,  2) [000001] -------N----         t1 =    LCL_VAR_ADDR byref  V01 arg1
                                                  *    long   V01._ticks (offs=0x00) -> V04 tmp1
                                                  /--*  t1     byref
N006 (  9,  7) [000005] n-----------         t5 = *  OBJ       struct<System.TimeSpan, 8> <l:$102, c:$103>
N007 (  1,  1) [000002] ------------         t2 =    CNS_INT   int    62 $44
               [000022] ------------        t22 =    LCL_VAR   int    V07 rat0         u:1
               [000023] ------------        t23 =    CNS_INT   int    12
                                                  /--*  t22    int
                                                  +--*  t23    int
               [000024] ------------        t24 = *  ADD       int
                                                  /--*  t24    int
               [000025] ------------        t25 = *  PUTARG_TYPE int
                                                  /--*  t5     struct
               [000026] ------------        t26 = *  PUTARG_TYPE struct
                                                  /--*  t2     int
               [000027] ------------        t27 = *  PUTARG_TYPE int
                                                  /--*  t25    int    arg-1
                                                  +--*  t26    struct arg0
                                                  +--*  t27    int    arg1
N008 ( 24, 12) [000003] --CXG-------         t3 = *  CALL      int    System.Threading.Tasks.Task.ValidateTimeout $141

I think the method is the bool overload (S_P_CoreLib_System_Threading_Tasks_Task_1<Bool>__WaitAsync_1):

        public new Task<TResult> WaitAsync(TimeSpan timeout) =>
            WaitAsync(ValidateTimeout(timeout, ExceptionArgument.timeout), default);

V04 has lvIsParam == true and looks like this in the debugger

▶ | varDsc | 0x000001ff432194e0 [V4: TYP_LONG-"field V01._ticks (fldOffset=0x0)"] | LclVarDsc *

and looks like some kind of optimisation, pulling out the only field _ticks in the TimeSpan ? The problem is that because it is a parameter we expect to find it as an LLVM arg but its not there. _sigInfo.numArgs == 2, so it never gets a lvLlvmArgNum

SingleAccretion commented 2 years ago

and looks like some kind of optimisation

Yep, it is a promoted struct field.

Promotion "disassembles" structs into constituent fields and the parent local can end in one of two states:

1) Completely eliminated from the IR (independent promotion). 2) Not completely eliminated (dependent promotion).

From the backend perspective, we should treat independently promoted fields as ordinary locals, and dependently promoted ones as being modified with stores to their fields (the field LCL_VARs become "effective" LCL_FLDs). I think we should undo dependent promotion in lowering and replace all of the fields with actual LCL_FLDs, for simplicity.

The backend must also ensure that fields from arguments get "homed" into their locations properly. For independent promotion, I propose we lower them into the following:

;; Prolog block
var fieldOne = arg.FieldOne; (LCL_FLD) // Clear lvIsParam, now it is an ordinary local
var fieldTwo = arg.FieldTwo; (LCL_FLD) // Clear lvIsParam, now it is an ordinary local
< ... etc ... >

That entails supporting LCL_FLD in codegen. For SSA locals that means extractvalue I'd imagine, for isLlvmFrameLocal - loads.

yowl commented 2 years ago

I think extractvalue is what I did in the IL module, but it wont work for unions as the parameter is an element index. So for Register we only have 2 fields: 2 i64s . If you want the second byte, you can't do it from an extractvalue. Its easier to bitcast and gep I think.

Edit. that wont work either though, I'll see what clang does for C

Ugh, this looks hard:

struct bytes
{
    char b1;
    char b2;
    char b3;
};

union u
{
    long l;
    struct bytes someBytes;
};

void access(union u passedUnion)
{
    long l2 = passedUnion.l;
    char c = passedUnion.someBytes.b2;
}

int main()
{
    union u unionVar;
    unionVar.l = 123;

    access(unionVar);

    return 0;
}

LLVM:

%union.u = type { i32 }
%struct.bytes = type { i8, i8, i8 }

; Function Attrs: noinline nounwind optnone
define void @access(%union.u* byval(%union.u) align 4 %0) #0 {
  %2 = alloca i32, align 4
  %3 = alloca i8, align 1
  %4 = bitcast %union.u* %0 to i32*
  %5 = load i32, i32* %4, align 4
  store i32 %5, i32* %2, align 4
  %6 = bitcast %union.u* %0 to %struct.bytes*
  %7 = getelementptr inbounds %struct.bytes, %struct.bytes* %6, i32 0, i32 1
  %8 = load i8, i8* %7, align 1
  store i8 %8, i8* %3, align 1
  ret void
}

; Function Attrs: noinline nounwind optnone
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca %union.u, align 4
  store i32 0, i32* %1, align 4
  %3 = bitcast %union.u* %2 to i32*
  store i32 123, i32* %3, align 4
  call void @access(%union.u* byval(%union.u) align 4 %2)
  ret i32 0
}

In C it's easier as you know all the different unions up front, but for C# explicit layouts.... I wish I hadn't looked at this now :-)

Or maybe the original idea of bitcasts and geps would be ok. We dont do this parameter passing though %union.u* byval(%union.u)

SingleAccretion commented 2 years ago

Or maybe the original idea of bitcasts and geps would be ok

I would imagine it should work if we spill LCL_FLD-accessed locals to allocas: get the address, bitcast to i8*, add the offset, bitcast to <LCL_FLD type>*, then load.

Not as sophisticated a scheme as what Clang does, but can always be tuned better later (say, don't spill if only proper fields are acessed).

yowl commented 2 years ago

That makes sense. Do you see this as lowering or codegen?

SingleAccretion commented 2 years ago

In codegen, like other isLlvmFrameLocals.

yowl commented 2 years ago

If I look at this IR

               [000323] ------------                 IL_OFFSET void   IL offset: 0x0
N001 (  1,  1) [000007] ------------         t7 =    CNS_INT   int    0 $41
                                                  /--*  t7     int
N003 (  5,  4) [000008] DA----------              *  STORE_LCL_VAR struct<System.DateTimeOffset, 16> V00 tmp0
N001 ( 14,  2) [000002] H-CXG-------         t2 =    CALL help byref  HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE Zero Fseq[Zero, _ticks] $200
               [000328] ------------       t328 =    LCL_VAR   int    V35 rat0         u:1
               [000327] ------------       t327 =    CNS_INT   int    0
                                                  /--*  t328   int
                                                  +--*  t327   int
               [000329] ------------       t329 = *  ADD       int
                                                  /--*  t329   int
                                                  +--*  t2     byref
N003 ( 14,  3) [000312] -A-XG-------              *  STOREIND  byref
               [000331] ------------       t331 =    LCL_VAR   int    V35 rat0         u:1
               [000330] ------------       t330 =    CNS_INT   int    0
                                                  /--*  t331   int
                                                  +--*  t330   int
               [000332] ------------       t332 = *  ADD       int
                                                  /--*  t332   int
N004 (  1,  1) [000313] ------------       t313 = *  IND       byref  Zero Fseq[Zero, _ticks]
                                                  /--*  t313   byref
N006 ( 18,  6) [000261] D--XG-------       t261 = *  IND       long   <l:$284, c:$283>
                                                  /--*  t261   long
N008 ( 18,  6) [000262] DA-XG-------              *  STORE_LCL_VAR long   V21 tmp21        d:1
N002 (  3,  2) [000058] -------N----        t58 =    LCL_VAR_ADDR byref  V03 tmp3
                                                  *    long   V03._ticks (offs=0x00) -> V21 tmp21
                                                  /--*  t58    byref
N004 (  9,  7) [000061] n-----------        t61 = *  OBJ       struct<System.TimeSpan, 8> <l:$c3, c:$c4>

And the promoted struct V03 I can see that for node 58:

N002 (  3,  2) [000058] -------N----        t58 =    LCL_VAR_ADDR byref  V03 tmp3
                                                  *    long   V03._ticks (offs=0x00) -> V21 tmp21

It is marked as promoted and has some fields in its LclVarDsc that point back to the parent image But in this case, I guess I have 2 questions, is this dependent or independent? I can see that in node 61 OBJ it is using the struct, so this is dependent? However V21 is handled in llvm.cpp as a local and mapped into _localsMap using it's SSA pair, there is no alloca for it.

Is the process therefore

  1. Convert node 58 to a LCL_FLD_ADDR
  2. Convert V21 to an alloca because it now fits the criteria in https://github.com/dotnet/runtimelab/issues/1859#issuecomment-1061709865

I think that's what I understand by the comments above. Apologies if I've missed something obvious.

SingleAccretion commented 2 years ago

is this dependent or independent

Dependent. Independent promotion would have no references to the parent in the IR.

There is a method on Compiler, lvaGetPromotionType that distinguishes between the two by the present of lvDoNotEnregister.

Convert node [000058] to a LCL_FLD_ADDR

I don't think that's necessary. Dependent promotion in our case would imply that we will only reference the parent, and [000058] is already doing just that.

Convert V21 to an alloca because it now fits the criteria in

Yes, the idea is to turn field locals referencing dependently promoted parents "back" into LCL_FLDs, which then implies that the parent locals would live on the LLVM frame (for now, at least).

yowl commented 2 years ago

I'm afraid I'm still a bit stuck. With the above IR, for the LCL_VAR_ADDR

N002 (  3,  2) [000058] -------N----        t58 =    LCL_VAR_ADDR byref  V03 tmp3
                                                  *    long   V03._ticks (offs=0x00) -> V21 tmp21

If I look at V03, the struct, 0x000001df43f5c040 [V3: TYP_STRUCT-"Inlining Arg"] and its promotion type I get PROMOTION_TYPE_INDEPENDENT because lvDoNotEnregister == 0 For the field, V21 , 0x0000029762236710 [V21: TYP_LONG-"field V03._ticks (fldOffset=0x0)"] it is PROMOTION_TYPE_NONE

I find the setting/nomenclature of lvParentLcl a bit confusing because the "parent" of V03 , the struct, is 21, the field?

Also as this IR wants the address of the field, what should I be putting back to a LCL_FLD, as I want the address not the field value?

:-/

SingleAccretion commented 2 years ago

If I look at V03, the struct, 0x000001df43f5c040 [V3: TYP_STRUCT-"Inlining Arg"] and its promotion type I get PROMOTION_TYPE_INDEPENDENT because lvDoNotEnregister == 0

It looks like a bug. The frontend is supposed to mark all dependently promoted structs with lvDoNotEnregister (SSA breaks otherwise), except a few special cases (CanBeReplacedWithItsField structs and multi-reg cases).

Could you share the full dump?

I find the setting/nomenclature of lvParentLcl a bit confusing because the "parent" of V03 , the struct, is 21, the field?

That ought not to be the case, lvParentLcl for V21 should be V03 (the struct), for V03 itself - BAD_VAR_NUM (we don't support recursive promotion).

Also as this IR wants the address of the field, what should I be putting back to a LCL_FLD, as I want the address not the field value?

For the address node that refers to the parent, it should be left alone. Address nodes that refer to field locals should be morphed into LCL_FLD_ADDR.

yowl commented 2 years ago

1.txt

Thanks, this is the dump up to the LLVM phase

yowl commented 2 years ago

It looks like a bug.

Maybe because it is the result of inlining?

double checking I'm looking at the right thing: image

SingleAccretion commented 2 years ago

Ok, I see what is going on. This is args morphing's fault more or less, it doesn't mark the parent local DNER because it doesn't try to retype it (due to the "LLVM ABI" that we have).

There are two ways to solve this problem: 1) Pessimistic but simple: do a lowering pre-pass for LLVM identifying all dependently promoted locals "manually". This would be pretty impactful CQ-wise. 2) Support decomposition of promoted locals with FIELD_LISTs, similar to how x86 and other targets (Unix x64, ARM64, ARM) with stack-based args operate. Not sure how we'd map this to LLVM (build it up with with a series of insertvalues...?).

For reference, a FIELD_LIST represents a collection of values (most often promoted fields, but they can be any values in principle):

FIELD_LIST struct
  LCL_VAR V04 <promoted field at offset 0>
  LCL_VAR V05 <promoted field at offset 4>
  LCL_VAR V06 <promoted field at offset 8>

There is x86 code which does what we need in args morphing if we decide to go this route:

https://github.com/dotnet/runtimelab/blob/d6e61cf66532f2741f26123bbe5228dfcd33449d/src/coreclr/jit/morph.cpp#L4215-L4253

yowl commented 2 years ago

Sounds like the feeling is that option 2 is best. Are you saying we can just || defined (TARGET_WASM) this block? I did try that, and the FIELD_LIST looks like is it create fine, however at line ,

https://github.com/dotnet/runtimelab/blob/a6b3df7414af3c3125c4fe8cc84ef708c69a45c2/src/coreclr/jit/morph.cpp#L4326

it evaluates true, reMorphing == false, call->fgArgInfo->HasRegArgs() == true and in EvalArgsToTemps which asserts at

https://github.com/dotnet/runtimelab/blob/a6b3df7414af3c3125c4fe8cc84ef708c69a45c2/src/coreclr/jit/morph.cpp#L2375

I'm not helping much at the moment, but happy to be a code monkey for this if that's ok.

SingleAccretion commented 2 years ago

Sounds like the feeling is that option 2 is best.

It is certainly better CQ-wise.

I did try that, and the FIELD_LIST looks like is it create fine, however, there are problems

We should be taking the x86 path here:

https://github.com/dotnet/runtimelab/blob/a6b3df7414af3c3125c4fe8cc84ef708c69a45c2/src/coreclr/jit/morph.cpp#L2338-L2351

I. e. all of our args should be "on stack". It will take some tweaking of fgInitArgInfo to steer it in the right direction. Wonder if just always returning false from isRegParamType under LLVM will be enough.

yowl commented 2 years ago

just always returning false from isRegParamType under LLVM will be enough

Looks interesting:

------------ BB01 [000..042) (return), preds={} succs={}
               [000323] ------------                 IL_OFFSET void   IL offset: 0x0
N001 (  1,  1) [000007] ------------         t7 =    CNS_INT   int    0 $41
                                                  /--*  t7     int
N003 (  5,  4) [000008] DA----------              *  STORE_LCL_VAR struct<System.DateTimeOffset, 16> V00 tmp0         d:1
N001 ( 14,  2) [000002] H-CXG-------         t2 =    CALL help byref  HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE Zero Fseq[Zero, _ticks] $200
                                                  /--*  t2     byref
N003 ( 14,  3) [000312] DA-XG-------              *  STORE_LCL_VAR byref  V33 cse0         d:1
N004 (  1,  1) [000313] ------------       t313 =    LCL_VAR   byref  V33 cse0         u:1 Zero Fseq[Zero, _ticks] $200
                                                  /--*  t313   byref
N006 ( 18,  6) [000261] ---XG-------       t261 = *  IND       long   <l:$284, c:$283>
                                                  /--*  t261   long
N008 ( 18,  6) [000262] DA-XG-------              *  STORE_LCL_VAR long   V21 tmp21        d:1
N001 (  1,  1) [000058] -------N----        t58 =    LCL_VAR   long   V21 tmp21        u:1 <l:$280, c:$2c0>
                                                  /--*  t58    long
N002 (  1,  1) [000266] -c----------       t266 = *  FIELD_LIST struct $c1
                                                  /--*  t266   struct arg0
N003 ( 18,  4) [000059] --CXG-------        t59 = *  CALL      int    System.DateTimeOffset.ValidateOffset $300

This case is easier perhaps as ValidateOffset is

define i16 @S_P_CoreLib_System_DateTimeOffset__ValidateOffset(i8* %0, i64 %1)

i.e, there is no struct type, on the other hand if there was a call to this using a FIELD_LIST

define i8 @S_P_CoreLib_System_TimeZoneInfo__GetIsInvalidTime(i8* %0, i64 %1, %"[S.P.CoreLib]System.Globalization.DaylightTimeStruct" %2)

As you say we may have to alloca the type from the CALL, load it, then insertvalue. Doesn't seem super efficient, but should be doable as a starting point.

yowl commented 2 years ago

I've had a stab at this over at https://github.com/yowl/runtimelab/tree/decomp-struct. I have no PR yet. For System.ValueType:GetHashCode I'm hitting an assert before getting to the LLVM compilation.

https://github.com/yowl/runtimelab/blob/a23147bac6a7ee7512dd2545be6b010067cc18ca/src/coreclr/jit/importer.cpp#L17464

The dump is

E:\GitHub\runtimelab\src\tests\nativeaot\SmokeTests\HelloWasm\HelloWasm.cs(1477): Trim analysis warning IL2070: Program.NewMethod(Type,ClassForMetaTests): 'this' argument does not satisfy 'DynamicallyAccessedMemberTypes.PublicMethods' in call to 'System.Type.GetMethod(String)'. The parameter '#0' of method 'Program.NewMethod(Type,ClassForMetaTests)' does not have matching annotations. The source value must declare at least the same requirements as those declared on the target location it is assigned to.
E:\GitHub\runtimelab\src\tests\nativeaot\SmokeTests\HelloWasm\HelloWasm.cs(1483): Trim analysis warning IL2070: Program.NewMethod(Type,ClassForMetaTests): 'this' argument does not satisfy 'DynamicallyAccessedMemberTypes.PublicMethods' in call to 'System.Type.GetMethod(String)'. The parameter '#0' of method 'Program.NewMethod(Type,ClassForMetaTests)' does not have matching annotations. The source value must declare at least the same requirements as those declared on the target location it is assigned to.
E:\GitHub\runtimelab\src\libraries\System.Private.CoreLib\src\System\Resources\ManifestBasedResourceGroveler.cs(238): Trim analysis warning IL2026: System.Resources.ManifestBasedResourceGroveler.CreateResourceSet(Stream,Assembly): Using method 'System.Resources.ManifestBasedResourceGroveler.InternalGetResourceSetFromSerializedData(Stream,String,String,ResourceManagerMediator)' which has 'RequiresUnreferencedCodeAttribute' can break functionality when trimming application code. The CustomResourceTypesSupport feature switch has been enabled for this app which is being trimmed. Custom readers as well as custom objects on the resources file are not observable by the trimmer and so required assemblies, types and members may be removed.
****** START compiling System.ValueType:GetHashCode():int:this (MethodHash=fd70cd9a)
Generating code for Windows wasm
OPTIONS: compCodeOpt = SMALL_CODE
OPTIONS: compDbgCode = false
OPTIONS: compDbgInfo = false
OPTIONS: compDbgEnC  = false
OPTIONS: compProcedureSplitting   = false
OPTIONS: compProcedureSplittingEH = false
OPTIONS: Jit invoked for ngen
IL to import:
IL_0000  02                ldarg.0
IL_0001  28 62 02 00 06    call         0x6000262
IL_0006  0b                stloc.1
IL_0007  12 01             ldloca.s     0x1
IL_0009  fe 16 36 00 00 02 constrained. 0x2000036
IL_000f  6f 6c 02 00 06    callvirt     0x600026C
IL_0014  0a                stloc.0
IL_0015  06                ldloc.0
IL_0016  02                ldarg.0
IL_0017  28 bd 05 00 06    call         0x60005BD
IL_001c  61                xor
IL_001d  0a                stloc.0
IL_001e  06                ldloc.0
IL_001f  2a                ret

lvaSetClass: setting class for V00 to (4000000000420028) System.ValueType
'this'    passed in register rcx
*************** In compInitDebuggingInfo() for System.ValueType:GetHashCode():int:this
*************** In fgFindBasicBlocks() for System.ValueType:GetHashCode():int:this
weight= 10 : state   3 [ ldarg.0 ]
weight= 79 : state  40 [ call ]
weight= 34 : state  12 [ stloc.1 ]
weight= 61 : state  19 [ ldloca.s ]
weight=161 : state 190 [ constrained -> callvirt ]
weight= 20 : state 199 [ stloc.0 -> ldloc.0 ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 79 : state  40 [ call ]
weight= 35 : state  85 [ xor ]
weight= 20 : state 199 [ stloc.0 -> ldloc.0 ]
weight= 19 : state  42 [ ret ]
Jump targets:
  none
New Basic Block BB01 [0000] created.
BB01 [000..020)

Inline candidate callsite is hot.  Multiplier increased to 3.
Callsite has profile data: 1.  Multiplier limited to 12.6.
calleeNativeSizeEstimate=528
callsiteNativeSizeEstimate=85
benefit multiplier=12.6
threshold=1071
Native estimate for function size is within threshold for inlining 52.8 <= 107.1 (multiplier = 12.6)
IL Code Size,Instr   32,  13, Basic Block count   1, Local Variable Num,Ref count   3,  8 for method System.ValueType:GetHashCode():int:this
OPTIONS: opts.MinOpts() == false
Basic block list for 'System.ValueType:GetHashCode():int:this'

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..020)        (return)
-----------------------------------------------------------------------------------------------------------------------------------------

*************** Starting PHASE Pre-import

*************** Finishing PHASE Pre-import

*************** Starting PHASE Profile incorporation
BBOPT not set

*************** Finishing PHASE Profile incorporation [no changes]

*************** Starting PHASE Importation
*************** In impImport() for System.ValueType:GetHashCode():int:this

impImportBlockPending for BB01

Importing BB01 (PC=000) of 'System.ValueType:GetHashCode():int:this'
    [ 0]   0 (0x000) ldarg.0
    [ 1]   1 (0x001) call 06000262
In Compiler::impImportCall: opcode is call, kind=0, callRetType is struct, structSize is 4

               [000001] I-C-G-------              *  CALL      struct System.Object.get_EETypePtr (exactContextHnd=0x4000000000420041)
               [000000] ------------ this in rcx  \--*  LCL_VAR   ref    V00 this

    [ 1]   6 (0x006) stloc.1

               [000005] -AC---------              *  ASG       struct (copy)
               [000003] D------N----              +--*  LCL_VAR   struct<System.EETypePtr, 4> V02 loc1
               [000002] --C---------              \--*  RET_EXPR  struct(inl return expr [000001])

    [ 0]   7 (0x007) ldloca.s 1
    [ 1]   9 (0x009) constrained. (02000036) callvirt 0600026C
In Compiler::impImportCall: opcode is callvirt, kind=0, callRetType is int, structSize is 0

               [000008] I-C-G-------              *  CALL      int    System.EETypePtr.GetHashCode (exactContextHnd=0x4000000000420031)
               [000007] ------------ this in rcx  \--*  ADDR      byref
               [000006] -------N----                 \--*  LCL_VAR   struct<System.EETypePtr, 4> V02 loc1

    [ 1]  20 (0x014) stloc.0

               [000011] -AC---------              *  ASG       int
               [000010] D------N----              +--*  LCL_VAR   int    V01 loc0
               [000009] --C---------              \--*  RET_EXPR  int   (inl return expr [000008])

    [ 0]  21 (0x015) ldloc.0
    [ 1]  22 (0x016) ldarg.0
    [ 2]  23 (0x017) call 060005BD
In Compiler::impImportCall: opcode is call, kind=0, callRetType is int, structSize is 0

               [000014] I-C-G-------              *  CALL      int    System.ValueType.GetHashCodeImpl (exactContextHnd=0x4000000000420029)
               [000013] ------------ this in rcx  \--*  LCL_VAR   ref    V00 this

    [ 2]  28 (0x01c) xor
    [ 1]  29 (0x01d) stloc.0

               [000018] -AC---------              *  ASG       int
               [000017] D------N----              +--*  LCL_VAR   int    V01 loc0
               [000016] --C---------              \--*  XOR       int
               [000012] ------------                 +--*  LCL_VAR   int    V01 loc0
               [000015] --C---------                 \--*  RET_EXPR  int   (inl return expr [000014])

    [ 0]  30 (0x01e) ldloc.0
    [ 1]  31 (0x01f) ret

               [000020] ------------              *  RETURN    int
               [000019] ------------              \--*  LCL_VAR   int    V01 loc0

*************** Finishing PHASE Importation
Trees after Importation

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..020)        (return)                     i
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..020) (return), preds={} succs={}

***** BB01
               [000001] I-C-G-------              *  CALL      struct System.Object.get_EETypePtr (exactContextHnd=0x4000000000420041)
               [000000] ------------ this in rcx  \--*  LCL_VAR   ref    V00 this

***** BB01
               [000005] -AC---------              *  ASG       struct (copy)
               [000003] D------N----              +--*  LCL_VAR   struct<System.EETypePtr, 4> V02 loc1
               [000002] --C---------              \--*  RET_EXPR  struct(inl return expr [000001])

***** BB01
               [000008] I-C-G-------              *  CALL      int    System.EETypePtr.GetHashCode (exactContextHnd=0x4000000000420031)
               [000007] ------------ this in rcx  \--*  ADDR      byref
               [000006] -------N----                 \--*  LCL_VAR   struct<System.EETypePtr, 4> V02 loc1

***** BB01
               [000011] -AC---------              *  ASG       int
               [000010] D------N----              +--*  LCL_VAR   int    V01 loc0
               [000009] --C---------              \--*  RET_EXPR  int   (inl return expr [000008])

***** BB01
               [000014] I-C-G-------              *  CALL      int    System.ValueType.GetHashCodeImpl (exactContextHnd=0x4000000000420029)
               [000013] ------------ this in rcx  \--*  LCL_VAR   ref    V00 this

***** BB01
               [000018] -AC---------              *  ASG       int
               [000017] D------N----              +--*  LCL_VAR   int    V01 loc0
               [000016] --C---------              \--*  XOR       int
               [000012] ------------                 +--*  LCL_VAR   int    V01 loc0
               [000015] --C---------                 \--*  RET_EXPR  int   (inl return expr [000014])

***** BB01
               [000020] ------------              *  RETURN    int
               [000019] ------------              \--*  LCL_VAR   int    V01 loc0

-------------------------------------------------------------------------------------------------------------------

*************** Starting PHASE Indirect call transform

 -- no candidates to transform

*************** Finishing PHASE Indirect call transform [no changes]

*************** Starting PHASE Expand patchpoints

 -- no patchpoints to transform

*************** Finishing PHASE Expand patchpoints [no changes]

*************** Starting PHASE Post-import

*************** Finishing PHASE Post-import

*************** Starting PHASE Morph - Init

New BlockSet epoch 1, # of blocks (including unused BB00): 2, bitset array size: 1 (short)

*************** In fgRemoveEmptyBlocks

*************** Finishing PHASE Morph - Init
*************** In fgDebugCheckBBlist

*************** Starting PHASE Morph - Inlining
Expanding INLINE_CANDIDATE in statement STMT00000 in BB01:
               [000001] I-C-G-------              *  CALL      struct System.Object.get_EETypePtr (exactContextHnd=0x4000000000420041)
               [000000] ------------ this in rcx  \--*  LCL_VAR   ref    V00 this
thisArg: is a local var
               [000000] ------------              *  LCL_VAR   ref    V00 this

INLINER: inlineInfo.tokenLookupContextHandle for System.Object:get_EETypePtr():System.EETypePtr:this set to 0x4000000000420041:

Invoking compiler for the inlinee method System.Object:get_EETypePtr():System.EETypePtr:this :
IL to import:
IL_0000  02                ldarg.0
IL_0001  7b ac 00 00 04    ldfld        0x40000AC
IL_0006  73 8c 04 00 06    newobj       0x600048C
IL_000b  2a                ret

INLINER impTokenLookupContextHandle for System.Object:get_EETypePtr():System.EETypePtr:this is 0x4000000000420041.
*************** In fgFindBasicBlocks() for System.Object:get_EETypePtr():System.EETypePtr:this
Jump targets:
  none
New Basic Block BB02 [0001] created.
BB02 [000..00C)
Basic block list for 'System.Object:get_EETypePtr():System.EETypePtr:this'

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB02 [0001]  1                             1       [000..00C)        (return)
-----------------------------------------------------------------------------------------------------------------------------------------

*************** Inline @[000001] Starting PHASE Pre-import

*************** Inline @[000001] Finishing PHASE Pre-import

*************** Inline @[000001] Starting PHASE Profile incorporation
BBOPT not set
Computing inlinee profile scale:
   ... no callee profile data, will use non-pgo weight to scale
   ... call site not profiled, will use non-pgo weight to scale
   call site count 100 callee entry count 100 scale 1
Scaling inlinee blocks

*************** Inline @[000001] Finishing PHASE Profile incorporation
Trees after Profile incorporation

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB02 [0001]  1                             1       [000..00C)        (return)
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB02 [000..00C) (return), preds={} succs={}

-------------------------------------------------------------------------------------------------------------------

*************** Inline @[000001] Starting PHASE Importation
*************** In impImport() for System.Object:get_EETypePtr():System.EETypePtr:this

impImportBlockPending for BB02

Importing BB02 (PC=000) of 'System.Object:get_EETypePtr():System.EETypePtr:this'
    [ 0]   0 (0x000) ldarg.0
    [ 1]   1 (0x001) ldfld 040000AC
    [ 1]   6 (0x006) newobj
lvaGrabTemp returning 3 (V03 tmp0) called for NewObj constructor temp.

               [000024] IA----------              *  ASG       struct (init)
               [000022] D------N----              +--*  LCL_VAR   struct<System.EETypePtr, 4> V03 tmp0
               [000023] ------------              \--*  CNS_INT   int    0
 0600048C
In Compiler::impImportCall: opcode is newobj, kind=0, callRetType is void, structSize is 0

               [000027] I-CXG-------              *  CALL      void   System.EETypePtr..ctor (exactContextHnd=0x4000000000420031)
               [000026] ------------ this in rcx  +--*  ADDR      byref
               [000025] -------N----              |  \--*  LCL_VAR   struct<System.EETypePtr, 4> V03 tmp0
               [000021] ---XG------- arg1         \--*  FIELD     int    m_pEEType
               [000000] ------------                 \--*  LCL_VAR   ref    V00 this

    [ 1]  11 (0x00b) ret

    Inlinee Return expression (before normalization)  =>
               [000028] ------------              *  LCL_VAR   struct<System.EETypePtr, 4> V03 tmp0
E:\GitHub\runtimelab\src\coreclr\jit\importer.cpp:17464
Assertion failed 'iciCall->HasRetBufArg()' in 'System.ValueType:GetHashCode():int:this' during 'Morph - Inlining' (IL size 32)
yowl commented 2 years ago

I tried enabling the if above that is for X86

https://github.com/yowl/runtimelab/blob/a23147bac6a7ee7512dd2545be6b010067cc18ca/src/coreclr/jit/importer.cpp#L17438

but that doesn't make much difference as retTypeDesc.GetReturnRegCount() returns 0

I'm not really sure what our desired path is here. In Debug(although that can't matter I suppose), this method ends up as

define i32 @S_P_CoreLib_System_ValueType__GetHashCode(i8* %0)
SingleAccretion commented 2 years ago

but that doesn't make much difference as retTypeDesc.GetReturnRegCount() returns 0

Yeah... It's the unfortunate problem that the LLVM struct returns cannot be modeled by the compiler.

So it'd probably come down to adding another #ifdef TARGET_WASM there with the duplicated code:

assert(!iciCall->HasRetBufArg());

if (fgNeedReturnSpillTemp())
{
    if (!impInlineInfo->retExpr)
    {
        // The inlinee compiler has figured out the type of the temp already. Use it here.
        impInlineInfo->retExpr =
            gtNewLclvNode(lvaInlineeReturnSpillTemp, lvaTable[lvaInlineeReturnSpillTemp].lvType);
    }
}
else
{
    impInlineInfo->retExpr = op2;
}

(Or #ifdef-ing away the return reg count check for WASM in one of existing clauses)

yowl commented 2 years ago

Thanks, that looks pretty tidy now:

define i32 @S_P_CoreLib_System_ValueType__GetHashCode(i8* %0) {
Prolog:
  br label %1

1:                                                ; preds = %Prolog
  %2 = getelementptr i8, i8* %0, i32 0
  %3 = bitcast i8* %2 to i8**
  %4 = load i8*, i8** %3, align 4
  %5 = bitcast i8* %4 to i32*
  %6 = load i32, i32* %5, align 4
  %7 = getelementptr i8, i8* %0, i32 4
  %8 = inttoptr i32 %6 to i8*
  %9 = bitcast i8* %7 to i8**
  store i8* %8, i8** %9, align 4
  %10 = getelementptr i8, i8* %0, i32 4
  %11 = bitcast i8* %10 to i8**
  %12 = load i8*, i8** %11, align 4
  %13 = getelementptr i8, i8* %12, i32 16
  %14 = bitcast i8* %13 to i32*
  %15 = load i32, i32* %14, align 4
  %16 = getelementptr i8, i8* %0, i32 0
  %17 = bitcast i8* %16 to i8**
  %18 = load i8*, i8** %17, align 4
  %19 = getelementptr i8, i8* %0, i32 8
  %20 = getelementptr i8, i8* %0, i32 8
  %21 = bitcast i8* %20 to i8**
  store i8* %18, i8** %21, align 4
  %22 = call i32 @S_P_CoreLib_System_ValueType__GetHashCodeImpl(i8* %19)
  %xor = xor i32 %22, %15
  ret i32 %xor
}
yowl commented 2 years ago

I want to make a note of some functions that might be worth a sanity check once the compilation is working to I'll do that here and fill out the LLVM when its done.

S_P_CoreLib_System_Collections_Generic_GenericEqualityComparer_1<S_P_CoreLib_System_DateTime>__Equals

define i8 @"S_P_CoreLib_System_Collections_Generic_GenericEqualityComparer_1<S_P_CoreLib_System_DateTime>__Equals"(i8* %0, %"[S.P.CoreLib]System.DateTime" %1, %"[S.P.CoreLib]System.DateTime" %2) {
Prolog:
  %3 = alloca %"[S.P.CoreLib]System.DateTime", align 8
  store %"[S.P.CoreLib]System.DateTime" %1, %"[S.P.CoreLib]System.DateTime"* %3, align 1
  %4 = alloca %"[S.P.CoreLib]System.DateTime", align 8
  store %"[S.P.CoreLib]System.DateTime" %2, %"[S.P.CoreLib]System.DateTime"* %4, align 1
  br label %5

5:                                                ; preds = %Prolog
  %6 = bitcast %"[S.P.CoreLib]System.DateTime"* %4 to i8*
  %7 = getelementptr i8, i8* %6, i16 0
  %8 = bitcast i8* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %10 = bitcast %"[S.P.CoreLib]System.DateTime"* %3 to i8*
  %11 = getelementptr i8, i8* %10, i16 0
  %12 = bitcast i8* %11 to i64*
  %13 = load i64, i64* %12, align 8
  %and = and i64 %13, 4611686018427387903
  %and1 = and i64 %9, 4611686018427387903
  %14 = icmp eq i64 %and, %and1
  %15 = zext i1 %14 to i8
  ret i8 %15

Looks ok visually at least.