dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
MIT License

JIT: devirtualization next steps #7541

Open AndyAyersMS opened 7 years ago

AndyAyersMS commented 7 years ago

dotnet/coreclr#9230 introduced some simple devirtualization. Some areas for future enhancements:

category:cq theme:devirtualization skill-level:expert cost:extra-large impact:large

AndyAyersMS commented 7 years ago

Some data from crossgen of System.Private.CoreLib, after dotnet/coreclr#10192. This shows the relative likelihood of devirtualization success or failure based on the operation feeding the call. The accounting is a bit tricky because this data includes callsites in inline candidates -- so from an IL standpoint some sites may be counted multiple times.

Overall around 8% of call sites are getting devirtualized. Virtual calls fare a bit better than interface calls. A number of operators never supply types.

It's not easy to know what a realistic upper bound is on how many sites can be devirtualized, but my feeling is that there is still quite a bit of headroom left and a number like 12% might not be out of the question.

              Fail  Success  Total
VIRTUAL       7229      748   7977
  box           45              45
  call         907        9    916
  cns_str                 2      2
  field        716      111    827
  ind         1213            1213
  index        124             124
  intrinsic    225             225
  lcl_var     3383      568   3951
  ret_expr     616       58    674
INTERFACE     4086      220   4306
  box           34              34
  call         105             105
  field        935       76   1011
  ind           32              32
  lcl_var     2955      140   3095
  ret_expr      25        4     29
Total        11315      968  12283

Devirtualization can fail for several reasons; here's a crude accounting:

INTERFACE
  no-derived-method       12
  no-type                392
  obj-interface-type    3682
VIRTUAL
  no-derived-method       29
  obj-interface-type     120
  no-type               2177
  not-final-or-exact    4903
Total                  11315

We should be able to reduce all of these numbers with improved tracking of types in the jit.

AndyAyersMS commented 7 years ago

Update after dotnet/coreclr#10371. There is a marked decrease in interface success because we cannot safely devirtualize interface calls just with a final (sealed) method -- we need a final or exact class. Explicit interface implementations in C# mark the corresponding methods as final so this was incorrectly firing a lot before the fix.

           Success   Fail  Total   Succ %
Virtual        749   7206   7955   9.42 %
Interface       96   4213   4309   2.23 %
Total          845  11419  12264   6.89 %

(edit: had virtual/interface labels swapped; now fixed)
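
To make the point about final methods concrete, here is a minimal sketch of why a final method alone is not enough to devirtualize an interface call. The types `IShape`, `Base`, and `Derived` are made up for illustration and do not come from the runtime. A derived class can re-implement the interface, so knowing that the object is a `Base` and that `Base`'s implementation is final does not pin down the target.

```csharp
using System;

public interface IShape
{
    string Name();
}

public class Base : IShape
{
    // Explicit interface implementation: C# emits this as a final (sealed) virtual method.
    string IShape.Name() => "Base";
}

public class Derived : Base, IShape
{
    // Re-implements IShape, so interface dispatch on a Derived instance lands here
    // even though Base's implementation is final.
    string IShape.Name() => "Derived";
}

public static class Program
{
    // The static type of 'b' is Base, but the interface call target depends on the runtime type.
    public static string CallThroughInterface(Base b) => ((IShape)b).Name();

    public static void Main()
    {
        Console.WriteLine(CallThroughInterface(new Base()));    // prints "Base"
        Console.WriteLine(CallThroughInterface(new Derived())); // prints "Derived"
    }
}
```

If `Base` were sealed, or if the jit knew the object was exactly a `Base`, the call target would be fixed, which matches the "final or exact class" requirement described above.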

AndyAyersMS commented 7 years ago

More detailed breakdown, showing reasons and operations (reasons lower case, operators upper case; each partitions the total above it).

CallKind   OPCODE / reason                       Success   Fail  Total     Succ %
Virtual                                              749   7206   7955     9.42 %
           RET_EXPR                                   58    619    677     8.57 %
           CALL                                        9    906    915     0.98 %
           IND                                         0   1216   1216     0.00 %
           LCL_VAR                                   569   3356   3925    14.50 %
           INTRINSIC                                   0    225    225     0.00 %
           FIELD                                     111    715    826    13.44 %
           BOX                                         0     45     45     0.00 %
           CNS_STR                                     2      0      2   100.00 %
           INDEX                                       0    124    124     0.00 %
           not-final-class-or-method-or-exact          0   4878   4878     0.00 %
           no-type                                     0   2179   2179     0.00 %
           final-class                               705      0    705   100.00 %
           obj-interface-type                          0    120    120     0.00 %
           no-derived-method                           0     29     29     0.00 %
           exact                                      22      0     22   100.00 %
           final-method                               22      0     22   100.00 %
Interface                                             96   4213   4309     2.23 %
           RET_EXPR                                    0     29     29     0.00 %
           CALL                                        0    109    109     0.00 %
           IND                                         0     32     32     0.00 %
           LCL_VAR                                    81   3012   3093     2.62 %
           FIELD                                      15    997   1012     1.48 %
           BOX                                         0     34     34     0.00 %
           no-type                                     0    391    391     0.00 %
           final-class                                93      0     93   100.00 %
           obj-interface-type                          0   3682   3682     0.00 %
           no-derived-method                           0     12     12     0.00 %
           not-final-class-or-exact                    0    128    128     0.00 %
           exact                                       3      0      3   100.00 %
Total                                                845  11419  12264     6.89 %

benaadams commented 7 years ago

A blend of Devirtualize interface calls/Devirtualize calls on struct types.

Currently for a generic you can do:

T value
((int)(object)value) == ((int)(object)value)

And it will undo the object box for structs; however, that requires a specific concrete type.

Would an aim be to also work on a generic interface/constraint, so if T was an int this could also work and undo the interface box?

T value

AndyAyersMS commented 7 years ago

Yes. We have some of the parts needed for this, but not all. Today the jit produces

lvaGrabTemp returning 2 (V02 tmp0) called for Single-def Box Helper.
lvaSetClass: setting class for V02 to System.Int32  [exact]
impDevirtualizeCall: no type available (op=CALL)

    --*  CALLV stub int    System.IEquatable`1[Int32][System.Int32].Equals
      |  +--*  CNS_INT(h) long   0x7fff9d4433a0 class
      |  \--*  BOX       ref   
      |     \--*  LCL_VAR   ref    V02 tmp0         
      \--*  LCL_VAR   int    V01 arg1         

and devirt is blocked because the jit does not yet understand what type the interface cast returns.

So the first bit is to teach the jit what this helper does -- I've been holding off because generally figuring out when a cast will always succeed or will always fail is not that easy and there are lots of these cast and isinst helpers. But maybe I should push ahead and get the easier cases out of the way. This will require new jit -> vm queries for casting which will return yes/no/maybe answers; initially there will be a lot of maybes but we can improve over time.

If we do that and get back a yes then the cast is a no-op and we know the type going into the call. So we can devirtualize the call and remove the cast. However, we will still be boxing and calling the boxed entry point for the devirtualized Equals.

We should be able to inline, and one might hope that at this point if we do inline, we could propagate the value being boxed through the box to the ultimate use, then realize the box is dead, and remove it. Some of this might actually happen, at least for primitive types in simple cases. I'd have to try and see.

We should also be able to see that the boxing done here is "harmless" because Equals won't modify its this argument (we know this is true for inherited value class methods), and ask the VM to give us the unboxed entry point. Then we can call this unboxed entry point directly, passing the address of the original value and undo the boxing. This would get rid of the box more directly, even if we didn't or couldn't inline.

I have work in progress branches for some of the missing parts, so am slowly chipping away at this.
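
As a rough illustration of the yes/no/maybe cast query mentioned above, here is a small sketch written against reflection rather than the real jit -> vm interface. The names `CastAnswer` and `CastQuery.WillCastSucceed` are hypothetical, and the rules are deliberately conservative: arrays, COM objects, and other special cases simply fall into "maybe" or are ignored.

```csharp
using System;

// Hypothetical three-valued answer; not the jit's actual type.
public enum CastAnswer { Yes, No, Maybe }

public static class CastQuery
{
    // fromType: the best type known for a non-null object.
    // exact:    true if the runtime type is known to be exactly fromType.
    // toType:   the target type of the cast.
    public static CastAnswer WillCastSucceed(Type fromType, bool exact, Type toType)
    {
        // Every possible runtime type is fromType or derives from it, so the cast always passes.
        if (toType.IsAssignableFrom(fromType))
            return CastAnswer.Yes;

        // The runtime type is fromType itself (known exact, or sealed and not an array),
        // and fromType does not satisfy the cast, so it always fails.
        if (exact || (fromType.IsSealed && !fromType.IsArray))
            return CastAnswer.No;

        // Some derived type might implement or extend toType; we cannot tell statically.
        return CastAnswer.Maybe;
    }
}

public static class Program
{
    public static void Main()
    {
        // The boxed int case from the dump above: the cast is a no-op.
        Console.WriteLine(CastQuery.WillCastSucceed(typeof(int), true, typeof(IEquatable<int>)));   // Yes
        Console.WriteLine(CastQuery.WillCastSucceed(typeof(string), false, typeof(IDisposable)));  // No
        Console.WriteLine(CastQuery.WillCastSucceed(typeof(object), false, typeof(IDisposable)));  // Maybe
    }
}
```

A "Yes" answer is the case where the cast can be removed and the type going into the call is known; "Maybe" leaves the helper call in place.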

AndyAyersMS commented 7 years ago

Hmm, we don't undo the box here after inlining, but we're not that far off from being able to:

using System;

class D
{
    static bool TestMe(object o)
    {
        return ((int) o) == 33;
    }

    public static int Main()
    {
        return TestMe(33) ? 100 : 0;
    }
}

Need importation of Unbox.Any to try and optimize away its type test.

pentp commented 7 years ago

Even if the cast will "maybe" succeed, the return type is still known if the cast throws (e.g., CORINFO_HELP_CHKCASTINTERFACE). This might not help with boxing/unboxing, but could help devirtualize if the resulting type is set as the intersection of the cast type and the type going into the cast.

In cases where it's unfeasible to prove that the unboxed value isn't modified, a value copy would work and still avoid boxing. If the cast result is always used as an unboxed value type, then even the helper call could work on a fake box (on stack).

AndyAyersMS commented 7 years ago

We usually don't learn much about an object's type from a successful interface cast, other than the fact that the type indeed implements the interface.

And yeah, making a "fake" box would be a reasonable way to handle possible side effecting calls.

benaadams commented 7 years ago

We usually don't learn much about an object's type from a successful interface cast, other than the fact that the type indeed implements the interface.

In a generic though, you also know what the actual type is? (if struct)

AndyAyersMS commented 7 years ago

Yes, but (if I'm following you correctly) we knew that before the cast -- the question is whether the cast tells us anything new if it succeeds.

A successful "maybe" cast to a non-interface type tells us something new, and if we're lucky, might allow us to devirtualize interface calls on code guarded by the cast.

A successful "maybe" cast to an interface type doesn't.

benaadams commented 7 years ago

Yes, but (if I'm following you correctly) we knew that before the cast

Expressing it in il, I'm suggesting that

bool DoThing2<T>(T val0, T val1)
{
    return ((IEquatable<T>)val0).Equals(val1);
}

  IL_0000:  ldarg.1
  IL_0001:  box        !!T
  IL_0006:  castclass  class [netstandard]System.IEquatable`1<!!T>
  IL_000b:  ldarg.2
  IL_000c:  callvirt   instance bool class [netstandard]System.IEquatable`1<!!T>::Equals(!0)
  IL_0011:  ret

When T is known at runtime for a non-shared generic (struct); and that it implements the interface, it could be turned into the same output as

bool DoThing1<T>(T val0, T val1) where T : IEquatable<T>
{
    return val0.Equals(val1);
}

  IL_0000:  ldarga.s   val0
  IL_0002:  ldarg.2
  IL_0003:  constrained. !!T
  IL_0009:  callvirt   instance bool class [netstandard]System.IEquatable`1<!!T>::Equals(!0)
  IL_000e:  ret

by dropping

  IL_0001:  box        !!T
  IL_0006:  castclass  class [netstandard]System.IEquatable`1<!!T>

and inserting

  IL_0003:  constrained. !!T

before the call

I'm not suggesting it's that straightforward 😄 It's more to capture what I mean.

benaadams commented 7 years ago

Though in actual use it would have to be guarded

bool DoThing2<T>(T val0, T val1)
{
    if (typeof(IEquatable<T>).IsAssignableFrom(typeof(T)))
    {
        return ((IEquatable<T>)val0).Equals(val1);
    }
    // alternate path when not type
}

  IL_0000:  ldtoken    class [netstandard]System.IEquatable`1<!!T>
  IL_0005:  call       class [netstandard]System.Type [netstandard]System.Type::GetTypeFromHandle(valuetype [netstandard]System.RuntimeTypeHandle)
  IL_000a:  ldtoken    !!T
  IL_000f:  call       class [netstandard]System.Type [netstandard]System.Type::GetTypeFromHandle(valuetype [netstandard]System.RuntimeTypeHandle)
  IL_0014:  callvirt   instance bool [netstandard]System.Type::IsAssignableFrom(class [netstandard]System.Type)
  IL_0019:  brfalse.s  IL_002d
  IL_001b:  ldarg.1
  IL_001c:  box        !!T
  IL_0021:  castclass  class [netstandard]System.IEquatable`1<!!T>
  IL_0026:  ldarg.2
  IL_0027:  callvirt   instance bool class [netstandard]System.IEquatable`1<!!T>::Equals(!0)
  IL_002c:  ret
  IL_002d: ; alternate path when not type

And changing all that is probably a much bigger thing, so it might not be practically worth it :-/

AndyAyersMS commented 7 years ago

The first part is more or less where things are headed. If we know for sure T implements the interface and T is a value class then we might be able to determine which method will be called. So at least in some cases we should be able to devirtualize and unbox and inline and all that.

benaadams commented 6 years ago

Is it correct that generics don't currently devirtualize?

I can't seem to get ArrayPool.Shared to devirtualize; even if I pull the Byte and Char pool out to a non-generic class, and reference them directly

internal sealed class ArrayPools
{
    internal static TlsOverPerCoreLockedStacksArrayPool<byte> BytePool { get; } = new TlsOverPerCoreLockedStacksArrayPool<byte>();
    internal static TlsOverPerCoreLockedStacksArrayPool<char> CharPool { get; } = new TlsOverPerCoreLockedStacksArrayPool<char>();
}

I don't see

Devirtualized virtual call to TlsOverPerCoreLockedStacksArrayPool`1:Rent;
 now direct call to TlsOverPerCoreLockedStacksArrayPool`1:Rent [exact]

Which I was expecting (and I definitely can't get it to work through the ArrayPool<T>.Shared property when it inlines, even if I rearrange the code to make sure the type is maintained and exposed).

Not a covered scenario yet? Or are there further (or less) changes I could make to the source to trigger it?

benaadams commented 6 years ago

I'm looking at the crossgened (rather than jitted) output, so that might not be helping.

AndyAyersMS commented 6 years ago

The explanation will be there in the jit dump. I recall seeing some array pool methods devirtualizing in the past.

I should probably surface the reasons why methods don't devirtualize... most don't so the level of chatter will go up quite a bit.

AndyAyersMS commented 6 years ago

This definition:

        public static ArrayPool<T> Shared { get; } =
            typeof(T) == typeof(byte) || typeof(T) == typeof(char) ? new TlsOverPerCoreLockedStacksArrayPool<T>() :

creates a hidden backing static field initialized by the class constructor. So when Shared gets inlined the jit only knows that the value has type ArrayPool<T> and not a more specific type. So no devirtualization happens:

impDevirtualizeCall: Trying to devirtualize virtual call:
    class for 'this' is System.Buffers.ArrayPool`1[Byte] (attrib 20000400)
    base method is System.Buffers.ArrayPool`1[Byte]::Rent
    devirt to System.Buffers.ArrayPool`1[Byte][System.Byte]::Rent -- speculative
               [000005] --C-G-------              *  CALLV ind ref    System.Buffers.ArrayPool`1[Byte][System.Byte].Rent
               [000022] ----G-------              |  /--*  FIELD     ref    <Shared>k__BackingField

If you call Create directly then devirtualization can happen for Rent -- e.g.:

        byte[] b = ArrayPool<byte>.Create().Rent(33);

;; Devirtualized virtual call to System.Buffers.ArrayPool`1[Byte]:Rent; now direct call to System.Buffers.ConfigurableArrayPool`1[Byte][System.Byte]:Rent [exact]
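
The effect of the declared type of a static property can be sketched with stand-in types. `Pool`, `FastPool`, and `Pools` below are hypothetical and only mirror the shape of the ArrayPool case; whether a particular jit version devirtualizes either call site is an assumption based on the explanation above, not something this code checks.

```csharp
using System;

public abstract class Pool
{
    public abstract int[] Rent(int size);
}

public sealed class FastPool : Pool
{
    public override int[] Rent(int size) => new int[size];
}

public static class Pools
{
    // Declared as the base type: per the explanation above, a caller that inlines this getter
    // only sees a backing field of type Pool, so Rent stays a virtual call.
    public static Pool Shared { get; } = new FastPool();

    // Declared as the sealed type: the field type itself identifies the target of Rent.
    public static FastPool SharedExact { get; } = new FastPool();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Pools.Shared.Rent(4).Length);      // prints 4
        Console.WriteLine(Pools.SharedExact.Rent(4).Length); // prints 4
    }
}
```

Both calls return the same result; the only difference is how much type information the declaration exposes at the call site.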

benaadams commented 6 years ago

So when Shared gets inlined the jit only knows that the value has type ArrayPool<T> and not a more specific type.

Which is why I tried to push it out to strongly typed fields in (the most extreme variant I was trying)

public static ArrayPool<T> Shared
{
    get
    {
        if (typeof(T) == typeof(byte))
            return (ArrayPool<T>)(object)ArrayPools.BytePool;
        else if (typeof(T) == typeof(char))
            return (ArrayPool<T>)(object)ArrayPools.CharPool;
        return s_generalPool;
    }
}

Also tried two fields, which avoided the cast through (object), but wasn't sure if it was too much genericness causing the issue

public static ArrayPool<T> Shared
{
    get
    {
        if (typeof(T) == typeof(byte) || typeof(T) == typeof(char))
            return s_specificPool;
        return s_generalPool;
    }
}

benaadams commented 6 years ago

Will explore with Jitdump; I think devirtualization will give much more benefit than for example using Lzcnt

AndyAyersMS commented 6 years ago

Looks like (with your dea46f4) we lose track of the type for the return value from get_Shared because we introduce a temp to hold the return value:

Invoking compiler for the inlinee method System.Buffers.ArrayPool`1[Byte][System.Byte]:get_Shared():ref :
lvaGrabTemp returning 1 (V01 tmp1) (a long lifetime temp) called for Inline return value spill temp.

Later when we've inlined get_Shared we try to devirtualize the call that it feeds but don't see any type information and bail:

**** Late devirt opportunity
               [000005] --C-G-------              *  CALLV ind ref    System.Buffers.ArrayPool`1[Byte][System.Byte].Rent
               [000040] ------------ this in rcx  +--*  LCL_VAR   ref    V01 tmp1         
               [000004] ------------ arg1         \--*  CNS_INT   int    33

impDevirtualizeCall: no type available (op=LCL_VAR)

Likely we need to update the code in fgFindBasicBlocks to provide appropriate hinting about the return value. Even that may not be sufficient as we may also need to update the type post-inline since it gets "sharpened" by inlining revealing the actual field access.

benaadams commented 6 years ago

Made PR to reduce my noise in this issue

AndyAyersMS commented 6 years ago

Here is some up-to-date status on devirtualization failures and successes, via PMI over the framework assembly set: [image]

The colored rows show the summaries for interface calls and virtual calls. The rows below break down the failures/successes by the opcode of the instruction that provided the object to the call. So, for instance, BOX opcodes fed 51 interface call sites, and 47 of those were devirtualized.

The failure reasons are:

Numbers look a bit better than I remember and are better than the original measurements above -- 18% of virtual sites and 5% of interface sites are devirtualized.

Main cause for interface calls not devirtualizing -- around 86% of the time -- is that the best type the jit has for the object being dispatched is an interface type.

Main cause for virtual calls not devirtualizing -- around 60% of the time -- is that the jit doesn't know the exact class (and/or the class or method is not final). We can't easily tell from this data whether inexactness is a consequence of poor type propagation / refinement in the jit, or if the exact type cannot be statically determined.

The jit has some kind of type information at most virtual sites; only 3% lack types. It's still probably worth shoring up cases where types are missing, even though they're not that prevalent. Suspect the IND failures here are largely class statics that in principle should be easy to track.

AndyAyersMS commented 6 years ago

A couple of other bits of data: devirtualization works much better for call sites in inlinees than in the root method. Not too surprising.

Note devirtualization and inlining feed each other: a devirtualization often leads to an inline where we know the exact type of this or types of arguments, which leads to further devirtualization, which leads to further inlines... For root methods we're never going to be able (absent some magical interprocedural analysis) to know the actual types of parameters.


Also late devirtualization doesn't happen very often, and isn't very successful. Probably worth a deeper look.


Recall late devirtualization is done after an inline, where the jit propagates (better) information about the callee return type into the calling method. The low success rate here could have multiple causes -- for instance the jit doesn't do a thorough job of converting all shared types to unshared types, so if we inline a shared generic that returns T in a context where T is known exactly, we may fail to capitalize on this...
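
The difference between root methods and inlinees can be sketched with a small example. `Shape`, `Square`, `AreaOf`, and `SquareArea` are hypothetical names; the comments describe what the jit could know in each context according to the discussion above, not what any specific jit version is guaranteed to do.

```csharp
using System;

public abstract class Shape
{
    public abstract int Area();
}

public sealed class Square : Shape
{
    private readonly int _side;
    public Square(int side) => _side = side;
    public override int Area() => _side * _side;
}

public static class Program
{
    // Compiled as a root method, the jit only knows 's' is some Shape,
    // so s.Area() has to remain a virtual call.
    public static int AreaOf(Shape s) => s.Area();

    // If AreaOf is inlined here, the argument is known to be exactly Square,
    // so the call to Area can be devirtualized and then possibly inlined too.
    public static int SquareArea() => AreaOf(new Square(3));

    public static void Main()
    {
        Console.WriteLine(SquareArea()); // prints 9
    }
}
```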

erozenfeld commented 6 years ago


adamsitnik commented 6 years ago

@AndyAyersMS are there any ETW events for devirtualization similar to the ones we have for inlining?

I am asking because I recently wrote an ETWProfiler for BenchmarkDotNet which allows profiling the benchmarks and calculating some metrics based on the captured ETW events. If there is any interesting info in the trace file related to devirtualization I could calculate new metrics and include them in BenchView for our benchmarks (/cc @jorive)

AndyAyersMS commented 6 years ago

No, there aren't any ETW events.

I probably should add something to the inline tree but haven't done that yet either.

AndyAyersMS commented 6 years ago

PR to add some info to the inline tree: dotnet/coreclr#20395

AndyAyersMS commented 6 years ago

Now merged, so the jit is tracking this information internally at least:

Inlines into 06000004 Y:Main():int
  [1 IL=0005 TR=000008 06000002] [below ALWAYS_INLINE size] X`1[__Canon][System.__Canon]:.ctor(ref):this
  [2 IL=0015 TR=000027 06000003] [profitable inline devirt unboxed] X`1[__Canon][System.__Canon]:Print():this
    [0 IL=0011 TR=000049 06000080] [FAILED: noinline per IL/cached result] System.Console:WriteLine(ref)

AndyAyersMS commented 6 years ago

Some further improvements over in dotnet/coreclr#20447.

AndyAyersMS commented 6 years ago

First steps towards guarded devirtualization: dotnet/coreclr#21270

rikimaru0345 commented 5 years ago

@AndyAyersMS Hey I just wanted to stop by to say thank you! I'm writing a library for .net standard that really profits from all those optimizations.

I really appreciate the hard work you're doing man! (you and everyone here actually) Thanks! :smile: :heart:

AndyAyersMS commented 5 years ago

@rikimaru0345 thanks for the kind words -- good to hear things are working well for you.

RagnaStrife commented 5 years ago

How does the CLR treat Default Interface Members when you use a reference to such interface AND the interface was implemented through a ValueType? Will it be devirtualized too?

interface I
{
    public int X() { return 0; }
}

struct V : I { }

I iface = new V(); // Note: it is a **ValueType**
return iface.X(); // Devirtualization here? If so, how?

Would that run the same (or nearly) as a struct-defined method? I'm just curious about these ValueType<->DefaultInterfaceMembers interactions here. Please, get technical.

Anyway, thanks for your hard work on this! I'm eager to see what .NET 5 will bring next year. 😄

am11 commented 3 years ago

  • [ ] Obtain more accurate return types from calls to shared generic methods whose return types are generic. This is typical of many collection types, eg List<T>.get_Item returns T. In many cases the caller will know the type of T exactly, but currently the jit ends up with the shared generic ref type placeholder type __Canon.

(assuming Lazy is not already intrinsified) Would this cover cases like System.Lazy<T>.Value, which incurs a null check on another private field _state: once the value is successfully initialized (_state == null), would JIT be able to swap it with devirtualized value for subsequent accesses?