dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.17k stars 4.72k forks source link

RyuJIT_x64 doesn't do as much optimization for an throw method as directly throw does? #9962

Open yyjdelete opened 6 years ago

yyjdelete commented 6 years ago

Tested with netcore1.1/2.0/2.1(2.1.300-preview2-008347)/net471 x64, seems not happened with x86build.

Test Code https://github.com/yyjdelete/JIT_Test

I'm not sure if it's an issue since I'm not familiar with JIT, but has some questions.

  1. Why all ThrowWithCall(Error.ThrowExpection(string.format(...));) has rep stos but not for ThrowDirect(throw Error.CreateExpection(string.format(...));), the latter one is about 50% faster with all cases in the test code.
  2. Why when call directly into CheckIndex0ThrowWithCall, the Branch Prediction didn't work well by move throw to the end but it's well for all others(RunThrowWithCall, which is just a call to CheckIndex0ThrowWithCall);
  3. What's the different between CheckIndexThrowDirect and CheckIndexThrowDirect2, why jmp Test in the first case and call Test for the second.

category:cq theme:inlining skill-level:expert cost:small

AndyAyersMS commented 6 years ago

If you have some specific end goal in mind -- say you are looking for the most performant way to structure exception checking in your code -- it might be better to ask that instead.

But I'm also happy to answer your more detailed questions. It may take a while to get to them all.

As far as point 1 goes: there is a key difference in inlining. This can be seen from the inline tree which shows the inlines in the order that they actually happened in the code (that is, the inliner works in "lexical" order and somewhat "depth first"). This tree is available when running with checked jits via COMPlus_JitPrintInlinedMethods=1. You may also be able to reconstruct it from the jit inline ETL event stream.

In the ThrowWithCall case, the jit does not see that the string formatting is done in the same block as a method that must throw until after it has already inlined part of the string formatting. And that inline has created a GC struct local that requires zeroing in the prolog. Hence the extra REP STOS. There are inline heuristics that make the jit reluctant to inline a method with a GC struct into a known cold path, but here the jit doesn't realize the path is cold until it has cracked open the ThrowIndexOutOfRangeException method when trying to inline it.

Inlines into 06000004 Dll.TestJIT:RunThrowWithCall(int,int):this
  [1 IL=0003 TR=000004 0600000C] [aggressive inline attribute] Dll.TestJIT:CheckIndex0ThrowWithCall(int,int):this
    [2 IL=0008 TR=000012 0600000E] [aggressive inline attribute] Dll.TestJIT:IsOutOfBounds(int,int,int):bool
    [0 IL=0003 TR=000011 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
    [0 IL=0033 TR=000054 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
    [3 IL=0043 TR=000068 06000311] [below ALWAYS_INLINE size] System.String:Format(ref,ref,ref,ref):ref
      [4 IL=0005 TR=000109 06001627] [profitable inline] System.ParamsArray:.ctor(ref,ref,ref):this
      [0 IL=0010 TR=000115 06000317] [FAILED: has ldstr VM restriction] System.String:FormatHelper(ref,ref,struct):ref
    [0 IL=0048 TR=000075 06000011] [FAILED: does not return] Dll.ThrowHelper:ThrowIndexOutOfRangeException(ref)

In the second case because the throw is visible in ThrowDirect the jit realizes early that the path is cold and so does not inline the formatting method. And here you can see the heuristic I mentioned above kicking in to block those inlines:

Inlines into 06000005 Dll.TestJIT:RunThrowDirect(int,int):this
  [1 IL=0003 TR=000004 0600000D] [aggressive inline attribute] Dll.TestJIT:CheckIndex0ThrowDirect(int,int):this
    [2 IL=0008 TR=000012 0600000E] [aggressive inline attribute] Dll.TestJIT:IsOutOfBounds(int,int,int):bool
    [0 IL=0003 TR=000011 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
    [0 IL=0033 TR=000054 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
    [0 IL=0043 TR=000068 06000311] [FAILED: rarely called, has gc struct] System.String:Format(ref,ref,ref,ref):ref
    [3 IL=0048 TR=000075 06000010] [below ALWAYS_INLINE size] Dll.ThrowHelper:CreateIndexOutOfRangeException(ref):ref
      [0 IL=0001 TR=000116 06001491] [FAILED: unprofitable inline] System.IndexOutOfRangeException:.ctor(ref):this
    [0 IL=0043 TR=000068 06000311] [FAILED: rarely called, has gc struct] System.String:Format(ref,ref,ref,ref):ref

Because String:Format is not inlined the prolog for RunThrowDirect can be simpler.

So this is a example where inlining into a cold path can have a serious performance impact on a non-cold path.

One might hope that the jit could somehow look ahead and see the block is a must-throw, but that's not something we're able to do right now. There is a fairly strong dependence in the jit for "in order" inlining.

We may be able to do better here in the future -- there are several technologies in development that should be applicable. The ILLinker may be able to do this sort of in-depth analysis and mark methods so the jit can see early on that a method must throw. Or tiered jitting may be able to incorporate profile feedback and see early that the block with the call is cold.

But for now, if performance is a key consideration, the small details matter.

yyjdelete commented 6 years ago

Thank you very much.

Try to make an Debug build with master, see the message below with COMPlus_JitDump=TestJIT::CheckIndex0ThrowWithCall and unable to see format with COMPlus_JitPrintInlinedMethods=1. Maybe it's already fixed in master.

INLINER: during 'impMarkInlineCandidate' result 'failed this callee' reason 'noinline per IL/cached result' for 'Dll.TestJIT:CheckIndex0ThrowWithCall(int,int):this' calling 'System.String:Format(ref,ref,ref,ref):ref'

The different between CheckIndex0ThrowWithCall and RunThrowWithCall. (This also happen without string.Format, see X1 and X2 in new test). For the first one , ThrowIndexOutOfRangeException is be consider more as an Implicit Tail call than an throw(rare), and do no reorder in this case, which lead to an bad branch prediction. @AndyAyersMS It this work as excepted?

CheckIndex0ThrowWithCall

Renumbering the basic blocks for fgComputeReachability pass dotnet/coreclr#1

*************** Before renumbering the basic blocks

--------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    [IL range]     [jump]      [EH region]         [flags]
--------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1     [000..00F)-> BB03 ( cond )                     i label target gcsafe 
BB02 [0001]  1       BB01                  1     [00F..035)        (return)                     i jmp gcsafe newobj 
BB03 [0002]  1       BB01                  1     [035..036)        (return)                     i label target 
--------------------------------------------------------------------------------------------------------------------------------------

RunThrowWithCall

*************** Before renumbering the basic blocks

--------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    [IL range]     [jump]      [EH region]         [flags]
--------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1     [000..009)-> BB03 ( cond )                     i label target gcsafe 
BB04 [0003]  1       BB01                  1     [000..001)        (return)                     i label target 
BB03 [0002]  1       BB01                  0     [000..001)        (throw )                     i rare label target gcsafe newobj 
--------------------------------------------------------------------------------------------------------------------------------------
AndyAyersMS commented 6 years ago

You need to be careful when you look at the behavior of the jit (in particular the optimizations done by the jit) with debug builds. For instance the debug core library will be built with jit optimizations disabled and assertions enabled.

I would recommend using the Checked build for these sorts of investigations as has the same code generation as Release but also has the various internal jit dumps available.

It looks like when we find a "does not return" call in a block we probably should also mark the block as rarely executed and subsequently move it to the end of the method. Let me verify that -- if so we can open an issue for changing this specifically.

AndyAyersMS commented 4 years ago

Think we already do the right things here, just need to double-check.