Open yyjdelete opened 6 years ago
If you have some specific end goal in mind -- say you are looking for the most performant way to structure exception checking in your code -- it might be better to ask that instead.
But I'm also happy to answer your more detailed questions. It may take a while to get to them all.
As far as point 1 goes: there is a key difference in inlining. This can be seen from the inline tree which shows the inlines in the order that they actually happened in the code (that is, the inliner works in "lexical" order and somewhat "depth first"). This tree is available when running with checked jits via COMPlus_JitPrintInlinedMethods=1. You may also be able to reconstruct it from the jit inline ETL event stream.
In the ThrowWithCall
case, the jit does not see that the string formatting is done in the same block as a method that must throw until after it has already inlined part of the string formatting. And that inline has created a GC struct local that requires zeroing in the prolog. Hence the extra REP STOS. There are inline heuristics that make the jit reluctant to inline a method with a GC struct into a known cold path, but here the jit doesn't realize the path is cold until it has cracked open the ThrowIndexOutOfRangeException
method when trying to inline it.
Inlines into 06000004 Dll.TestJIT:RunThrowWithCall(int,int):this
[1 IL=0003 TR=000004 0600000C] [aggressive inline attribute] Dll.TestJIT:CheckIndex0ThrowWithCall(int,int):this
[2 IL=0008 TR=000012 0600000E] [aggressive inline attribute] Dll.TestJIT:IsOutOfBounds(int,int,int):bool
[0 IL=0003 TR=000011 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
[0 IL=0033 TR=000054 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
[3 IL=0043 TR=000068 06000311] [below ALWAYS_INLINE size] System.String:Format(ref,ref,ref,ref):ref
[4 IL=0005 TR=000109 06001627] [profitable inline] System.ParamsArray:.ctor(ref,ref,ref):this
[0 IL=0010 TR=000115 06000317] [FAILED: has ldstr VM restriction] System.String:FormatHelper(ref,ref,struct):ref
[0 IL=0048 TR=000075 06000011] [FAILED: does not return] Dll.ThrowHelper:ThrowIndexOutOfRangeException(ref)
In the second case because the throw is visible in ThrowDirect
the jit realizes early that the path is cold and so does not inline the formatting method. And here you can see the heuristic I mentioned above kicking in to block those inlines:
Inlines into 06000005 Dll.TestJIT:RunThrowDirect(int,int):this
[1 IL=0003 TR=000004 0600000D] [aggressive inline attribute] Dll.TestJIT:CheckIndex0ThrowDirect(int,int):this
[2 IL=0008 TR=000012 0600000E] [aggressive inline attribute] Dll.TestJIT:IsOutOfBounds(int,int,int):bool
[0 IL=0003 TR=000011 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
[0 IL=0033 TR=000054 06000001] [FAILED: target not direct] Dll.TestJIT:get_C():int:this
[0 IL=0043 TR=000068 06000311] [FAILED: rarely called, has gc struct] System.String:Format(ref,ref,ref,ref):ref
[3 IL=0048 TR=000075 06000010] [below ALWAYS_INLINE size] Dll.ThrowHelper:CreateIndexOutOfRangeException(ref):ref
[0 IL=0001 TR=000116 06001491] [FAILED: unprofitable inline] System.IndexOutOfRangeException:.ctor(ref):this
[0 IL=0043 TR=000068 06000311] [FAILED: rarely called, has gc struct] System.String:Format(ref,ref,ref,ref):ref
Because String:Format
is not inlined the prolog for RunThrowDirect
can be simpler.
So this is a example where inlining into a cold path can have a serious performance impact on a non-cold path.
One might hope that the jit could somehow look ahead and see the block is a must-throw, but that's not something we're able to do right now. There is a fairly strong dependence in the jit for "in order" inlining.
We may be able to do better here in the future -- there are several technologies in development that should be applicable. The ILLinker may be able to do this sort of in-depth analysis and mark methods so the jit can see early on that a method must throw. Or tiered jitting may be able to incorporate profile feedback and see early that the block with the call is cold.
But for now, if performance is a key consideration, the small details matter.
Thank you very much.
Try to make an Debug build with master
, see the message below with COMPlus_JitDump=TestJIT::CheckIndex0ThrowWithCall
and unable to see format
with COMPlus_JitPrintInlinedMethods=1
. Maybe it's already fixed in master.
INLINER: during 'impMarkInlineCandidate' result 'failed this callee' reason 'noinline per IL/cached result' for 'Dll.TestJIT:CheckIndex0ThrowWithCall(int,int):this' calling 'System.String:Format(ref,ref,ref,ref):ref'
The different between CheckIndex0ThrowWithCall
and RunThrowWithCall
. (This also happen without string.Format
, see X1
and X2
in new test).
For the first one , ThrowIndexOutOfRangeException
is be consider more as an Implicit Tail call
than an throw
(rare), and do no reorder in this case, which lead to an bad branch prediction. @AndyAyersMS It this work as excepted?
CheckIndex0ThrowWithCall
Renumbering the basic blocks for fgComputeReachability pass dotnet/coreclr#1
*************** Before renumbering the basic blocks
--------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight [IL range] [jump] [EH region] [flags]
--------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 [000..00F)-> BB03 ( cond ) i label target gcsafe
BB02 [0001] 1 BB01 1 [00F..035) (return) i jmp gcsafe newobj
BB03 [0002] 1 BB01 1 [035..036) (return) i label target
--------------------------------------------------------------------------------------------------------------------------------------
RunThrowWithCall
*************** Before renumbering the basic blocks
--------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight [IL range] [jump] [EH region] [flags]
--------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 [000..009)-> BB03 ( cond ) i label target gcsafe
BB04 [0003] 1 BB01 1 [000..001) (return) i label target
BB03 [0002] 1 BB01 0 [000..001) (throw ) i rare label target gcsafe newobj
--------------------------------------------------------------------------------------------------------------------------------------
You need to be careful when you look at the behavior of the jit (in particular the optimizations done by the jit) with debug builds. For instance the debug core library will be built with jit optimizations disabled and assertions enabled.
I would recommend using the Checked build for these sorts of investigations as has the same code generation as Release but also has the various internal jit dumps available.
It looks like when we find a "does not return" call in a block we probably should also mark the block as rarely executed and subsequently move it to the end of the method. Let me verify that -- if so we can open an issue for changing this specifically.
Think we already do the right things here, just need to double-check.
Tested with netcore1.1/2.0/2.1(2.1.300-preview2-008347)/net471 x64, seems not happened with x86build.
Test Code https://github.com/yyjdelete/JIT_Test
I'm not sure if it's an issue since I'm not familiar with JIT, but has some questions.
ThrowWithCall
(Error.ThrowExpection(string.format(...));
) hasrep stos
but not forThrowDirect
(throw Error.CreateExpection(string.format(...));
), the latter one is about 50% faster with all cases in the test code.CheckIndex0ThrowWithCall
, the Branch Prediction didn't work well by move throw to the end but it's well for all others(RunThrowWithCall
, which is just a call toCheckIndex0ThrowWithCall
);CheckIndexThrowDirect
andCheckIndexThrowDirect2
, why jmpTest
in the first case and callTest
for the second.category:cq theme:inlining skill-level:expert cost:small