dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

JIT throughput: noway_assert #7709

Open BruceForstall opened 7 years ago

BruceForstall commented 7 years ago

The JIT has about 3300 noway_asserts. Unlike simple asserts, these are also executed in non-DEBUG (aka RELEASE) builds, so any that are frequently executed may be costly. Instead of auditing all of them for relevance (i.e., whether they sit in an optimization phase that can be backed out of) or apparent importance, we could (conditionally) change the noway_assert macro to count how often each one is executed, using a hash table from the preprocessor's __FILE__ and __LINE__ to execution count, dumped at the end of compilation. Then we could convert the worst offenders to simple asserts.
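
A minimal sketch of the counting idea, not the change that was actually made to the JIT. Every name here (COUNTED_NOWAY_ASSERT, RecordNowayAssert, NowayAssertCounts, DumpNowayAssertCounts) is hypothetical, and throwing a C++ exception on failure is a stand-in for whatever the real noway_assert does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical key: one entry per noway_assert site (file + line).
struct AssertSite
{
    std::string file;
    int         line;

    bool operator==(const AssertSite& other) const
    {
        return line == other.line && file == other.file;
    }
};

struct AssertSiteHash
{
    std::size_t operator()(const AssertSite& site) const
    {
        return std::hash<std::string>()(site.file) ^ (std::hash<int>()(site.line) << 1);
    }
};

// Execution count plus the stringified condition, for the dump.
struct AssertSiteStats
{
    std::uint64_t count = 0;
    const char*   text  = "";
};

using AssertCountTable = std::unordered_map<AssertSite, AssertSiteStats, AssertSiteHash>;

inline AssertCountTable& NowayAssertCounts()
{
    static AssertCountTable table;
    return table;
}

inline void RecordNowayAssert(const char* file, int line, const char* text)
{
    AssertSiteStats& stats = NowayAssertCounts()[AssertSite{file, line}];
    stats.count++;
    stats.text = text;
}

// Counts every execution, then checks the condition.
// Assumption: failure is modeled as an exception purely for illustration.
#define COUNTED_NOWAY_ASSERT(cond)                                    \
    do                                                                \
    {                                                                 \
        RecordNowayAssert(__FILE__, __LINE__, #cond);                 \
        if (!(cond))                                                  \
        {                                                             \
            throw std::runtime_error("noway_assert failed: " #cond);  \
        }                                                             \
    } while (0)

// Writes "count, file, line, text" rows (unsorted) to the given stream.
inline void DumpNowayAssertCounts(std::FILE* out)
{
    assert(out != nullptr);
    for (const auto& entry : NowayAssertCounts())
    {
        std::fprintf(out, "%llu, %s, %d, %s\n",
                     static_cast<unsigned long long>(entry.second.count),
                     entry.first.file.c_str(), entry.first.line, entry.second.text);
    }
}
```

The dump is unsorted, so ranking the sites by count would be a post-processing step. Building a std::string key on every execution is slow, which is tolerable only because this sketch is meant for a measurement build, not a shipping one.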

category:throughput theme:throughput skill-level:expert cost:medium

BruceForstall commented 7 years ago

I implemented this. I ran "a lot" of the test tree, processed the data, and saw these as the top 75 dynamic occurrences of noway_asserts (some might actually get optimized away in release builds, as there are a few static constant asserts here). The columns are: total count, filename, line number, assertion text.

181470679, e:\gh\coreclr2\src\jit\morph.cpp, 14648, tree
180436108, e:\gh\coreclr2\src\jit\morph.cpp, 14649, tree->gtOper != GT_STMT
83096677, e:\gh\coreclr2\src\jit\assertionprop.cpp, 808, assertIndex <= optAssertionCount
83096677, e:\gh\coreclr2\src\jit\assertionprop.cpp, 807, assertIndex != NO_ASSERTION_INDEX
72024047, e:\gh\coreclr2\src\jit\liveness.cpp, 1632, lclNum < lvaCount
43294958, e:\gh\coreclr2\src\jit\lsra.cpp, 3003, count < MaxInternalRegisters
40851620, e:\gh\coreclr2\src\jit\lclvars.cpp, 2510, varNum < lvaCount
35418942, e:\gh\coreclr2\src\jit\morph.cpp, 10632, tree->OperKind() & GTK_SMPOP
33982229, e:\gh\coreclr2\src\jit\flowgraph.cpp, 5228, opcode < CEE_COUNT
32541990, e:\gh\coreclr2\src\jit\morph.cpp, 11463, tree->gtOper != GT_CALL
32533197, e:\gh\coreclr2\src\jit\morph.cpp, 13347, oper == tree->gtOper
30050847, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16328, !(block->bbFlags & BBF_REMOVED)
27085069, e:\gh\coreclr2\src\jit\morph.cpp, 15542, stmt->gtOper == GT_STMT
23724178, e:\gh\coreclr2\src\jit\flowgraph.cpp, 18490, tree->gtOper == GT_STMT
23362852, e:\gh\coreclr2\src\jit\liveness.cpp, 1642, varIndex < lvaTrackedCount
23102961, e:\gh\coreclr2\src\jit\emit.h, 1604, (UNATIVE_OFFSET)distance == distance
22587177, e:\gh\coreclr2\src\jit\morph.cpp, 8401, tree->OperKind() & GTK_LEAF
21688207, e:\gh\coreclr2\src\jit\morph.cpp, 5984, tree->gtOper == GT_LCL_VAR
19590233, e:\gh\coreclr2\src\jit\lclvars.cpp, 3599, (tree->gtOper == GT_LCL_VAR) || (tree->gtOper == GT_LCL_FLD)
19478128, e:\gh\coreclr2\src\jit\lclvars.cpp, 3680, tiVerificationNeeded || varDsc->lvType == TYP_UNDEF || tree->gtType == TYP_UNKNOWN || allowStructs || genActualType(varDsc->TypeGet()) == genActualType(tree->gtType) || (tree->gtType == TYP_BYREF && varDsc->TypeGet() == TYP_I_IMPL) || (tree->gtType == TYP_I_IMPL && varDsc->TypeGet() == TYP_BYREF) || (tree->gtFlags & GTF_VAR_CAST) || varTypeIsFloating(varDsc->TypeGet()) && varTypeIsFloating(tree->gtType)
18224434, e:\gh\coreclr2\src\jit\morph.cpp, 6019, !(tree->gtFlags & GTF_VAR_DEF) || varAddr
15639847, e:\gh\coreclr2\src\jit\gentree.cpp, 7095, argInfo != nullptr
14993689, e:\gh\coreclr2\src\jit\morph.cpp, 8344, tree->OperKind() & GTK_CONST
13800099, e:\gh\coreclr2\src\jit\flowgraph.cpp, 18555, list.gtNext->gtPrev == &list
13209197, e:\gh\coreclr2\src\jit\morph.cpp, 14731, tree != nullptr
12710725, e:\gh\coreclr2\src\jit\emitxarch.cpp, 546, prefix >= 0x40 && prefix <= 0x4F
11966466, e:\gh\coreclr2\src\jit\morph.cpp, 10733, op1
11256849, e:\gh\coreclr2\src\jit\emitxarch.cpp, 1695, (int)offs < 0
10750793, e:\gh\coreclr2\src\jit\importer.cpp, 563, impTreeLast != nullptr
10503327, e:\gh\coreclr2\src\jit\flowgraph.cpp, 889, block
10444252, e:\gh\coreclr2\src\jit\codegenxarch.cpp, 1516, targetType != TYP_STRUCT
10283544, e:\gh\coreclr2\src\jit\flowgraph.cpp, 9626, block->bbNext == bNext
9808620, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16280, !(block->bbFlags & BBF_TRY_BEG)
9808617, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16279, !block->bbCatchTyp
9807880, e:\gh\coreclr2\src\jit\morph.cpp, 15677, fgPtrArgCntCur == 0
9490871, e:\gh\coreclr2\src\jit\valuenum.cpp, 747, attribs == CEA_None
9297809, e:\gh\coreclr2\src\jit\emitxarch.cpp, 3328, emitVerifyEncodable(ins, size, reg)
9049871, e:\gh\coreclr2\src\jit\liveness.cpp, 1879, VarSetOps::IsSubset(this, keepAliveVars, life)
9026196, e:\gh\coreclr2\src\jit\flowgraph.cpp, 890, blockPred
8640521, e:\gh\coreclr2\src\jit\emitxarch.cpp, 4861, emitVerifyEncodable(ins, size, ireg)
8540880, e:\gh\coreclr2\src\jit\codegencommon.cpp, 2423, rv1 || mul != 1
8540880, e:\gh\coreclr2\src\jit\codegencommon.cpp, 2425, FitsIn<INT32>(cns)
7375190, e:\gh\coreclr2\src\jit\morph.cpp, 15754, fgExpandInline == false
7352623, e:\gh\coreclr2\src\jit\emitxarch.cpp, 3735, emitVerifyEncodable(ins, size, reg1, reg2)
7322925, e:\gh\coreclr2\src\jit\morph.cpp, 10675, op1 == tree->gtOp.gtOp1
7282409, e:\gh\coreclr2\src\jit\morph.cpp, 8054, call->gtOper == GT_CALL
7028316, e:\gh\coreclr2\src\jit\lclvars.cpp, 3363, varDsc->lvRefCnt > 0
6854055, e:\gh\coreclr2\src\jit\flowgraph.cpp, 11017, (block->bbFlags & BBF_REMOVED) == 0
6777828, e:\gh\coreclr2\src\jit\inlinepolicy.cpp, 497, smOpcode < SM_COUNT
6777827, e:\gh\coreclr2\src\jit\inlinepolicy.cpp, 498, smOpcode != SM_PREFIX_N
6480615, e:\gh\coreclr2\src\jit\flowgraph.cpp, 17102, blk != nullptr
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13703, GT_SUB == GT_ADD + 1
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13704, GT_MUL == GT_ADD + 2
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13705, GT_DIV == GT_ADD + 3
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13706, GT_MOD == GT_ADD + 4
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13716, GT_RSZ == GT_ADD + 12
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13715, GT_RSH == GT_ADD + 11
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13714, GT_LSH == GT_ADD + 10
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13712, GT_AND == GT_ADD + 9
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13711, GT_XOR == GT_ADD + 8
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13707, GT_UDIV == GT_ADD + 5
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13708, GT_UMOD == GT_ADD + 6
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13710, GT_OR == GT_ADD + 7
6262994, e:\gh\coreclr2\src\jit\liveness.cpp, 3003, compCurBB == block
6062460, e:\gh\coreclr2\src\jit\flowgraph.cpp, 7343, tree->OperGet() == GT_ASG
6028687, e:\gh\coreclr2\src\jit\codegencommon.cpp, 11901, jitGetILoffs(offsx) <= compiler->info.compILCodeSize
5280889, e:\gh\coreclr2\src\jit\regalloc.cpp, 6747, varDsc->lvIsInReg() || varDsc->lvOnFrame || varDsc->lvRefCnt == 0
5280889, e:\gh\coreclr2\src\jit\regalloc.cpp, 6751, !varDsc->lvRegister || !varDsc->lvOnFrame
5280889, e:\gh\coreclr2\src\jit\lclvars.cpp, 4567, !varDsc->lvFramePointerBased || codeGen->doubleAlignOrFramePointerUsed()
5223565, e:\gh\coreclr2\src\jit\flowgraph.cpp, 1713, fgDomsComputed
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 1833, endNode || (startNode == compCurStmt->gtStmt.gtStmtExpr)
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 2945, nextStmt->gtOper == GT_STMT
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 2944, nextStmt
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 1832, compCurStmt->gtOper == GT_STMT
4798987, e:\gh\coreclr2\src\jit\assertionprop.cpp, 3931, !optLocalAssertionProp

BruceForstall commented 7 years ago

I measured the total impact of noway_assert using instruction counts over SuperPMI collections of the dotnet/coreclr testbed, comparing against a release build with noway_assert removed. I saw a 1.03% overhead from noway_assert with normal optimization, and a 0.74% overhead with MinOpts.

BruceForstall commented 7 years ago

The top 11 noway_asserts here account for over 50% of the dynamic count, so I would expect converting just these few to simple asserts to yield up to a 0.5% throughput improvement.
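
The arithmetic behind that estimate can be sketched as below. It assumes every executed noway_assert costs roughly the same, which the thread does not state, so the result is only a rough upper bound. The function name is hypothetical.

```cpp
#include <cassert>

// Rough estimate: the part of the measured noway_assert overhead that
// disappears if the given share of dynamic executions is converted to
// simple asserts. Assumes a uniform cost per executed noway_assert.
inline double EstimatedImprovementPercent(double totalOverheadPercent, double shareOfDynamicCount)
{
    assert(shareOfDynamicCount >= 0.0 && shareOfDynamicCount <= 1.0);
    return totalOverheadPercent * shareOfDynamicCount;
}
```

With the 1.03% overhead measured for normal optimization and a share of 0.5, this gives about 0.5 percentage points. The same share applied to the 0.74% MinOpts overhead would give a bit under 0.4.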

mikedn commented 7 years ago

What's the story of this noway_assert thing anyway? How was it decided where to use assert and where to use noway_assert?

BruceForstall commented 7 years ago

As I understand it, asserts were converted to noway_assert automatically or semi-automatically. I actually don't know how it was determined which ones to convert. Theoretically, we should not have any noway_assert unless a re-compilation with MinOpts would avoid repeating the failing condition (since that's the main benefit of noway_assert). But that's sometimes hard to tell. So now, after the automated conversion, we have to make a call about which ones are worth keeping and which are too expensive.
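
To illustrate the "re-compile with MinOpts" benefit described above, here is a toy model. It is not the JIT's actual mechanism: the exception type, the macro, and both functions are hypothetical, and C++ exceptions stand in for whatever error-trapping the real JIT uses.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical failure signal raised by a release-mode check.
struct NowayAssertFailure : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for noway_assert: checked even when NDEBUG is defined.
#define SKETCH_NOWAY_ASSERT(cond)             \
    do                                        \
    {                                         \
        if (!(cond))                          \
        {                                     \
            throw NowayAssertFailure(#cond);  \
        }                                     \
    } while (0)

enum class OptLevel
{
    Full,
    MinOpts
};

// Toy "compiler": returns a number standing in for generated code.
int CompileMethod(int methodId, OptLevel level)
{
    assert(methodId >= 0); // simple assert: compiled out when NDEBUG is defined

    if (level == OptLevel::Full)
    {
        // Pretend an optimization phase reaches a bad state for odd method ids.
        bool optimizerStateIsSane = (methodId % 2 == 0);
        SKETCH_NOWAY_ASSERT(optimizerStateIsSane);
        return methodId * 10; // "optimized" result
    }

    return methodId; // "unoptimized" result; the optimization phase never runs
}

// Retry without optimizations if the optimized compile hits a failed check.
int CompileWithFallback(int methodId)
{
    try
    {
        return CompileMethod(methodId, OptLevel::Full);
    }
    catch (const NowayAssertFailure&)
    {
        return CompileMethod(methodId, OptLevel::MinOpts);
    }
}
```

In this model the check only earns its release-build cost because it guards a phase that the MinOpts path skips. A check on a path that both compiles share would simply fail again on the retry.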

TIHan commented 1 year ago

Considering this is for throughput, it would be interesting to investigate this more.