BruceForstall opened this issue 7 years ago.
I implemented this. I ran "a lot" of the test tree, processed the data, and got the following top 75 noway_assert sites by dynamic execution count (some might actually be optimized away in release builds, since a few of them assert only on compile-time constants). The columns are: total count, file name, line number, and assertion text.
181470679, e:\gh\coreclr2\src\jit\morph.cpp, 14648, tree
180436108, e:\gh\coreclr2\src\jit\morph.cpp, 14649, tree->gtOper != GT_STMT
83096677, e:\gh\coreclr2\src\jit\assertionprop.cpp, 808, assertIndex <= optAssertionCount
83096677, e:\gh\coreclr2\src\jit\assertionprop.cpp, 807, assertIndex != NO_ASSERTION_INDEX
72024047, e:\gh\coreclr2\src\jit\liveness.cpp, 1632, lclNum < lvaCount
43294958, e:\gh\coreclr2\src\jit\lsra.cpp, 3003, count < MaxInternalRegisters
40851620, e:\gh\coreclr2\src\jit\lclvars.cpp, 2510, varNum < lvaCount
35418942, e:\gh\coreclr2\src\jit\morph.cpp, 10632, tree->OperKind() & GTK_SMPOP
33982229, e:\gh\coreclr2\src\jit\flowgraph.cpp, 5228, opcode < CEE_COUNT
32541990, e:\gh\coreclr2\src\jit\morph.cpp, 11463, tree->gtOper != GT_CALL
32533197, e:\gh\coreclr2\src\jit\morph.cpp, 13347, oper == tree->gtOper
30050847, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16328, !(block->bbFlags & BBF_REMOVED)
27085069, e:\gh\coreclr2\src\jit\morph.cpp, 15542, stmt->gtOper == GT_STMT
23724178, e:\gh\coreclr2\src\jit\flowgraph.cpp, 18490, tree->gtOper == GT_STMT
23362852, e:\gh\coreclr2\src\jit\liveness.cpp, 1642, varIndex < lvaTrackedCount
23102961, e:\gh\coreclr2\src\jit\emit.h, 1604, (UNATIVE_OFFSET)distance == distance
22587177, e:\gh\coreclr2\src\jit\morph.cpp, 8401, tree->OperKind() & GTK_LEAF
21688207, e:\gh\coreclr2\src\jit\morph.cpp, 5984, tree->gtOper == GT_LCL_VAR
19590233, e:\gh\coreclr2\src\jit\lclvars.cpp, 3599, (tree->gtOper == GT_LCL_VAR) || (tree->gtOper == GT_LCL_FLD)
19478128, e:\gh\coreclr2\src\jit\lclvars.cpp, 3680, tiVerificationNeeded || varDsc->lvType == TYP_UNDEF || tree->gtType == TYP_UNKNOWN || allowStructs || genActualType(varDsc->TypeGet()) == genActualType(tree->gtType) || (tree->gtType == TYP_BYREF && varDsc->TypeGet() == TYP_I_IMPL) || (tree->gtType == TYP_I_IMPL && varDsc->TypeGet() == TYP_BYREF) || (tree->gtFlags & GTF_VAR_CAST) || varTypeIsFloating(varDsc->TypeGet()) && varTypeIsFloating(tree->gtType)
18224434, e:\gh\coreclr2\src\jit\morph.cpp, 6019, !(tree->gtFlags & GTF_VAR_DEF) || varAddr
15639847, e:\gh\coreclr2\src\jit\gentree.cpp, 7095, argInfo != nullptr
14993689, e:\gh\coreclr2\src\jit\morph.cpp, 8344, tree->OperKind() & GTK_CONST
13800099, e:\gh\coreclr2\src\jit\flowgraph.cpp, 18555, list.gtNext->gtPrev == &list
13209197, e:\gh\coreclr2\src\jit\morph.cpp, 14731, tree != nullptr
12710725, e:\gh\coreclr2\src\jit\emitxarch.cpp, 546, prefix >= 0x40 && prefix <= 0x4F
11966466, e:\gh\coreclr2\src\jit\morph.cpp, 10733, op1
11256849, e:\gh\coreclr2\src\jit\emitxarch.cpp, 1695, (int)offs < 0
10750793, e:\gh\coreclr2\src\jit\importer.cpp, 563, impTreeLast != nullptr
10503327, e:\gh\coreclr2\src\jit\flowgraph.cpp, 889, block
10444252, e:\gh\coreclr2\src\jit\codegenxarch.cpp, 1516, targetType != TYP_STRUCT
10283544, e:\gh\coreclr2\src\jit\flowgraph.cpp, 9626, block->bbNext == bNext
9808620, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16280, !(block->bbFlags & BBF_TRY_BEG)
9808617, e:\gh\coreclr2\src\jit\flowgraph.cpp, 16279, !block->bbCatchTyp
9807880, e:\gh\coreclr2\src\jit\morph.cpp, 15677, fgPtrArgCntCur == 0
9490871, e:\gh\coreclr2\src\jit\valuenum.cpp, 747, attribs == CEA_None
9297809, e:\gh\coreclr2\src\jit\emitxarch.cpp, 3328, emitVerifyEncodable(ins, size, reg)
9049871, e:\gh\coreclr2\src\jit\liveness.cpp, 1879, VarSetOps::IsSubset(this, keepAliveVars, life)
9026196, e:\gh\coreclr2\src\jit\flowgraph.cpp, 890, blockPred
8640521, e:\gh\coreclr2\src\jit\emitxarch.cpp, 4861, emitVerifyEncodable(ins, size, ireg)
8540880, e:\gh\coreclr2\src\jit\codegencommon.cpp, 2423, rv1 || mul != 1
8540880, e:\gh\coreclr2\src\jit\codegencommon.cpp, 2425, FitsIn<INT32>(cns)
7375190, e:\gh\coreclr2\src\jit\morph.cpp, 15754, fgExpandInline == false
7352623, e:\gh\coreclr2\src\jit\emitxarch.cpp, 3735, emitVerifyEncodable(ins, size, reg1, reg2)
7322925, e:\gh\coreclr2\src\jit\morph.cpp, 10675, op1 == tree->gtOp.gtOp1
7282409, e:\gh\coreclr2\src\jit\morph.cpp, 8054, call->gtOper == GT_CALL
7028316, e:\gh\coreclr2\src\jit\lclvars.cpp, 3363, varDsc->lvRefCnt > 0
6854055, e:\gh\coreclr2\src\jit\flowgraph.cpp, 11017, (block->bbFlags & BBF_REMOVED) == 0
6777828, e:\gh\coreclr2\src\jit\inlinepolicy.cpp, 497, smOpcode < SM_COUNT
6777827, e:\gh\coreclr2\src\jit\inlinepolicy.cpp, 498, smOpcode != SM_PREFIX_N
6480615, e:\gh\coreclr2\src\jit\flowgraph.cpp, 17102, blk != nullptr
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13703, GT_SUB == GT_ADD + 1
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13704, GT_MUL == GT_ADD + 2
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13705, GT_DIV == GT_ADD + 3
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13706, GT_MOD == GT_ADD + 4
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13716, GT_RSZ == GT_ADD + 12
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13715, GT_RSH == GT_ADD + 11
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13714, GT_LSH == GT_ADD + 10
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13712, GT_AND == GT_ADD + 9
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13711, GT_XOR == GT_ADD + 8
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13707, GT_UDIV == GT_ADD + 5
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13708, GT_UMOD == GT_ADD + 6
6264602, e:\gh\coreclr2\src\jit\morph.cpp, 13710, GT_OR == GT_ADD + 7
6262994, e:\gh\coreclr2\src\jit\liveness.cpp, 3003, compCurBB == block
6062460, e:\gh\coreclr2\src\jit\flowgraph.cpp, 7343, tree->OperGet() == GT_ASG
6028687, e:\gh\coreclr2\src\jit\codegencommon.cpp, 11901, jitGetILoffs(offsx) <= compiler->info.compILCodeSize
5280889, e:\gh\coreclr2\src\jit\regalloc.cpp, 6747, varDsc->lvIsInReg() || varDsc->lvOnFrame || varDsc->lvRefCnt == 0
5280889, e:\gh\coreclr2\src\jit\regalloc.cpp, 6751, !varDsc->lvRegister || !varDsc->lvOnFrame
5280889, e:\gh\coreclr2\src\jit\lclvars.cpp, 4567, !varDsc->lvFramePointerBased || codeGen->doubleAlignOrFramePointerUsed()
5223565, e:\gh\coreclr2\src\jit\flowgraph.cpp, 1713, fgDomsComputed
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 1833, endNode || (startNode == compCurStmt->gtStmt.gtStmtExpr)
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 2945, nextStmt->gtOper == GT_STMT
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 2944, nextStmt
4989227, e:\gh\coreclr2\src\jit\liveness.cpp, 1832, compCurStmt->gtOper == GT_STMT
4798987, e:\gh\coreclr2\src\jit\assertionprop.cpp, 3931, !optLocalAssertionProp
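The morph.cpp entries at lines 13703 to 13716 only compare enum constants (for example GT_SUB == GT_ADD + 1), so they could be checked at build time with C++11 static_assert, which never costs anything at run time. A minimal sketch, using a hypothetical cut-down copy of the operator enum (the JIT's real genTreeOps has many more members) and a hypothetical arithOperIndex helper:

```cpp
#include <cassert>

// Hypothetical, cut-down operator enum for illustration only; the JIT's
// real genTreeOps enum has many more members and different values.
enum genTreeOps
{
    GT_ADD,
    GT_SUB,
    GT_MUL,
    GT_DIV,
    GT_MOD,
    GT_UDIV,
    GT_UMOD,
    GT_OR,
    GT_XOR,
    GT_AND,
    GT_LSH,
    GT_RSH,
    GT_RSZ,
};

// Evaluated entirely by the compiler: no code is emitted in any build
// flavor, and a broken relationship fails the build.
static_assert(GT_SUB == GT_ADD + 1, "GT_SUB must follow GT_ADD");
static_assert(GT_MUL == GT_ADD + 2, "GT_MUL out of order");
static_assert(GT_DIV == GT_ADD + 3, "GT_DIV out of order");
static_assert(GT_MOD == GT_ADD + 4, "GT_MOD out of order");
static_assert(GT_UDIV == GT_ADD + 5, "GT_UDIV out of order");
static_assert(GT_UMOD == GT_ADD + 6, "GT_UMOD out of order");
static_assert(GT_OR == GT_ADD + 7, "GT_OR out of order");
static_assert(GT_XOR == GT_ADD + 8, "GT_XOR out of order");
static_assert(GT_AND == GT_ADD + 9, "GT_AND out of order");
static_assert(GT_LSH == GT_ADD + 10, "GT_LSH out of order");
static_assert(GT_RSH == GT_ADD + 11, "GT_RSH out of order");
static_assert(GT_RSZ == GT_ADD + 12, "GT_RSZ out of order");

// Hypothetical helper that relies on the contiguous layout, e.g. to index
// a per-operator table by (oper - GT_ADD).
inline int arithOperIndex(genTreeOps oper)
{
    // Range check on a runtime value: a plain assert, debug builds only.
    assert(oper >= GT_ADD && oper <= GT_RSZ);
    return oper - GT_ADD;
}
```

An optimizing compiler may already fold constant-only noway_asserts away in release builds, as noted above; static_assert makes that guaranteed rather than dependent on the optimizer.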
I measured the total throughput impact of noway_assert using instruction counts over SuperPMI collections of the dotnet/coreclr testbed, comparing against a release build with noway_assert removed. With normal optimization, noway_assert accounted for 1.03% overhead; with MinOpts, 0.74%.
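One way to set up such a comparison is a build switch that makes the release-build noway_assert expand to nothing. A minimal sketch, assuming a hypothetical MEASURE_WITHOUT_NOWAY_ASSERT define, a hypothetical sketch_noway_assert macro, and a hypothetical noWayAssertFailed handler (none of these are the JIT's actual names):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical failure handler for this sketch; the JIT's real one differs.
[[noreturn]] inline void noWayAssertFailed(const char* cond, const char* file, int line)
{
    std::fprintf(stderr, "noway_assert(%s) failed at %s:%d\n", cond, file, line);
    std::abort();
}

#ifdef MEASURE_WITHOUT_NOWAY_ASSERT
// Measurement-only build: the condition is never evaluated, so its cost
// disappears from instruction counts. Not safe to ship.
#define sketch_noway_assert(cond) ((void)0)
#else
// Normal release build: the condition is evaluated on every execution.
#define sketch_noway_assert(cond) \
    ((cond) ? (void)0 : noWayAssertFailed(#cond, __FILE__, __LINE__))
#endif
```

Running the same SuperPMI collection through both builds and diffing instruction counts is one way to isolate the noway_assert cost in the manner described above.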
The top 11 noway_asserts in the list above account for over 50% of the dynamic count, so converting just these few to simple asserts should recover roughly half of the 1.03% overhead; I would expect up to a 0.5% throughput improvement.
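A sketch of the arithmetic behind this estimate: sort sites by count, find how many of the hottest ones cover a target share of all executions, and scale the measured overhead by that share. It assumes every noway_assert execution costs about the same, which is a simplification (a long condition like the lclvars.cpp one at line 3680 may cost more than a null check). SiteCount, sitesNeededForShare, and estimatedGainPercent are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One noway_assert site and its dynamic execution count (hypothetical type).
struct SiteCount
{
    std::string   site;  // e.g. "morph.cpp:14648"
    std::uint64_t count; // number of times the check executed
};

// Smallest number of hottest sites whose combined count reaches
// targetShare (in [0, 1]) of the total dynamic count.
std::size_t sitesNeededForShare(std::vector<SiteCount> sites, double targetShare)
{
    std::sort(sites.begin(), sites.end(),
              [](const SiteCount& a, const SiteCount& b) { return a.count > b.count; });

    std::uint64_t total = 0;
    for (const SiteCount& s : sites)
        total += s.count;
    if (total == 0)
        return 0;

    std::uint64_t running = 0;
    for (std::size_t i = 0; i < sites.size(); ++i)
    {
        running += sites[i].count;
        if (static_cast<double>(running) >= targetShare * static_cast<double>(total))
            return i + 1;
    }
    return sites.size();
}

// Upper-bound gain (in percent) from converting sites that cover `share`
// of executions, given the total noway_assert overhead (in percent).
double estimatedGainPercent(double share, double totalOverheadPercent)
{
    assert(share >= 0.0 && share <= 1.0);
    return share * totalOverheadPercent;
}
```

With the figures quoted above, estimatedGainPercent(0.5, 1.03) gives about 0.515, which matches the "up to 0.5%" estimate.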
What's the story of this noway_assert thing anyway? How was it decided where to use assert and where to use noway_assert?
As I understand it, assert was converted to noway_assert automatically or semi-automatically. I actually don't know how they determined which ones to convert. In theory, we should not have any noway_assert unless recompiling with MinOpts would avoid hitting the same condition again (since that's the main benefit of noway_assert). But that's sometimes hard to tell. So now, after the automated conversion, we have to decide which ones are worth keeping and which are too expensive.
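The benefit described here is that a failed noway_assert in a release build does not have to crash the process: the current compilation is abandoned and the method can be retried with MinOpts. A minimal sketch of that control flow, using hypothetical names (NoWayException, sketch_noway_assert, compileMethod, compileWithRetry) rather than the JIT's real ones:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception raised when a release-build noway_assert fails.
struct NoWayException : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Sketch of noway_assert: always checked, and on failure it abandons the
// current compilation instead of terminating the process.
#define sketch_noway_assert(cond)                   \
    do                                              \
    {                                               \
        if (!(cond))                                \
            throw NoWayException(#cond);            \
    } while (0)

enum class OptLevel { FullOpts, MinOpts };

// Hypothetical compile step: an optimization-only phase may hit a bad state,
// while the MinOpts path never reaches that check.
std::string compileMethod(OptLevel level, bool phaseHasBug)
{
    // Plain assert: a debug-only sanity check, compiled out under NDEBUG.
    assert(level == OptLevel::FullOpts || level == OptLevel::MinOpts);

    if (level == OptLevel::FullOpts)
    {
        sketch_noway_assert(!phaseHasBug); // only reached when optimizing
    }
    return level == OptLevel::FullOpts ? "optimized code" : "minopts code";
}

// Retry once with MinOpts if the optimized compile hit a noway_assert.
std::string compileWithRetry(bool phaseHasBug)
{
    try
    {
        return compileMethod(OptLevel::FullOpts, phaseHasBug);
    }
    catch (const NoWayException&)
    {
        // If the same check also fired under MinOpts, this retry would gain
        // nothing; that is why noway_assert pays off mainly in phases that
        // MinOpts skips.
        return compileMethod(OptLevel::MinOpts, phaseHasBug);
    }
}
```

Under this model, a noway_assert in code that MinOpts also runs only adds release-build cost without enabling recovery, which is the audit criterion suggested above.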
Considering this is for throughput, it would be interesting to investigate this more.
The JIT has about 3300 noway_asserts. Unlike plain asserts, these are executed in non-DEBUG (aka RELEASE) builds, so some might be frequently executed and thus costly. Instead of auditing all of them for relevance (i.e., whether they sit in an optimization phase that can be backed out of) or for apparent importance, we could (conditionally) change the noway_assert macro to count which ones are executed most often, using a hash table from the preprocessor's __FILE__ and __LINE__ to an execution count, dumped at the end of compilation. Then we could convert the worst ones to simple asserts.
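A minimal sketch of the proposed instrumentation, meant only for a special measurement build. The helper names (NowayCounts, recordNoway, counted_noway_assert, dumpNowayCounts) are hypothetical, and the failure path is simplified to a message. Each site is keyed by __FILE__ and __LINE__, and the dump lists the hottest sites first:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::pair<std::string, std::uint64_t>;

// Execution counts keyed by "file:line" (hypothetical helper, not JIT code).
inline std::unordered_map<std::string, std::uint64_t>& NowayCounts()
{
    static std::unordered_map<std::string, std::uint64_t> counts;
    return counts;
}

inline void recordNoway(const char* file, int line)
{
    assert(file != nullptr);
    ++NowayCounts()[std::string(file) + ":" + std::to_string(line)];
}

// Counting variant of the macro: record the site, then check the condition.
// Failure handling is simplified to a message for this sketch.
#define counted_noway_assert(cond)                                       \
    do                                                                   \
    {                                                                    \
        recordNoway(__FILE__, __LINE__);                                 \
        if (!(cond))                                                     \
            std::fprintf(stderr, "noway_assert failed: %s\n", #cond);    \
    } while (0)

// Print "count, file:line" rows in descending order of count (e.g. at the
// end of compilation) and return them.
inline std::vector<Row> dumpNowayCounts(std::FILE* out)
{
    std::vector<Row> rows(NowayCounts().begin(), NowayCounts().end());
    std::sort(rows.begin(), rows.end(),
              [](const Row& a, const Row& b) { return a.second > b.second; });
    for (const Row& r : rows)
        std::fprintf(out, "%llu, %s\n",
                     static_cast<unsigned long long>(r.second), r.first.c_str());
    return rows;
}
```

The JIT compiles methods on multiple threads, so a real version would need a lock, atomic counters, or per-thread tables merged at dump time; this sketch ignores that. A string key keeps the example short, but a key built from the __FILE__ pointer and the line number would avoid allocating on every execution.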