Open ArsenyBochkarev opened 1 month ago
High-level overview
The simplest idea: always guess NextPC = PC + 4. To support it, the compiler needs to maximize the chances that the next sequential instruction is the next instruction to be executed.
A Branch Target Buffer (BTB), also called a Branch Target Address Cache, can be implemented to supply the target address of a branch predicted as taken.
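A minimal C++ sketch of how PC + 4 and a BTB could combine when computing NextPC. It assumes a hypothetical direct-mapped BTB with 64 entries, 4-byte instructions (no compressed instructions) and the full PC used as the tag; real BTBs typically use partial tags and several ways. On a hit the cached target becomes NextPC, otherwise fetch falls through to PC + 4:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical direct-mapped BTB (illustration only, not a real design).
class BranchTargetBuffer {
public:
    static constexpr std::size_t kEntries = 64;

    // Next fetch address: the cached target on a hit, otherwise PC + 4.
    std::uint64_t predictNextPC(std::uint64_t pc) const {
        const Entry& e = table_[index(pc)];
        if (e.valid && e.tag == pc) return e.target;
        return pc + 4;
    }

    // Called once a taken branch resolves: remember where it went.
    void update(std::uint64_t pc, std::uint64_t target) {
        table_[index(pc)] = Entry{true, pc, target};
    }

private:
    struct Entry {
        bool valid = false;
        std::uint64_t tag = 0;     // full PC used as the tag for simplicity
        std::uint64_t target = 0;  // predicted branch target
    };

    // Drop the 2 low bits (4-byte instructions), then take the index bits.
    static std::size_t index(std::uint64_t pc) {
        return static_cast<std::size_t>((pc >> 2) % kEntries);
    }

    std::array<Entry, kEntries> table_{};
};
```

For example, after `update(0x1000, 0x2000)` the fetch from 0x1000 is redirected to 0x2000, while 0x1100 (same index, different tag) still falls through to 0x1104.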
Two state machines for the BPU are presented: last-time prediction and a 2-bit counter (the prediction changes only after 2 consecutive mistakes).
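A sketch comparing the two state machines on a loop branch that is taken three times and then falls through (T T T N, repeated). The 2-bit counter here is the saturating variant (states 0-1 predict not-taken, 2-3 predict taken), which needs two consecutive mispredictions to leave a strong state; the type and function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Last-time (1-bit) predictor: predict whatever the branch did last time.
struct LastTimePredictor {
    bool last = false;  // start by predicting not-taken
    bool predict() const { return last; }
    void update(bool taken) { last = taken; }
};

// 2-bit saturating counter: 0,1 -> not-taken; 2,3 -> taken.
struct TwoBitCounter {
    int counter = 1;  // start in "weakly not-taken"
    bool predict() const { return counter >= 2; }
    void update(bool taken) {
        if (taken && counter < 3) ++counter;
        if (!taken && counter > 0) --counter;
    }
};

// Runs a predictor over a sequence of branch outcomes and counts misses.
template <typename Predictor>
std::size_t countMispredictions(const std::vector<bool>& outcomes) {
    Predictor p;
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        if (p.predict() != taken) ++misses;
        p.update(taken);
    }
    return misses;
}
```

On four repetitions of T T T N, the last-time predictor misses twice per loop exit (the exit and the first iteration after it), 8 misses in total, while the 2-bit counter misses only the exit plus one warm-up miss, 5 in total.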
Other approaches: keep the history of True/False (taken/not-taken) outcomes for all branches (Global History Register, or GHR) and use it to guess the next branch if it contains one of the patterns seen before (the patterns are kept in the Pattern History Table, or PHT). The 2-bit counters can be used as PHT entries. A further improvement is to detect biased branches (e.g. ones that are taken 99% of the time) and predict them with a simpler predictor. A combined predictor can also be implemented.
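A sketch of the GHR + PHT idea, assuming a single PHT indexed only by the global history (real designs usually also mix in PC bits; gshare, for instance, XORs them with the GHR). A branch that alternates T/N defeats a lone 2-bit counter, but here each history pattern gets its own counter, so the predictor learns the pattern after a few warm-up misses. The class name is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-level global predictor: the last `historyBits` outcomes (the GHR)
// index a Pattern History Table of 2-bit saturating counters.
class GlobalHistoryPredictor {
public:
    explicit GlobalHistoryPredictor(unsigned historyBits)
        : mask_((1u << historyBits) - 1),
          pht_(std::size_t{1} << historyBits, 1) {}  // all "weakly not-taken"

    bool predict() const { return pht_[ghr_] >= 2; }

    void update(bool taken) {
        // Train the counter selected by the current history.
        std::uint8_t& c = pht_[ghr_];
        if (taken && c < 3) ++c;
        if (!taken && c > 0) --c;
        // Shift the outcome into the history (newest outcome in bit 0).
        ghr_ = ((ghr_ << 1) | (taken ? 1u : 0u)) & mask_;
    }

private:
    unsigned mask_;
    unsigned ghr_ = 0;
    std::vector<std::uint8_t> pht_;
};
```

With 4 history bits and an alternating stream that starts with taken, this sketch mispredicts 3 times during warm-up and never afterwards.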
State-of-the-art tendencies in branch prediction:
Issues in fast and/or wide instruction fetch engines (many approaches are described).
None for now (as we're just testing out the flow)
See some guidelines here. I'll list some below.
Use the compiler's flags and attributes. The following are LLVM codegen options (with clang, pass them via -mllvm); the value is the log2 of the alignment, e.g. 4 means 16-byte boundaries:
-align-all-blocks=<uint>: force the alignment of all blocks in the function.
-align-all-functions=<uint>: force the alignment of all functions.
-align-all-nofallthru-blocks=<uint>: force the alignment of all blocks that have no fall-through predecessors (i.e. don't add nops that are executed).
Use the compiler/language features for tuning:
likely/unlikely attribute specifiers
noinline/always_inline attributes
hot/cold attributes
#pragma unroll <N> directive
N.B. The inline function specifier does not guarantee that the function will actually be inlined (see here).
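A hedged C++ sketch combining these features on a hypothetical function. [[likely]]/[[unlikely]] are standard C++20; hot, cold, noinline and always_inline are GNU attributes (accepted by GCC and Clang in the [[gnu::...]] spelling); #pragma unroll N is Clang's spelling, while GCC uses #pragma GCC unroll N. The likely/unlikely, hot/cold and unroll hints may be ignored by the compiler:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Rare error path: marked cold and kept out of line so the hot loop stays compact.
[[gnu::cold]] [[gnu::noinline]] static void report_negative(int value) {
    std::fprintf(stderr, "negative value: %d\n", value);
}

// Tiny helper that we ask the compiler to always inline.
[[gnu::always_inline]] inline int clamp_to_zero(int v) {
    return v < 0 ? 0 : v;
}

// Hypothetical hot function: sums values, treating negatives as 0.
[[gnu::hot]] long sum_non_negative(const std::vector<int>& values) {
    long total = 0;
#if defined(__clang__)
#pragma unroll 4
#elif defined(__GNUC__)
#pragma GCC unroll 4
#endif
    for (std::size_t i = 0; i < values.size(); ++i) {
        int v = values[i];
        if (v >= 0) [[likely]] {
            total += v;  // expected path: laid out as the fall-through
        } else [[unlikely]] {
            report_negative(v);
            total += clamp_to_zero(v);
        }
    }
    return total;
}
```

For example, sum_non_negative({1, 2, -3, 4}) returns 7 and reports -3 on stderr. Compile with -std=c++20 so the [[likely]]/[[unlikely]] specifiers are accepted without warnings.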
See the investigation pipeline for relocation relaxation elimination (aka short jumps elimination):
Stack Overflow question and this presentation -> fixupNeedsRelaxationAdvanced from the LLVM backend -> RISCVAsmBackend::getRelaxedOpcode maps the branch to a PseudoLongBxx -> RISCVMCCodeEmitter::expandLongCondBr expands the PseudoLongBxx into an inverted conditional branch and an unconditional jump -> disable -riscv-asm-relax-branches or force-disable relaxation in some other way.
See:
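A simplified model (not LLVM's code) of what this expansion does on RISC-V, assuming 4-byte instructions: a B-type conditional branch reaches only -4096..+4094 bytes (13-bit signed offset), so a farther target gets an inverted branch that skips over a jal, which reaches about ±1 MiB. The function names and the .Lskip label are hypothetical; a real assembler would use a unique label or a fixed offset:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// B-type: 13-bit signed, 2-byte-aligned offset -> [-4096, +4094].
bool fitsInBType(std::int64_t offset) {
    return offset % 2 == 0 && offset >= -4096 && offset <= 4094;
}

// J-type (jal): 21-bit signed, 2-byte-aligned offset -> [-1 MiB, +1 MiB - 2].
bool fitsInJType(std::int64_t offset) {
    return offset % 2 == 0 && offset >= -(1 << 20) && offset <= (1 << 20) - 2;
}

// beq <-> bne, blt <-> bge, bltu <-> bgeu.
std::string invertBranch(const std::string& op) {
    if (op == "beq") return "bne";
    if (op == "bne") return "beq";
    if (op == "blt") return "bge";
    if (op == "bge") return "blt";
    if (op == "bltu") return "bgeu";
    if (op == "bgeu") return "bltu";
    throw std::invalid_argument("unknown branch: " + op);
}

// `offset` is the distance from the original branch to `target`.
std::vector<std::string> lowerCondBranch(const std::string& op,
                                         const std::string& rs1,
                                         const std::string& rs2,
                                         const std::string& target,
                                         std::int64_t offset) {
    if (fitsInBType(offset)) {
        return {op + " " + rs1 + ", " + rs2 + ", " + target};  // short form
    }
    // The jal sits 4 bytes after the inverted branch.
    if (!fitsInJType(offset - 4)) {
        throw std::out_of_range("target is beyond jal range");
    }
    return {invertBranch(op) + " " + rs1 + ", " + rs2 + ", .Lskip",
            "j " + target,
            ".Lskip:"};
}
```

With relaxation disabled, the short form is what you would expect to keep; an out-of-range branch would then presumably fail to assemble instead of being silently expanded.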
List below all your findings on the CPU's branch prediction unit (BPU): its inner structure, algorithms, code snippets for testing, projects, etc.