Compiler-assisted-fuzzing / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
0 stars 0 forks source link

BPU research #1

Open ArsenyBochkarev opened 1 month ago

ArsenyBochkarev commented 1 month ago

List below all your findings on CPU's branch prediction unit: inner structure, algorithms, code snippets for testing, projects, etc.

ArsenyBochkarev commented 1 month ago

BPU structure

High-level overview

Advanced Branch Prediction presentation

Example for the simplest idea: always guess NextPC = PC + 4. To support it, compiler needs maximize the chances that the next sequential instruction is the next instruction to be executed:

  1. Lay out the control flow graph such that the “likely next instruction” is on the not-taken path of a branch
  2. Get rid of unnecessary control flow instructions by combining predicates (predicate combining)
  3. Convert control dependences into data dependences (predicated execution)

A Branch Target Buffer (BTB) -- Branch Target Address Cache can be implemented.

Two state machines for BPU are presented: Last-time prediction and 2-bit counter (change prediction after 2 consecutive mistakes)

Other approaches are

A combined predictor can be implemented.

State-of-the-art tendencies in branch-prediction:

Issues in fast and/or wide instruction fetch engines

Wikipedia

A lot of approaches are described

Projects

None for now (as we're just testing out the flow)

Code snippets

List of found BPU benchmarks

Additional performance tuning

See some guidelines here. I'll list some below.

Utilize the compilers flags and attributes:

Use the compiler/language features for tuning:

N.B.The inline function specifier does not guarantee that the function will actually be inlined (see here).

LLVM

See the investigation pipeline for relocation relaxation elimination (aka short jumps elimination): Stackoverflow question AND this presentation -> fixupNeedsRelaxationAdvanced from LLVM backend -> RISCVAsmBackend::getRelaxedOpcode takes the PseudoLongBxx -> RISCVMCCodeEmitter::expandLongCondBr expands the PseudoLongBxx to an inverted conditional branch and an unconditional jump

See: