LADSoft / OrangeC

OrangeC Compiler And Tool Chain
http://ladsoft.tripod.com/orange_c_compiler.html
Other
285 stars 39 forks source link

Optimizer - optimize branches for branch prediction #254

Open LADSoft opened 5 years ago

LADSoft commented 5 years ago

minimize branches and generally ease the way for branch prediction in the later pentiums to result in faster code...

chuggafan commented 5 years ago

I've dug up the old Intel Pentium II optimization guideline pdf on the internet here which uses the same exact example code for branch prediction as the new manual, for the new manual, the pages that directly say what we want start Here, old manual on branch prediction is Here, and the old manual's method to predict is Here,

So, the rules that intel gives on Branch Prediction basically boil down to:

  1. Keep code and data on separate pages
  2. eliminate branches where possible
  3. arrange code to be consistent with the Static Branch Prediction Algorithm (jumping forward is assumed not taken, jumping backwards is assumed taken, at first encounter of the jump)
  4. inline functions and pair up the calls and returns
  5. unroll loops to keep the loops having sixteen or less iterations (unless this causes large code increases)
  6. Avoid two conditional branch instructions in a loop so both have the same target address and, at the same time, belong to the same 16 byte aligned code block (this isn't an optimization thing in the old book as I do not believe they are 16 byte aligned on the Pentium II, however, the new manual is for anything the Pentium 4 and above)

On Intel's new manuals, they have a system of showing Compiler/Assembler authors what each optimization's impact is and how general the optimization is, which should be helpful in deciding the priority to implement anything in the optimizer at all.

GitMensch commented 5 years ago

Hm, 2-6 would only be done if optimizations are on, correct? Do you know if the rule 3-6 are true for AMD, too?

chuggafan commented 5 years ago

https://www.amd.com/system/files/TechDocs/25112.PDF#page=142

It appears rules 3-6 are all true as well, with AMD being more specific about inline size (25 instructions), 3 jmps in a 16 byte block, unroll loops, arrange to be consistent with BPA, with a few more caveats that I currently do not have the time to search for and display out in a nice list.

chuggafan commented 5 years ago

https://www.agner.org/optimize/microarchitecture.pdf Agner seems to have compiled a good list of everything from P1 up.

I also thought it through and that unless we have indicators in code or PGO (profile guided optimization) such a feature like this is difficult to make, better branch folding however, is an optimization that is possible to do on regular code.

GitMensch commented 3 years ago

Please allow the user to override prediction by likely/unlikely, see https://stackoverflow.com/questions/109710.

LADSoft commented 3 years ago

@GitMensch yeah I think that was coming in the next milestone. C++2020 introduces those as attributes and I presume GNU C++ has attribute for it as well.

referring to @chuggafan's comment in 2018 there are possibly one or two cases where the compiler could assume the likeliness or unlikeliness of branches based on whether something like an 'if' statement exits a loop (via break or goto), for example... but it would largely be up to the user to specify in the absence of something like PGO...

chuggafan commented 3 years ago

Yes, IMHO the best way to perform branch prediction in absence of PGO is [[likely]] and [[unlikely]], depending on the arch/processor we're compiling for we can actually give it hints about how the branches will go, such as on x86, but those hints are ignored in the modern day so... https://www.agner.org/optimize/optimizing_assembly.pdf

Also, for doing likely vs unlikely branches, if we have data giving us that something is more likely to be taken we make that the one that we don't branch into and the code naturally flows into, otherwise we just branch to that. e.g.

; do logic here
CMP eax, 1 ; is likely to be 0, not 1, on a boolean comparison...
JEQ exit
; do more logic here
exit:
RET

Essentially, what we really should be doing is block reordering so that the likelyhood of a block being executed determines the order from most likely to least likely.

LADSoft commented 3 years ago

yeah i figured that was going to be how it goes... this should be relatively simple to implement I think. I figured the one case we could probably guess at was when you break out of a loop at the end of an if statement... the body of that if statement is a good guess at an 'unlikely' case lol. I haven't been able to come up with other examples yet though...