Open J-cztery opened 6 years ago
Could you try adding the -jump-tables=none option to BOLT?
With -jump-tables=none I get:
BOLT is unable to proceed because it couldn't properly understand this function.
If you are running the most recent version of BOLT, you may want to report this and paste this dump.
Please check that there is no sensitive contents being shared in this dump.
Not sure how to check that there is no sensitive contents in the dump...
I may have a new version for you to try soon. Meanwhile, could you add -relocs=0 and remove -reorder-functions=hfsort+ and see if it helps?
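For reference, the suggested change could look roughly like the sketch below. It assumes the paths and the remaining flags from the original command posted in this thread; only -relocs=0 is added and -reorder-functions=hfsort+ is dropped. The variable names are just for illustration.

```shell
# Path to BOLT as used in the original command in this thread.
BOLT=build/bin/llvm-bolt

# Original flags minus -reorder-functions=hfsort+, plus -relocs=0.
BOLT_ARGS="./prog -o prog.bolt -data=./perf.fdata -report-stale -relocs=0 -reorder-blocks=cache+ -split-functions=3 -split-all-cold -split-eh -dyno-stats"

# Print the command so it can be inspected before running.
echo "$BOLT $BOLT_ARGS"

# Run only when the tool and the inputs are actually present.
if [ -x "$BOLT" ] && [ -f ./prog ] && [ -f ./perf.fdata ]; then
    # Unquoted on purpose so the flags are split into separate arguments.
    "$BOLT" $BOLT_ARGS
fi
```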
Great! I was able to BOLT this binary with -relocs=0 before, but I saw no improvement, and I understood it might give some speedup with relocations enabled. So I wanted to give it a go, even though I know this piece of code is memory-bandwidth bound. But it is the only thing that I can build without PIE/PIC.
Unless the application is bound by the CPU front end (I$, iTLB), we typically don't expect noticeable gains. Sometimes you can get lucky with a code layout that affects the BTB hardware. Macro-fusion alignment might help if the original code was badly aligned.
Once we add full PIC/PIE support you can try BOLT on the rest of the code.
My code is not front-end bound; however, I do see a significant number of stalls caused by ICache misses and instruction page walks. Yeah, let me know when you have something that I could try. Thanks.
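One rough way to quantify the ICache and iTLB pressure mentioned here is to count the corresponding events with perf stat. This is only a sketch: the generic event names below are standard perf cache events, but whether they are available depends on the CPU and kernel, and ./prog is simply the binary name used in the command in this thread.

```shell
# Generic perf cache events; availability depends on the CPU and kernel.
EVENTS="instructions,L1-icache-load-misses,iTLB-load-misses"
PROG=./prog   # binary name taken from the command in this thread

# Print the command so it can be inspected before running.
echo "perf stat -e $EVENTS -- $PROG"

# Run only when perf is installed and the binary exists.
if command -v perf >/dev/null 2>&1 && [ -x "$PROG" ]; then
    perf stat -e "$EVENTS" -- "$PROG"
fi
```

Comparing the miss counts against the instruction count before and after BOLT gives a first hint of whether the layout changes touched the front end at all.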
The binary was compiled with the Intel Compiler using -ffreestanding to get rid of the __intel memcpy replacement.
build/bin/llvm-bolt ./prog -o prog.bolt -data=./perf.fdata -report-stale -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats