Open Quuxplusone opened 7 years ago
Bugzilla Link | PR33362 |
Status | NEW |
Importance | P enhancement |
Reported by | Simon Pilgrim (llvm-dev@redking.me.uk) |
Reported on | 2017-06-08 09:19:16 -0700 |
Last modified on | 2017-09-12 05:57:55 -0700 |
Version | trunk |
Hardware | PC Windows NT |
CC | andrea.dibiagio@gmail.com, ditaliano@apple.com, filcab@gmail.com, jatin.bhateja@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@ndave.org, marina.yatsina@intel.com, spatel+llvm@rotateright.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
Analysis :
1/
PRE-RA Instruction Scheduling algorithm which scans instruction bottom up by
default picks instruction as per source order from the available instructions
to be scheduled in next cycle. Of course instructions are available in the
priority queue only after they have satisfied precedence and resource
constraints.
Thus LZCNT64rr <second_argument> which corresponds to following IR
%4 = tail call i64 @llvm.ctlz.i64(i64 %1, i1 false) <-- SWAP
is scheduled later (remember algorithm works bottom-up so final schedule is
reverse order of instructions which algo produces.)
2/ Peephole optimizer which kicks in later down the pipeline could fold the
TEST with CMPNE to produce CMPAE in first case (clz_i128) as LZCNT operating on
first argument was adjacent to TEST instruction and both are EFLAGS reg
defining MIs.
3/ In second function(clz_i128_swap) due to swapping of llvm.ctlz LZCNT
operating on first argument of function is no longer adjacent to TEST due to
reason mentioned in Point 1/. Which is why flag floding could not happen.
I just modified the heuristic by taking into consideration already scheduled instruction which share their operands with the next instruction to be chosen from from the available queue under option -pre-RA-sched=guided-src.
This case works, please share your thoughts on this and what could be a better heuristic.
Thanks.