Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Signed division with power of two - eliminate test instruction #43316

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR44346
Status NEW
Importance P enhancement
Reported by David Bolvansky (david.bolvansky@gmail.com)
Reported on 2019-12-19 13:29:10 -0800
Last modified on 2021-10-23 07:24:42 -0700
Version trunk
Hardware PC Linux
CC craig.topper@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, matze@braunis.de, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
After PR43197, we have a better codegen for signed division with power of two,
but it could be improved a bit.

int al2(int x) {
     return ((x + 3 )/4) *4;
}

We have:
al2(int):                                # @al2(int)
        lea     eax, [rdi + 3]
        add     edi, 6
        test    eax, eax
        cmovs   eax, edi
        and     eax, -4
        ret

But test instruction could be eliminated, like GCC does:
al2(int):
  lea eax, [rdi+6]
  add edi, 3
  cmovns eax, edi
  and eax, -4
  ret

https://godbolt.org/z/C9ij3Y
Quuxplusone commented 4 years ago
We are again not reusing EFLAGS i guess.
Not a first time this comes up, i wonder if some generic handling is missing?
Quuxplusone commented 4 years ago

Flag reuse is sort of a mess. It's split between X86ISelLowering.cpp's EmitTest in SelectionDAG and X86InstrInfo's analyzeCompare for MIR.

EmitTEST will only make a transformation if the add is used by a store. Or the only user of its data output is the cmp/test. We fail that check in this case. This is probably largely to avoid tying things too closely together that would prevent the DAG from being linearized without replicating nodes. This could happen if the say the cmov true/false values depended on a different add that use the result of the add we wanted the flags from. In that case we wouldn't able to get the flags across the other add.

So then we emit a TEST from ISel. We try to optimize it in analyzeCompare, but its not equiped to do any instruction reordering so we miss out because of the order the nodes were emitted into MIR when the SelectionDAG was linearized.