Quuxplusone opened 6 years ago
Attached b.ll (811 bytes, text/plain): Test
Generated assembly:
Min1:
    movdqa %xmm0, %xmm2
    movdqa %xmm1, %xmm0
    pcmpgtq %xmm2, %xmm0
    blendvpd %xmm0, %xmm2, %xmm1
    movapd %xmm1, %xmm0
    retq
Min2:
    movdqa %xmm0, %xmm2
    pcmpgtq %xmm1, %xmm0
    blendvpd %xmm0, %xmm1, %xmm2
    movapd %xmm2, %xmm0
    retq
We're quite constrained here by the ABI and the available instructions. a always starts in xmm0, b always starts in xmm1, and the return value must go in xmm0. The (non-VEX) blendvpd instruction always uses xmm0 for the blend control: it's an implicit operand, not an encodable field of the instruction's binary representation. There is no pcmpltq instruction, so we're forced to use pcmpgtq.
The SelectionDAG process is largely unaware of these register constraints, so it doesn't know there's value in exploiting the equality case. Min1's (b > a) ? a : b and Min2's (a > b) ? b : a only differ when a == b, and then both select the same value. We don't notice the problem until register allocation, but by then it's too late to rewrite all the instructions.
It doesn't solve the problem in general, but we could argue that the reduced
case is a missed IR canonicalization for commutative ops.
Take a simpler pair of examples:
define i32 @binop_commute_1(i32 %a, i32 %b) {
  %r = mul i32 %a, %b
  ret i32 %r
}
define float @binop_commute_2(float %a, float %b) {
  %r = fmul float %b, %a
  ret float %r
}
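InstCombine doesn't do anything with these, but -reassociate does, and for
some reason with a split personality. It swaps the operands of both
instructions, so the mul ends up as %b, %a while the fmul ends up as %a, %b: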
$ ./opt -reassociate min.ll -S
define i32 @binop_commute_1(i32 %a, i32 %b) {
  %r = mul i32 %b, %a
  ret i32 %r
}
define float @binop_commute_2(float %a, float %b) {
  %r = fmul float %a, %b
  ret float %r
}
This reminds me of [Bug #27780], where we need to invert the pblendvb blend mask so that we can commute the blend inputs.
Is there any way we can safely adjust multiple instructions through X86InstrInfo::commuteInstructionImpl to perform the commutation? That would mean altering the comparison in this bug, or the constant-pool blend mask in PR27780.