Some 2-vector vector select identity shuffles may be better represented as moves #40403


Quuxplusone commented 5 years ago
Bugzilla Link PR41433
Status NEW
Importance P enhancement
Reported by Roman Lebedev (lebedev.ri@gmail.com)
Reported on 2019-04-08 13:41:12 -0700
Last modified on 2019-04-08 14:31:42 -0700
Version trunk
Hardware PC Linux
CC craig.topper@gmail.com, daan@dsprenkels.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR41429, PR39603
Split off from https://bugs.llvm.org/show_bug.cgi?id=41429

https://godbolt.org/z/_n1ggH

#include <immintrin.h>

// Copies only elements 2 and 3 of the 4 x i64 vector, i.e. the high
// 128 bits; the subscripting relies on Clang's vector extensions.
void example(__m256i * __restrict__ dest, const __m256i * __restrict__ a) {
    (*dest)[2] = (*a)[2];
    (*dest)[3] = (*a)[3];
}

Here we never touch the low half of dest; we only replace the high half
of dest (elements 2 and 3, i.e. bytes 16-31) with the high half of `a`.

The naive asm could be:

  vmovaps ymm0, ymmword ptr [rdi]
  vblendps ymm0, ymm0, ymmword ptr [rsi], 240 # ymm0 = ymm0[0,1,2,3],mem[4,5,6,7]
  vmovaps ymmword ptr [rdi], ymm0
  vzeroupper
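
For reference, a source-level equivalent of that blend sequence, written with
AVX intrinsics (just a sketch; the function name is made up, and the casts
through __m256 are only there to reach vblendps, since the integer blend
intrinsic would require AVX2):

#include <immintrin.h>

void example_blend(__m256i * __restrict__ dest, const __m256i * __restrict__ a) {
    __m256 d = _mm256_castsi256_ps(_mm256_load_si256(dest));
    __m256 s = _mm256_castsi256_ps(_mm256_load_si256(a));
    // imm8 = 0xF0 (240): keep dest's low four lanes, take a's high four.
    __m256 r = _mm256_blend_ps(d, s, 0xF0);
    _mm256_store_si256(dest, _mm256_castps_si256(r));
}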

But we can also produce:

        vmovaps xmm0, xmmword ptr [rsi + 16]
        vmovaps xmmword ptr [rdi + 16], xmm0
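
And the source-level equivalent of the two-instruction version, touching only
the upper 16 bytes (again a sketch with a made-up name; the +1 pointer
arithmetic assumes the usual 32-byte-aligned __m256i layout, so the
half-vector accesses stay 16-byte aligned):

#include <immintrin.h>

void example_half(__m256i * __restrict__ dest, const __m256i * __restrict__ a) {
    // Load and store only the high 128-bit half, i.e. bytes 16..31.
    __m128i hi = _mm_load_si128((const __m128i *)a + 1);
    _mm_store_si128((__m128i *)dest + 1, hi);
}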

I'm not quite sure what the exact criteria are for when this transform is
profitable, though.
Quuxplusone commented 5 years ago

This should be very good for x86 perf (replacing 2 ymm memops + a ymm shuffle with 2 xmm memops). Probably helps other targets too.

If we solve bug 41429, we'll have the vector select in the form of a shuffle in IR. That seems close enough to the scalar select-store patterns to consider handling it as a generic optimization (DSE? EarlyCSE? see bug 39603), since it always eliminates ops?
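
For comparison, the scalar select-store shape being alluded to might look like this (a hypothetical illustration, not code taken from either bug): the select's false arm just stores back the value that was already there, so the whole thing reduces to a conditional plain store and the load becomes removable.

// Scalar analogue of the identity-shuffle store: the false arm of the
// select re-stores *p's current value, so this is equivalent to
// "if (c) *p = x;" and the load of *p can be eliminated.
void select_store(int *p, int x, int c) {
    *p = c ? x : *p;
}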

If that's too much of a stretch, then we either have to enhance the SDAG load/store splitting/combining or write a custom pass to do this.