Open Quuxplusone opened 5 years ago
Bugzilla Link | PR42410 |
Status | NEW |
Importance | P enhancement |
Reported by | David Bolvansky (david.bolvansky@gmail.com) |
Reported on | 2019-06-26 11:24:30 -0700 |
Last modified on | 2019-06-26 12:16:24 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | craig.topper@gmail.com, hideki.saito@intel.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
foo(char*, char*): # @foo(char*, char*)
xor eax, eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
vmovq xmm0, qword ptr [rsi + rax] # xmm0 = mem[0],zero
vmovq xmm1, qword ptr [rdi + rax] # xmm1 = mem[0],zero
vpaddb xmm0, xmm1, xmm0
vmovq qword ptr [rdi + rax], xmm0
add rax, 8
cmp rax, 8
jne .LBB0_1
ret
-O3 -march=skylake -fno-unroll-loops helps a bit, but shouldn't we
vectorize the loop rather than unroll it? And this vectorized code is not ideal either.
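
For reference, the source that produced the listing above is not included in this report. A loop of roughly the following shape would be consistent with the assembly: 8 bytes are loaded from each pointer, added bytewise, and stored back through the first argument. This is a hypothetical reconstruction, not the reporter's actual test case; in particular the __restrict qualifiers and the trip count of 8 are assumptions inferred from the lack of an overlap check and from the "cmp rax, 8" in the listing.

```cpp
// Hypothetical reconstruction of the test case; the original source is not
// shown in the report. __restrict is an assumption (a compiler extension
// accepted by Clang and GCC) that tells the compiler a and b do not overlap.
void foo(char *__restrict a, char *__restrict b) {
    // Constant trip count of 8: a single 8-byte vector add covers all
    // iterations, so no loop counter or back-edge should be needed.
    for (int i = 0; i < 8; ++i)
        a[i] += b[i];
}
```

Note that in the listing the counter goes from 0 to 8 in one step, so the "loop" body runs exactly once; the xor/add/cmp/jne sequence around the vmovq/vpaddb/vmovq is pure overhead, which may be what "not ideal" refers to.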