[x86, SSE] high elements of vectors are not ignored as expected

Quuxplusone commented 9 years ago


Bugzilla Link	PR24523
Status	NEW
Importance	P normal
Reported by	Sanjay Patel (spatel+llvm@rotateright.com)
Reported on	2015-08-20 12:27:45 -0700
Last modified on	2020-03-27 08:07:00 -0700
Version	trunk
Hardware	PC All
CC	craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR24475

Noticed in bug 24475: we're not eliminating meaningless ops on the higher
elements of vectors.

$ cat isnan.ll
define float @my_sse_isnan(float %f1) {
  %v1 = insertelement <4 x float> undef, float %f1, i32 0

  ; this shouldn't affect anything; we only care about the low (scalar) element
  %v2 = insertelement <4 x float> %v1, float 0.0, i32 1

  %cmpss = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %v2, <4 x float> %v2, i8 3)
  %ext = extractelement <4 x float> %cmpss, i32 0
  ret float %ext
}

$ ./llc -o - isnan.ll
...
  xorps %xmm1, %xmm1
  movss %xmm0, %xmm1            ## xmm1 = xmm0[0],xmm1[1,2,3]
  cmpunordss    %xmm1, %xmm1
  movaps    %xmm1, %xmm0
  retq

I was expecting that to be reduced to:

  cmpunordss %xmm0, %xmm0
  retq

Quuxplusone commented 8 years ago

I looked at adding this to the SSE scalar intrinsics handling in
SimplifyDemandedVectorElts, for the ord/unord comparison cases we can add
support quite easily - the other cases would need more care due to SSE's quirky
NAN handling.

With a local patch, opt can generate this:

define float @my_sse_isnan(float %f1) {
  %1 = fcmp uno float %f1, 0.0
  %2 = sext i1 %1 to i32
  %3 = bitcast i32 %2 to float
  ret float %3
}

which results in the assembly:

    xorl    %eax, %eax
    ucomiss %xmm0, %xmm0
    movl    $-1, %ecx
    cmovnpl %eax, %ecx
    movd    %ecx, %xmm0
    retq

Which is another can of worms....

Part of the problem is the fact that the original test reuses the same variable
in both arguments of the comparison, which stops SimplifyDemandedVectorElts
doing almost anything - we could fix this part of the bug very easily for cases
with different arguments.

I'm not sure which would be less work - adding basic support to
SimplifyDemandedVectorElts for values reused in an instruction or better
ucomiss/cmpss selection.

Quuxplusone commented 8 years ago

D17490 begins adding cmpss/cmpsd support to SimplifyDemandedVectorElts

Quuxplusone commented 4 years ago

4 years later...

Candidate patch: https://reviews.llvm.org/D76928

Quuxplusone / LLVMBugzillaTest

[x86, SSE] high elements of vectors are not ignored as expected #24522