Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[x86, SSE] high elements of vectors are not ignored as expected #24522

Open Quuxplusone opened 9 years ago

Quuxplusone commented 9 years ago
Bugzilla Link PR24523
Status NEW
Importance P normal
Reported by Sanjay Patel (spatel+llvm@rotateright.com)
Reported on 2015-08-20 12:27:45 -0700
Last modified on 2020-03-27 08:07:00 -0700
Version trunk
Hardware PC All
CC craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR24475
Noticed in bug 24475: we're not eliminating meaningless ops on the higher
elements of vectors.

$ cat isnan.ll
define float @my_sse_isnan(float %f1) {
  %v1 = insertelement <4 x float> undef, float %f1, i32 0

  ; this shouldn't affect anything; we only care about the low (scalar) element
  %v2 = insertelement <4 x float> %v1, float 0.0, i32 1

  %cmpss = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %v2, <4 x float> %v2, i8 3)
  %ext = extractelement <4 x float> %cmpss, i32 0
  ret float %ext
}

$ ./llc -o - isnan.ll
...
  xorps %xmm1, %xmm1
  movss %xmm0, %xmm1            ## xmm1 = xmm0[0],xmm1[1,2,3]
  cmpunordss    %xmm1, %xmm1
  movaps    %xmm1, %xmm0
  retq

I was expecting that to be reduced to:

  cmpunordss %xmm0, %xmm0
  retq
Quuxplusone commented 8 years ago
I looked at adding this to the SSE scalar intrinsics handling in
SimplifyDemandedVectorElts, for the ord/unord comparison cases we can add
support quite easily - the other cases would need more care due to SSE's quirky
NAN handling.

With a local patch, opt can generate this:

define float @my_sse_isnan(float %f1) {
  %1 = fcmp uno float %f1, 0.0
  %2 = sext i1 %1 to i32
  %3 = bitcast i32 %2 to float
  ret float %3
}

which results in the assembly:

    xorl    %eax, %eax
    ucomiss %xmm0, %xmm0
    movl    $-1, %ecx
    cmovnpl %eax, %ecx
    movd    %ecx, %xmm0
    retq

Which is another can of worms....

Part of the problem is the fact that the original test reuses the same variable
in both arguments of the comparison, which stops SimplifyDemandedVectorElts
doing almost anything - we could fix this part of the bug very easily for cases
with different arguments.

I'm not sure which would be less work - adding basic support to
SimplifyDemandedVectorElts for values reused in an instruction or better
ucomiss/cmpss selection.
Quuxplusone commented 8 years ago

D17490 begins adding cmpss/cmpsd support to SimplifyDemandedVectorElts

Quuxplusone commented 4 years ago

4 years later...

Candidate patch: https://reviews.llvm.org/D76928