Spilled XMM is wrongly assumed to be aligned #14648

Open Quuxplusone opened 11 years ago

Quuxplusone commented 11 years ago
Bugzilla Link PR14646
Status NEW
Importance P normal
Reported by NAKAMURA Takumi (geek4civic@gmail.com)
Reported on 2012-12-19 01:08:28 -0800
Last modified on 2013-08-23 09:41:03 -0700
Version trunk
Hardware All All
CC anton@korobeynikov.info, babokin@gmail.com, boaz.ouriel@intel.com, gm4cheng@gmail.com, kevin.p.schoedel@intel.com, llvm-bugs@lists.llvm.org, michael.hliao@gmail.com, mkuper@google.com, nadav.rotem@me.com, nicolas.capens@gmail.com, rafael@espindo.la
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
With XMM, the memory operand (m) of an INSTrm instruction must be 16-byte aligned.

        xorps   (%esp), %xmm0           # 16-byte Folded Reload

But this function does not realign the stack in its prologue, so (%esp) might not be
16-byte aligned.

This causes miscompilation with the vectorizer on i686-cygwin.

* Reproducible for i686-freebsd (also possible for netbsd, cygming and win32).
* It is a leaf function; the issue would be suppressed if a frame pointer were generated.

/* testcase.c */
typedef long long W __attribute__((__vector_size__(16)));
W foo(W a0) {
  W r0;
  asm volatile("nop":"=x"(r0)::"%xmm1","%xmm2","%xmm3","%xmm4","%xmm5","%xmm6","%xmm7");
  return a0 ^ r0;
}

# llc -mtriple=i686-freebsd -mattr=-avx
foo:                                    # @foo
# BB#0:                                 # %entry
        subl    $28, %esp
        movups  %xmm0, (%esp)           # 16-byte Folded Spill
        #APP
        nop
        #NO_APP
        xorps   (%esp), %xmm0           # 16-byte Folded Reload
        addl    $28, %esp
        ret
Quuxplusone commented 11 years ago

The X86 backend knows how to do stack realignment; it probably just thinks that FreeBSD has a 16-byte-aligned stack (like Linux).

-Chris
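For context, here is a minimal sketch of the decision Chris is describing. This is not the actual LLVM code; the enum, the function names, and every value except the 16-byte Linux figure mentioned above are illustrative assumptions. The idea is that the backend bakes in a guaranteed incoming stack alignment per target OS and only emits prologue realignment when a frame object needs more than that guarantee, so a wrong per-OS value silently disables realignment.

/* stack_align_sketch.cpp -- hypothetical illustration, not LLVM source */
enum class TargetOS { Linux, FreeBSD, Cygwin, Win32 };

// Stack alignment (bytes) the backend assumes is guaranteed on function entry.
// If FreeBSD were given 16 here "like linux", a 16-byte XMM spill slot would be
// considered already aligned and no prologue realignment would be emitted.
constexpr unsigned assumedStackAlign(TargetOS os) {
  switch (os) {
  case TargetOS::Linux: return 16; // the assumption Chris refers to
  default:              return 4;  // assumed classic 4-byte i386 guarantee
  }
}

// Realignment is only emitted when a frame object demands more than the
// assumed guarantee.
constexpr bool needsRealignment(unsigned objectAlign, TargetOS os) {
  return objectAlign > assumedStackAlign(os);
}

static_assert(needsRealignment(16, TargetOS::FreeBSD),
              "a 16-byte spill slot should force realignment on i686-freebsd");
static_assert(!needsRealignment(16, TargetOS::Linux),
              "covered by the assumed 16-byte guarantee");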

Quuxplusone commented 11 years ago

It is using movups for the spill, though, rather than movaps as i686-linux does.

Quuxplusone commented 11 years ago

I think the problem is that the reload and xor are being fused in X86InstrInfo::foldMemoryOperandImpl() without verifying and/or specifying the alignment that would be required.
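A minimal sketch of what that verification would amount to, using hypothetical types and names (this is not the actual X86InstrInfo interface): before folding the reload into an instruction whose memory operand must be aligned, compare the spill slot's guaranteed alignment against the instruction's requirement and refuse the fold otherwise.

/* fold_check_sketch.cpp -- hypothetical illustration, not LLVM source */
struct SpillSlotInfo {
  unsigned Alignment;            // alignment the slot is actually guaranteed to have
};

struct FoldCandidate {
  unsigned RequiredMemAlignment; // e.g. 16 for SSE xorps with a folded memory operand
};

// Fold the reload into the instruction only if the slot is known to satisfy the
// folded form's alignment requirement; otherwise keep the separate (unaligned)
// movups reload.
bool canFoldReload(const FoldCandidate &MI, const SpillSlotInfo &Slot) {
  return Slot.Alignment >= MI.RequiredMemAlignment;
}

In the output above, the reload is folded into xorps even though the spill itself was emitted with movups, i.e. the backend did not treat (%esp) as 16-byte aligned at the spill site.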

Quuxplusone commented 11 years ago

_Bug 16126 has been marked as a duplicate of this bug._

Quuxplusone commented 11 years ago
In my case, the allocated stack frame slot is marked as 16-byte aligned (while
the stack itself is only 4-byte aligned):

# *** IR Dump After Virtual Register Rewriter ***:
# Machine code for function f_fu: Post SSA
Frame Objects:
  fi#-3: size=4, align=4, fixed, at location [SP+12]
  fi#-2: size=4, align=4, fixed, at location [SP+8]
  fi#-1: size=4, align=4, fixed, at location [SP+4]
  fi#0: size=16, align=16, at location [SP+4] <<<<<<<<<<<<<<<< here it is.

So it is probably not X86InstrInfo::foldMemoryOperandImpl() that needs to be fixed;
rather, the slot needs to be allocated at the aligned boundary it declares.

Or, if X86InstrInfo::foldMemoryOperandImpl() is the thing to fix, then the slot also
needs to be created as only 4-byte aligned.
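To make the contract in this last comment concrete: if a frame object declares 16-byte alignment while the incoming stack is only guaranteed to be 4-byte aligned, the compiler must either realign the stack in the prologue or demote the slot's alignment; otherwise anything that relies on the declared alignment, such as the folded xorps above, is miscompiled. A standalone sketch in plain C++ (not LLVM internals; the function is hypothetical):

/* realign_contract_sketch.cpp */
#include <cassert>
#include <cstdint>

// A leaf function with a stack slot that requests 16-byte alignment, much like
// the XMM spill slot (fi#0) above. For the assertion to hold on a target whose
// ABI only guarantees a 4-byte-aligned stack on entry, the compiler has to
// realign %esp in the prologue or set up a frame pointer; that is exactly what
// the reported codegen fails to do for the spill slot.
void leaf() {
  alignas(16) long long slot[2];
  assert(reinterpret_cast<uintptr_t>(slot) % 16 == 0);
  (void)slot;
}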