Quuxplusone / LLVMBugzillaTest

Some alignment differences with gcc 4.8 #18005

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR18006
Status NEW
Importance P normal
Reported by Rafael Ávila de Espíndola (rafael@espindo.la)
Reported on 2013-11-20 10:13:03 -0800
Last modified on 2014-01-07 10:25:04 -0800
Version unspecified
Hardware PC All
CC benny.kra@gmail.com, llvm-bugs@lists.llvm.org, rnk@google.com
It looks like gcc 4.8 will

1) Align arrays >= 32 bytes to 32 bytes.
2) Align arrays >= 16 bytes to 16 bytes.
3) Align the stack to 16 bytes with -m32.
4) Align the stack to 32 bytes with -m64.
5) Align VLAs to 16 bytes.

Item 5 is almost certainly a bug in gcc. We implement 3, and 2 in 64-bit mode,
but not 1 or 4.
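
One way to compare the static-data cases (items 1 and 2) between compilers is to print the addresses of a few arrays modulo 32. The following is only a sketch: the helper `addr_mod`, the function `report` and the array names are made up for this illustration, and the values printed depend on the compiler, target and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: remainder of an address modulo `align`. */
static size_t addr_mod(const void *p, size_t align) {
  return (size_t)((uintptr_t)p % align);
}

/* Static arrays on either side of the 16- and 32-byte thresholds. */
char a8[8];
char a16[16];
char a32[32];

/* Print each address modulo 32. A result of 0 means the array happens to be
   32-byte aligned; 16 means 16-byte but not 32-byte aligned. */
void report(void) {
  printf("a8  %% 32 = %zu\n", addr_mod(a8, 32));
  printf("a16 %% 32 = %zu\n", addr_mod(a16, 32));
  printf("a32 %% 32 = %zu\n", addr_mod(a32, 32));
}
```

Calling `report()` from `main` and building the same file with each compiler gives a rough comparison. A single run can show an array as aligned by coincidence of layout, so checking the alignment directives in the generated assembly is more reliable.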
Quuxplusone commented 10 years ago
About the stack alignment:

I only noticed that given

void g(char *);
void f(int n) {
  char v;
  g(&v);
}

gcc emits "subq $24, %rsp", which suggests that it is trying to keep the stack
32-byte aligned (24 plus the 8-byte return address is 32), but I don't know of
any intrinsic that requires 32-byte alignment and could be used to test whether
gcc assumes such alignment.
Quuxplusone commented 10 years ago

I suspect that after a few releases they will start assuming the stack is 32-byte aligned, just like they assume it is 16-byte aligned with -m32 today, despite the fact that the SysV ABI doesn't guarantee that. =P

Quuxplusone commented 10 years ago
Testing with

void f(char*);
void g(void) {
  char v[32];
  f(v);
}

shows that gcc produces a "subq    $40, %rsp" and no realignment, so it is only
keeping the stack object aligned to 16 bytes.
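
The two subq values in this thread can be compared against a simple model. On x86-64 the call instruction pushes an 8-byte return address, so a function that wants %rsp to be a multiple of A after its adjustment (assuming the caller's %rsp was A-aligned at the call) subtracts enough that the locals plus 8 round up to a multiple of A. The helper `frame_adjust` below is a hypothetical sketch of that arithmetic only; it is not gcc's actual frame layout algorithm, which takes other factors into account.

```c
#include <stddef.h>

/* Hypothetical model: smallest stack adjustment that holds `locals` bytes and
   makes (adjustment + 8-byte return address) a multiple of `align`.
   `align` must be a power of two. */
static size_t frame_adjust(size_t locals, size_t align) {
  size_t total = (locals + 8 + align - 1) & ~(align - 1); /* round up */
  return total - 8; /* the return address is already on the stack */
}
```

Under this model, `char v[32]` gives 40 for 16-byte alignment and 56 for 32-byte alignment, so the observed "subq $40" matches the 16-byte case. For the single `char v`, the model gives 8 for 16-byte alignment and 24 for 32-byte alignment.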

So the only differences we have compared to gcc are for static data (in both
32- and 64-bit modes): gcc aligns objects >= 32 bytes to 32 bytes. This is true
for both arrays and structs. For example, gcc aligns 'x' to 32 bytes in

#include <stdint.h>

/* Four 8-byte members: 32 bytes in total. */
struct foo {
  uint64_t a;
  uint64_t b;
  uint64_t c;
  uint64_t d;
};
struct foo x;

While gcc doesn't implement it, it would probably also be profitable (according
to Rule 46 and Rule 75 of the Intel optimization manual) to align all static
objects >= 64 bytes to 64 bytes and smaller objects to the previous power of
two.
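
The proposed rule can be written down as a small function. This is only a sketch of the rule as stated above; the name `suggested_align` is hypothetical, and treating a size of 0 as alignment 1 is an assumption made here so the function is total. An object declared with a larger alignment would presumably keep it.

```c
#include <stddef.h>

/* Hypothetical helper: alignment suggested by the rule above.
   Objects of 64 bytes or more get 64; smaller objects get the largest power
   of two that does not exceed their size. Size 0 is treated as alignment 1
   (an assumption, not part of the rule). */
static size_t suggested_align(size_t size) {
  size_t align = 1;
  if (size >= 64)
    return 64;
  while (align * 2 <= size)
    align *= 2;
  return align;
}
```

For the 32-byte struct foo above this gives 32, which matches what gcc does today. A 48-byte object would also get 32, and anything of 64 bytes or more would get 64.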