Some alignment differences with gcc 4.8

Quuxplusone / LLVMBugzillaTest


Some alignment differences with gcc 4.8 #18005

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago


Bugzilla Link	PR18006
Status	NEW
Importance	P normal
Reported by	Rafael Ávila de Espíndola (rafael@espindo.la)
Reported on	2013-11-20 10:13:03 -0800
Last modified on	2014-01-07 10:25:04 -0800
Version	unspecified
Hardware	PC All
CC	benny.kra@gmail.com, llvm-bugs@lists.llvm.org, rnk@google.com

It looks like gcc 4.8 will

1) Align arrays >= 32 bytes to 32 bytes.
2) Align arrays >= 16 bytes to 16 bytes.
3) Align the stack to 16 bytes with -m32.
4) Align the stack to 32 bytes with -m64.
5) Align VLAs to 16 bytes.

Item 5 is almost certainly a bug in gcc. We implement 2 (in 64-bit mode) and 3,
but not 1 or 4.
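
One way to check items 1 and 2 is to look at the addresses the compiler actually assigns to static arrays. The following is a minimal sketch, not part of the original report: the names arr16, arr32, addr_alignment and print_static_alignments are made up for illustration, and it assumes that converting a pointer to uintptr_t preserves the address bits (true on x86). An address can be more aligned than required by accident, so a small observed value is the more telling result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Static arrays whose placement is left to the compiler (hypothetical names). */
char arr16[16];
char arr32[32];

/* Largest power of two that divides the address of p. */
size_t addr_alignment(const void *p) {
    assert(p != NULL);
    uintptr_t a = (uintptr_t)p;
    return (size_t)(a & (~a + 1)); /* isolate the lowest set bit */
}

/* Prints the observed alignments; results depend on compiler, flags
   and the layout chosen by the linker. */
void print_static_alignments(void) {
    printf("arr16: %zu\n", addr_alignment(arr16));
    printf("arr32: %zu\n", addr_alignment(arr32));
}
```

Calling print_static_alignments() from a program built with each compiler, with and without -m32, should show whether the 16-byte and 32-byte rules are being applied.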

Quuxplusone commented 10 years ago

About the stack alignment:

I only noticed that given

void g(char *);
void f(int n) {
  char v;
  g(&v);
}

gcc will create a "subq $24, %rsp" (24 bytes plus the 8-byte return address is
32), indicating that it is trying to keep the stack 32-byte aligned, but I
don't know of any intrinsic that would require a 32-byte alignment, so I can't
test whether gcc assumes such alignment.

Quuxplusone commented 10 years ago

I suspect that after a few releases they will start assuming the stack is 32-byte aligned, just like they assume it is 16-byte aligned with -m32 today, despite the fact that the SysV ABI doesn't guarantee that. =P

Quuxplusone commented 10 years ago

Testing with

void f(char*);
void g(void) {
  char v[32];
  f(v);
}

shows that gcc produces a "subq    $40, %rsp" and no realignment, so it is only
keeping the stack object aligned to 16 bytes.
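
The arithmetic behind the "subq $24" and "subq $40" observations can be sketched as follows. This is only an illustration: the helper name frame_keeps_alignment is made up, and it assumes that %rsp was a multiple of the alignment in question just before the call and that the prologue consists of nothing but the subq (no pushes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes pushed by the call instruction on x86-64 (the return address). */
enum { RETURN_ADDRESS_BYTES = 8 };

/* Returns true if %rsp is again a multiple of `align` after the prologue.
   Assumption: %rsp was a multiple of `align` right before the call, and
   the prologue is a single "subq $sub_bytes, %rsp". */
bool frame_keeps_alignment(size_t sub_bytes, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (RETURN_ADDRESS_BYTES + sub_bytes) % align == 0;
}
```

Under those assumptions, frame_keeps_alignment(24, 32) holds (8 + 24 = 32), while frame_keeps_alignment(40, 32) does not (8 + 40 = 48, a multiple of 16 only).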

The differences we have compared to gcc are only for static data (in both
32-bit and 64-bit modes): gcc aligns objects >= 32 bytes to 32 bytes. This is
true for both arrays and structs. For example, 'x' is 32-byte aligned in

#include <stdint.h>

struct foo {
  uint64_t a;
  uint64_t b;
  uint64_t c;
  uint64_t d;
};
struct foo x;

While gcc doesn't implement it, it would probably also be profitable (according
to the Intel optimization manual, Rule 46 and Rule 75) to align all static
objects >= 64 bytes to 64 bytes and smaller objects to the previous power of
two (the largest power of two that does not exceed their size).
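
As a rough sketch of that suggested rule (it is a proposal in this comment, not something gcc or clang is known to do), the alignment for a static object of a given size could be computed like this. The function name suggested_static_alignment is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Suggested alignment for a static object of `size` bytes:
   64 for objects of 64 bytes or more, otherwise the largest
   power of two that does not exceed the size. */
size_t suggested_static_alignment(size_t size) {
    assert(size > 0);
    size_t align = 1;
    while (align < 64 && align * 2 <= size)
        align *= 2;
    return align;
}
```

For example, a 48-byte object would get 32-byte alignment under this rule, and a 100-byte object would get 64-byte alignment.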