bitwiseworks / gcc-os2

Port of GCC compiler to OS/2
GNU General Public License v2.0

Support C __aligned__ and C++ alignas attributes #11

Open dmik opened 4 years ago

dmik commented 4 years ago

There is a GCC __aligned__ attribute and its C++11 alignas counterpart. In short, they let you specify the alignment (in bytes) for a variable or struct. Our port of GCC currently ignores them for all practical purposes. For example, given the following aligned.c:

char not_aligned_char;
int not_aligned_int;
char __attribute__ ((aligned (64))) aligned_char_at_64_B;
int __attribute__ ((aligned (128))) aligned_int_at_128_B;

the command gcc -S aligned.c will generate the following assembly:

    .file   "aligned.c"
    .text
    .comm   _not_aligned_char, 8    # 1
    .comm   _not_aligned_int, 16    # 4
    .comm   _aligned_char_at_64_B, 16   # 1
    .comm   _aligned_int_at_128_B, 16   # 4
    .ident  "GCC: (GNU) 9.2.0 20190812 (OS/2 RPM build 9.2.0-5.oc00)"

If we feed it to GCC under Linux, we will get this:

        .file   "aligned.c"
        .text
        .comm   not_aligned_char,1,1
        .comm   not_aligned_int,4,4
        .comm   aligned_char_at_64_B,1,64
        .comm   aligned_int_at_128_B,4,128
        .ident  "GCC: (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008"

You may notice that the .comm directive, along with the second argument which is the variable size, also gets a third argument which specifies its desired alignment from the aligned attribute.

When such an assembly is then linked on OS/2, variables remain unaligned (or, to be exact, aligned at some default alignment which doesn't match the request). As a result, some applications that require strict alignment (e.g. because they use SSE2 instructions that require 128-bit, i.e. 16-byte, alignment) break.
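A quick way to observe the problem at run time is to test the addresses of the variables from aligned.c. The following is only a minimal sketch; the helper names `is_aligned` and `count_misaligned_globals` are hypothetical and not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four variables from aligned.c above. */
char not_aligned_char;
int not_aligned_int;
char __attribute__ ((aligned (64))) aligned_char_at_64_B;
int __attribute__ ((aligned (128))) aligned_int_at_128_B;

/* Hypothetical helper: returns 1 if ptr is a multiple of alignment
   (which must be a power of two), otherwise 0. */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: counts how many of the two over-aligned globals
   ended up at an address that violates the requested alignment.
   0 means the toolchain honored both requests. */
int count_misaligned_globals(void)
{
    int bad = 0;
    if (!is_aligned(&aligned_char_at_64_B, 64))
        bad++;
    if (!is_aligned(&aligned_int_at_128_B, 128))
        bad++;
    return bad;
}
```

Calling `count_misaligned_globals()` from `main` and printing the result should give 0 on a toolchain that honors the attribute. A non-zero result would be consistent with the behavior described in this ticket.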

The reason is that the object format used on OS/2 is a.out, and gas supports alignment only for the ELF and PE formats: a.out is simply very old and has no per-symbol alignment specification. So generation of the third .comm argument is disabled for it.

Besides, there is a similar problem in the OMF object file format, which all a.out files need to be converted to (by emxomf.exe) before they can be linked into an OS/2 (LX) executable. OMF only supports per-segment alignment, which may be word (2 bytes), dword (4 bytes), para (16 bytes) and 64 KBytes. Such per-segment alignment cannot satisfy all possible alignment requests (especially those greater than 16 bytes) efficiently. It could if 64K alignment were used for all segments, but this would be a big waste of program address space, especially with a large number of translation units (object files). So it's not practical.

The only solution here is to use something other than a.out for object files, for example the ELF format, and then link them with some custom linker into an OS/2 executable. There are rumors that wlink from OpenWatcom (which we already use as the main, and only supported, linker for GCC on OS/2) can link ELF object files into LX executables, but this needs checking.

dmik commented 4 years ago

Note that for C++ the assembly is slightly different (it uses the .space and .balign assembler directives) but it ends up similarly: neither a.out nor OMF supports alignment at a level that allows preserving it in the final executable.

dmik commented 4 years ago

Just to stress it, one of the key motivations to have this fixed is SSE support. Currently, we have to disable SSE (or lower the compiler optimization level) in some cases, as the compiler generates MMX instructions requiring 128-bit alignment of memory variables. This is one case: https://github.com/bitwiseworks/libc/issues/30#issuecomment-465235535. And disabling SSE/MMX obviously degrades performance.

Note that GCC 9 now has -mstackrealign default on OS/2 (https://github.com/bitwiseworks/gcc-os2/commit/4c1b7b3a1ac59dce7a31e10daf9267205e006880) but it only solves alignment problems for stack-based variables. However, there are cases when variables MMX operates on are located in the data segment (i.e. global/static). And such variables require this ticket to be fixed to get it working.

komh commented 4 years ago

Per-segment alignment is one of 1 (byte), 2 (word), 4 (dword), 16 (para) and 256 (page; maybe 4K, but that was not possible). 64K is not among them.

And .balign of gas worked up to 16 bytes alignment.

dmik commented 4 years ago

@komh thanks! Any reference to where you got it from? I was just repeating somebody else's words, but I wonder if there are any official OMF specs. If 256-byte alignment is really possible in OMF then it might be a solution, at least a temporary one. But the ELF feature of WLINK should be evaluated too. If it's there (or may be brought in with a little effort), then making GCC produce ELF objects on OS/2 is also not a big deal.

dmik commented 4 years ago

I've been provided the OMF specs in PDF, attaching it here for reference: omf.pdf.

As the document states, the SEGDEF record supports 1, 2, 4, 16 and page alignment. The latter on x386+ is always 4K. Anyway, 4K is also way too much for alignments >16 bytes. It will cause too much memory waste. So not an option. ELF seems like the only one.
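To make the waste argument concrete, here is a small sketch that picks the smallest segment alignment able to satisfy a request and reports the worst-case padding it could cost per segment. The alignment list is taken from this comment (with "page" assumed to be 4096). The function names are hypothetical and not taken from the OMF spec or any tool.

```c
#include <assert.h>
#include <stddef.h>

/* Segment alignments named in the comment above; "page" is assumed
   to mean 4096 bytes, as stated there for 386+. */
static const size_t omf_alignments[] = { 1, 2, 4, 16, 4096 };

/* Hypothetical helper: smallest listed segment alignment that is
   >= requested, or 0 if none of them is large enough. */
size_t omf_segment_alignment(size_t requested)
{
    size_t i;
    for (i = 0; i < sizeof omf_alignments / sizeof omf_alignments[0]; i++)
        if (omf_alignments[i] >= requested)
            return omf_alignments[i];
    return 0;
}

/* Worst-case number of padding bytes a linker may have to insert in
   front of a segment that must start on the given boundary. */
size_t worst_case_padding(size_t alignment)
{
    assert(alignment != 0);
    return alignment - 1;
}
```

Under these assumptions a request for 32, 64 or 128 bytes already lands on page alignment, i.e. up to 4095 bytes of padding per segment of each object file.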

dryeo commented 4 years ago

The NASM docs (section 7.4) say the following about ALIGN. They are talking about segments, but these extensions may also be supported for the regular ALIGN:

ALIGN is used, as shown above, to specify how many low bits of the segment start address must be forced to zero. The alignment value given may be any power of two from 1 to 4096; in reality, the only values supported are 1, 2, 4, 16, 256 and 4096, so if 8 is specified it will be rounded up to 16, and 32, 64 and 128 will all be rounded up to 256, and so on. Note that alignment to 4096-byte boundaries is a PharLap extension to the format and may not be supported by all linkers.

komh commented 4 years ago

@dmik I've seen it in 'Object Module Format Reference', 'ALP Programming Guide and Reference' of OS/2 Toolkit 4.5 and 'NASM manual'.

As you said, the OMF spec says that a page-aligned segment is aligned on a 4K byte boundary on 32-bit platforms. I also thought so. But the actual alignment differs between linkers.

wl.exe: 256
link386.exe: 4096
ilink.exe: 4096

That is, WATCOM linker aligns on 256 byte boundary, but IBM linkers align on 4K byte boundary.

At first, I tested WATCOM linker only.

FYI, OS/2 ld aligns on 16 byte boundary all the time. It's possible to change the value because we have the sources. ^^

Nevertheless, if ELF is supported, it would be best.

dmik commented 4 years ago

Just for the record, if I get it right, we also need this task to be done to support things like AVX in FFMPEG (to not crash on OS4 kernels). See the above Chromium comment.

dmik commented 3 years ago

Also note http://trac.netlabs.org/ports/ticket/206. There is a link to an article about why -fno-common GCC option may help here. Needs checking.

dmik commented 3 years ago

BTW, there is a suggestion in http://trac.netlabs.org/ports/ticket/206 to use -fno-common to overcome alignment limitations wrt AVX and a claim it helps with FFmpeg. It might be the case there but it doesn't seem to help with the test case from the description of this issue. It appears that -fno-common simply disables grouping the global variables in COMM segment. Note also that when -fno-common is used, the assembly is identical to C++ (which therefore doesn't seem to use COMM by default).

This is the assembly with -fno-common (to contrast with -fcommon assembly above):

    .file   "aligned.c"
    .text
    .globl  _not_aligned_char
    .bss
_not_aligned_char:
    .space 1
    .globl  _not_aligned_int
    .balign 4
_not_aligned_int:
    .space 4
    .globl  _aligned_char_at_64_B
    .balign 64
_aligned_char_at_64_B:
    .space 1
    .globl  _aligned_int_at_128_B
    .balign 128
_aligned_int_at_128_B:
    .space 4
    .ident  "GCC: (GNU) 9.2.0 20190812 (OS/2 RPM build 9.2.0-5.oc00)"

This is what the DATA group's object looks like in C mode with no special options (i.e. -fcommon is assumed on our platform):

________DATA           DATA           DGROUP         0002:00000190   00000000
BSS32                  BSS            DGROUP         0002:00000190   00000030
c_common               BSS            DGROUP         0002:000001c0   00000040
________BSS            BSS            DGROUP         0002:00000200   00000000

--- this comes from crt0.obj ---
0002:00000000* __data
0002:00000000* ___data_start
0002:00000190* ___bss_start
0002:00000130+ __os2dll
0002:00000138  ___CTOR_LIST__
0002:00000140  ___DTOR_LIST__
0002:00000148* ___crtinit1__
0002:00000150* ___crtexit1__
0002:00000158  ___eh_frame__
0002:00000168  ___eh_init__
0002:00000174  ___eh_term__
0002:00000180+ ___fork_parent1__
0002:00000188+ ___fork_child1__

--- this comes from aligned.obj ---
0002:000001c0+ _not_aligned_char
0002:000001c8+ _not_aligned_int
0002:000001d8+ _aligned_char_at_64_B
0002:000001e8+ _aligned_int_at_128_B

--- this comes from end.asm ---
0002:00000200  _end
0002:00000190  _edata
--- this comes from end.asm ---
0002:00000200  __end
0002:00000190  __edata

This is how it looks in C mode with -fno-common and in C++:

________DATA           DATA           DGROUP         0002:00000190   00000000
BSS32                  BSS            DGROUP         0002:00000190   00000130
c_common               BSS            DGROUP         0002:000002c0   00000000
________BSS            BSS            DGROUP         0002:000002c0   00000000

--- this comes from crt0.obj ---
0002:00000000* __data
0002:00000000* ___data_start
0002:00000190* ___bss_start
0002:00000130+ __os2dll
0002:00000138  ___CTOR_LIST__
0002:00000140  ___DTOR_LIST__
0002:00000148* ___crtinit1__
0002:00000150* ___crtexit1__
0002:00000158  ___eh_frame__
0002:00000168  ___eh_init__
0002:00000174  ___eh_term__
0002:00000180+ ___fork_parent1__
0002:00000188+ ___fork_child1__

--- this comes from aligned.obj ---
0002:00000190* _not_aligned_char
0002:00000194* _not_aligned_int
0002:000001d0* _aligned_char_at_64_B
0002:00000210* _aligned_int_at_128_B

--- this comes from endlink386.asm ---
0002:000002c0  _end
0002:00000190  _edata
--- this comes from end.asm ---
0002:000002c0  __end
0002:00000190  __edata

As one may see, -fno-common makes alignment work much better (there are proper gaps between variables according to their size and alignment). However, because data sections from different object files (i.e. crt0.obj + aligned.obj) are glued together without any padding or alignment, we still end up with wrong alignment. The first offset in aligned.obj is 190 instead of ...80 as expected by GCC (because the maximum requested alignment for this object file is 128, which is 80 in hex). So every address is shifted by 16 (10 in hex) bytes. With that shift subtracted, we would get:

0002:00000180* _not_aligned_char
0002:00000184* _not_aligned_int
0002:000001c0* _aligned_char_at_64_B
0002:00000200* _aligned_int_at_128_B

which would be the exact requested (correct) alignment.
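The arithmetic above can be checked mechanically. This is only a sketch: the in-object offsets (0, 0x4, 0x40, 0x80) are what the .balign directives in the -fno-common assembly imply, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: where a symbol sits inside its object's BSS
   contribution and which alignment was requested for it. */
struct symbol {
    unsigned long offset;
    unsigned long alignment;
};

/* Offsets implied by the .balign/.space sequence in the -fno-common
   assembly for aligned.c (an assumption derived from that listing). */
static const struct symbol aligned_obj_symbols[] = {
    { 0x00, 1 },    /* _not_aligned_char */
    { 0x04, 4 },    /* _not_aligned_int */
    { 0x40, 64 },   /* _aligned_char_at_64_B */
    { 0x80, 128 }   /* _aligned_int_at_128_B */
};

/* Counts symbols whose final address (base + offset) is not a
   multiple of their requested alignment. */
size_t count_misaligned(unsigned long base,
                        const struct symbol *syms, size_t n)
{
    size_t i, bad = 0;
    for (i = 0; i < n; i++) {
        assert(syms[i].alignment != 0);
        if ((base + syms[i].offset) % syms[i].alignment != 0)
            bad++;
    }
    return bad;
}

/* Convenience wrapper for the four symbols of aligned.obj. */
size_t count_misaligned_in_aligned_obj(unsigned long base)
{
    return count_misaligned(base, aligned_obj_symbols,
        sizeof aligned_obj_symbols / sizeof aligned_obj_symbols[0]);
}
```

With base 0x190 (the address from the map above) two symbols come out misaligned, namely the 64-byte and the 128-byte ones. With base 0x180 none do.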

In case of -fcommon, alignment is totally ignored it seems. But as this GCC bug shows, this option is mainly there for backward compatibility with ancient systems and in GCC 10 they made -fno-common the default (which is better both in terms of functionality and performance). Once we update to it we will get it "for free".

Here are the GCC docs for -fcommon, for reference.

So what we really need to fix here is the linker it seems (to obey maximum object file alignment and align it accordingly when gluing object files together). I guess that fixing EMXOMF and WL to do so is not a big issue per se but it might make the results not compatible with other linkers and OMF files generated by other compilers.
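As a rough illustration of what "obey maximum object file alignment" could mean, the sketch below places each object's contribution at an address rounded up to that object's maximum alignment. It is not how EMXOMF or WL actually work; the struct, the function names and the sizes used in the usage note are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one object file's contribution to a
   segment: its size, the largest alignment any of its symbols
   requested, and the base address assigned by the layout pass. */
struct contribution {
    unsigned long size;
    unsigned long max_alignment;
    unsigned long base;
};

/* Rounds value up to the next multiple of alignment (power of two). */
unsigned long align_up(unsigned long value, unsigned long alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Assigns a base to every contribution in order, padding as needed
   so each base honors max_alignment. Returns the end address. */
unsigned long place_contributions(unsigned long start,
                                  struct contribution *c, size_t n)
{
    unsigned long cursor = start;
    size_t i;
    for (i = 0; i < n; i++) {
        cursor = align_up(cursor, c[i].max_alignment);
        c[i].base = cursor;
        cursor += c[i].size;
    }
    return cursor;
}
```

For example, starting at 0x190 with a 0x30-byte contribution aligned to 16 followed by a 0x100-byte contribution aligned to 128 (illustrative sizes), the second contribution would be placed at 0x200 rather than directly at 0x1c0, at the cost of 0x40 bytes of padding.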

dmik commented 3 years ago

BTW, OMF format seems to fully support -fcommon semantics via COMDEF records. This is what I get for it in the object file (used listomf):

SEGDEF #1 "TEXT32"  PARA PUBLIC USE32 Length: 0 CLASS "CODE"
SEGDEF #2 "DATA32"  PARA PUBLIC USE32 Length: 0 CLASS "DATA"
SEGDEF #3 "BSS32"  PARA PUBLIC USE32 Length: 0 CLASS "BSS"
SEGDEF #4 "$$SYMBOLS"  PARA PUBLIC USE32 Length: 0x24 CLASS "DEBSYM"
SEGDEF #5 "$$TYPES"  PARA PUBLIC USE32 Length: 0 CLASS "DEBTYP"
GRPDEF #1 "FLAT":
GRPDEF #2 "DGROUP": "BSS32"(#3) "DATA32"(#2)
COMDEF
  #1: "_not_aligned_char", type index: 0, NEAR, 8 bytes
  #2: "_not_aligned_int", type index: 0, NEAR, 16 bytes
  #3: "_aligned_char_at_64_B", type index: 0, NEAR, 16 bytes
  #4: "_aligned_int_at_128_B", type index: 0, NEAR, 16 bytes

And this is what -fno-common produces:

SEGDEF #1 "TEXT32"  PARA PUBLIC USE32 Length: 0 CLASS "CODE"
SEGDEF #2 "DATA32"  PARA PUBLIC USE32 Length: 0 CLASS "DATA"
SEGDEF #3 "BSS32"  PARA PUBLIC USE32 Length: 0x100 CLASS "BSS"
SEGDEF #4 "$$SYMBOLS"  PARA PUBLIC USE32 Length: 0x24 CLASS "DEBSYM"
SEGDEF #5 "$$TYPES"  PARA PUBLIC USE32 Length: 0 CLASS "DEBTYP"
GRPDEF #1 "FLAT":
GRPDEF #2 "DGROUP": "BSS32"(#3) "DATA32"(#2)
PUBDEF base group: "FLAT"(#1), base seg: "BSS32"(#3)
  #1: "_not_aligned_char", offset: 0, type: 0
  #2: "_not_aligned_int", offset: 0x4, type: 0
  #3: "_aligned_char_at_64_B", offset: 0x40, type: 0
  #4: "_aligned_int_at_128_B", offset: 0x80, type: 0

As one may see, PUBDEF is much more accurate because it supports the offset field that can express both size and alignment.

However, EMXOMF sets the BSS segment alignment to 16 bytes (PARA) and this does not fit the 64 and 128 bytes alignment requirements when individual BSS segments are glued together. Using the page alignment here would satisfy it, but as mentioned above this would waste too many bytes when aligning. Perhaps we could effectively regroup PUBDEF records to minimize this waste (using different SEGDEF with different names and alignments) but this needs some thinking and it's a question if WL will support that well.

dmik commented 3 years ago

Also, it's clear now why -fno-common may help with FFMPEG/AVX. Given that it makes C/C++ alignment attributes respected and given that EMXOMF always uses 16-byte alignment for public variables in this mode, it is guaranteed that requested 128-bit (i.e. 16 byte) alignment will be provided. So, if AVX in FFMPEG doesn't use instructions using 256-bit (32 byte) alignment (which even -fno-common cannot guarantee because of EMXOMF limitations), this will work indeed.
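The reasoning can be put as a one-line rule: if a segment starts on a multiple of S and a symbol's offset inside it is a multiple of A (both powers of two), the final address is only guaranteed to be a multiple of the smaller of the two. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>

/* Alignment that is guaranteed for a symbol's final address when its
   in-segment offset honors `requested` and the segment base honors
   `segment_alignment`. Both must be powers of two. */
unsigned long guaranteed_alignment(unsigned long requested,
                                   unsigned long segment_alignment)
{
    assert(requested != 0 && (requested & (requested - 1)) == 0);
    assert(segment_alignment != 0 &&
           (segment_alignment & (segment_alignment - 1)) == 0);
    return requested < segment_alignment ? requested : segment_alignment;
}

/* Returns 1 if the requested alignment is fully guaranteed. */
int request_is_guaranteed(unsigned long requested,
                          unsigned long segment_alignment)
{
    return guaranteed_alignment(requested, segment_alignment) == requested;
}
```

With the 16-byte (PARA) segment alignment mentioned above, a 16-byte request is guaranteed, while a 32-byte request is not (it may still happen to be satisfied, depending on where the segment lands).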