freemint / m68k-atari-mint-gcc

Fork of GNU's gcc with support for the m68k-atari-mint target
https://github.com/freemint/m68k-atari-mint-gcc/wiki

PCC_BITFIELD_TYPE_MATTERS: bitfields layout #35

Closed vinriviere closed 10 months ago

vinriviere commented 1 year ago

I open this issue to clarify the effects of PCC_BITFIELD_TYPE_MATTERS. On m68k, this is related to STRUCTURE_SIZE_BOUNDARY discussed in https://github.com/freemint/m68k-atari-mint-gcc/issues/21.

PCC_BITFIELD_TYPE_MATTERS is about the layout of bitfields inside structs.

As a reminder, bitfields are typically used like this:

struct s
{
        int myfield:3;
};

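This defines a field named myfield that is 3 bits wide. An unsigned 3-bit field can hold values from 0 to 7; note that GCC treats a plain int bitfield as signed by default, so this one actually holds values from -4 to 3. Multiple bitfields can be combined, so they take less space than regular integer members.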
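To make the range and the packing concrete, here is a minimal sketch. The names (flags, plain, store3, bytes_saved) are made up for illustration. It uses unsigned int on purpose, so the 0 to 7 range does not depend on how the compiler treats a plain int bitfield.

```c
#include <assert.h>
#include <stddef.h>

/* Three bitfields sharing storage: 3 + 5 + 8 = 16 bits in total. */
struct flags
{
    unsigned int mode:3;   /* holds 0..7 */
    unsigned int level:5;  /* holds 0..31 */
    unsigned int code:8;   /* holds 0..255 */
};

/* The same three members as regular integers, for size comparison. */
struct plain
{
    unsigned int mode;
    unsigned int level;
    unsigned int code;
};

/* Store v into the 3-bit field and read it back.
   Conversion to an unsigned bitfield keeps only the low 3 bits. */
unsigned int store3(unsigned int v)
{
    struct flags f = {0, 0, 0};
    f.mode = v;
    return f.mode;
}

/* Number of bytes the bitfield version saves over the plain one. */
size_t bytes_saved(void)
{
    assert(sizeof(struct flags) <= sizeof(struct plain));
    return sizeof(struct plain) - sizeof(struct flags);
}
```

Calling store3(7) gives back 7, while store3(8) wraps around to 0 because only 3 bits are kept. The exact value of bytes_saved() depends on the target, which is precisely what the rest of this issue is about.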

That being said, here is the official GCC documentation about PCC_BITFIELD_TYPE_MATTERS. https://gcc.gnu.org/onlinedocs/gccint/Storage-Layout.html#index-PCC_005fBITFIELD_005fTYPE_005fMATTERS (currently down, alternate link).

So the main point is: does the integer type used in the int myfield:3; expression matter? Do we get the same result with char, short, int and long?

Regarding to compilers:

As that documentation isn't crystal clear, let's run some experiments. I did a few, and the results are quite puzzling. So let's proceed step by step.

The first thing to understand is that there are 2 kinds of bitfield declarations, with different effects, and even different layouts:

The strange thing is that the sample in the documentation focuses on anonymous zero-sized bitfields. So I start with that (as my early tests showed more oddities with regular bitfields).

Here is a good testcase: bitf.c

struct s
{
    char a;
    char :0;
    char b;
} g;

int size = sizeof(struct s);

void f(void)
{
    g.a = -1;
    g.b = -1;
}

Data fields are named a and b. As they are of type char and value is -1, on m68k gcc uses the very simple st instruction to set them. So it's easy to check alignment.

Key point is to see if the type char used in char :0 has effect on the alignment on next field, namely b. I found an interesting quote here:

As a special case, an unnamed bit-field with a width of zero specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed bit-field may the constant-expression be a value equal to zero.

I understand that an "allocation unit" is the type used before the colon in int myfield:3. The above quote talks about "alignment of the next bit-field", but the documentation example just focuses on "alignment of next field (not necessarily a bit field)". So I do the same.

First I test with the native x86_64-linux-gnu-gcc (synonym of just "gcc"), as reference compiler.

x86_64-linux-gnu-gcc -S bitf.c -o - -Os
Size: 2
    movw    $-1, g(%rip)

Note that movw fills 2 bytes. So there is no padding, no further alignment. As expected, char :0; does nothing, as it aligns b to a char boundary.

I will change the type of char :0; to other integer type and see the effects. To keep this text short, here is a summary:

char :0; aligns next field on 1-byte boundary (does nothing) for a total size of 2
short :0; aligns next field on 2-byte boundary (with 1 filler char) for a total size of 3
int :0; aligns next field on 4-byte boundary (with 3 filler chars) for a total size of 5
long :0; aligns next field on 8-byte boundary (with 7 filler chars) for a total size of 9

So definitely, for x86_64, the bitfield type matters. And it works as expected. With x86_64-linux-gnu-gcc -m32 I get similar result, except that long is a synonym to int, as expected for 32-bit.
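The same measurement can be done without reading assembly, using offsetof on the regular member b. This is only a sketch: the DEFINE_PROBE macro and the probe_* struct names are hypothetical helpers, not part of the original testcase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Declare a probe struct: char a, a zero-width bitfield of the given
   type, then char b. Only the type of the ":0" member changes. */
#define DEFINE_PROBE(name, type) \
    struct name { char a; type :0; char b; }

DEFINE_PROBE(probe_char, char);
DEFINE_PROBE(probe_short, short);
DEFINE_PROBE(probe_int, int);
DEFINE_PROBE(probe_long, long);

/* Print one line of the summary. */
static void report(const char *type, size_t offset_b, size_t size)
{
    assert(offset_b < size);  /* b always lies inside the struct */
    printf("%-5s :0;  offset of b = %zu, sizeof = %zu\n",
           type, offset_b, size);
}

/* Print the layout of all four probe structs for the current compiler. */
void print_zero_width_layout(void)
{
    report("char",  offsetof(struct probe_char, b),  sizeof(struct probe_char));
    report("short", offsetof(struct probe_short, b), sizeof(struct probe_short));
    report("int",   offsetof(struct probe_int, b),   sizeof(struct probe_int));
    report("long",  offsetof(struct probe_long, b),  sizeof(struct probe_long));
}
```

Calling print_zero_width_layout() from main with each compiler and option set should reproduce the sizes listed above for x86_64, and makes it easy to compare other targets.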

Next I test with m68k-elf-gcc -malign-int to see what happens on m68k with natural alignment.

m68k-elf-gcc -S bitf.c -o - -Os -fomit-frame-pointer -malign-int

With char: size=2
    st g
    st g+1
With short: size=3
    st g
    st g+2
With long: size=5
    st g
    st g+4

So the results are the same as with the 32/64-bit x86 gcc. That's a good thing.

Now if I compile without -malign-int:

m68k-elf-gcc -S bitf.c -o - -Os -fomit-frame-pointer

With char: size=2
    st g
    st g+1
With short: size=3
    st g
    st g+2
With int: size=3
    st g
    st g+2
With long: size=3
    st g
    st g+2

So we can see that with the default m68k-elf setting, without -malign-int, the bitfield types short/int/long have exactly the same effect; only the char type makes a difference. I guess this is because BIGGEST_ALIGNMENT is set to 16 bits by default. So we mustn't be fooled: it is normal to see nothing aligned on 4-byte boundaries unless -malign-int is used (and that's uncommon in the real world).

Then: same tests with m68k-linux and m68k-atari-mint, which are both configured without PCC_BITFIELD_TYPE_MATTERS. I tested all types, with or without -malign-int. The results are always the same:

m68k-linux-gcc -S bitf.c -o - -Os -fomit-frame-pointer
With any type: size=4
    st g
    st g+2

So definitely, with m68k-linux and m68k-atari-mint, the bitfield type doesn't matter. At least regarding alignment, zero-sized bitfields always behave as if the type used was "short". Also, note that size=4: due to STRUCTURE_SIZE_BOUNDARY=16, the struct is padded to an even size.
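All the numbers reported in this comment can be summarized by a toy model. To be clear, this is an assumption fitted to the observations above, not GCC's actual layout code; the target_model struct, its fields and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Hypothetical description of a target, in bytes. */
struct target_model
{
    int type_matters;        /* 1: the ":0" type drives the alignment */
    size_t max_align;        /* cap on alignment (BIGGEST_ALIGNMENT / 8) */
    size_t fixed_align;      /* alignment used when type_matters == 0 */
    size_t struct_boundary;  /* struct size granularity */
};

/* Modelled offset of b in: struct { char a; TYPE :0; char b; } */
size_t model_offset_b(const struct target_model *t, size_t type_size)
{
    size_t align = t->type_matters ? type_size : t->fixed_align;
    if (align > t->max_align)
        align = t->max_align;
    return round_up(1, align);  /* a occupies byte 0 */
}

/* Modelled sizeof of the same struct. */
size_t model_sizeof(const struct target_model *t, size_t type_size)
{
    return round_up(model_offset_b(t, type_size) + 1, t->struct_boundary);
}
```

With {1, 2, 0, 1} for default m68k-elf, {1, 4, 0, 1} for m68k-elf -malign-int and {0, 2, 2, 2} for m68k-linux or m68k-atari-mint, the model gives back the offsets and sizes listed above. For x86_64, any cap of at least 8 bytes works; the exact value is an assumption.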

Next tests tomorrow, with real bitfields.

vinriviere commented 11 months ago

Next round: real bitfields. First, a quick reminder with this small testcase:

struct s
{
    short a:1;
    short b:1;
} g;

int size = sizeof(struct s);

void f1(void)
{
    g.a = -1;
}

void f2(void)
{
    g.b = -1;
}

Now the big question, main point of this issue: does the bitfield type matter? In other words, do short a:1; and char a:1; have the same effect?

Let's have a look at the size of the struct, with the above example.

But something weird with those compilers:

    short a:1;
    short b:1;
    char z;

The struct size is still 2! Because as the 2 bits of the bitfields occupy less than a byte, there is enough room in the second half of the short to store next field, namely char z. It is interesting to see that gcc merges the next normal field into the previous bitfield, if there is enough room.

If I use long instead of short, the result is still the same: still 2 bytes. Unless I use -malign-int: in that case, size is 4.

So "bitfield type matters"... only between char and short. Because our maximum alignment is short (unless the uncommon -malign-int option is used).

Now I remove that char z extra field.

With m68k-atari-mint, the result is always 2. So the bitfield type doesn't matter, as it seems to be always considered as short.

But with m68k-linux, it's rather strange, because the size is always 1, whether the type is char, short, or long. This is probably related to the specific case of m68k-linux not requiring string alignment.
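For anyone who wants to repeat these measurements on another compiler, here is a small sketch. The bf_* struct names and both functions are made up for illustration; the results depend on the target and options, so no expected output is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same two 1-bit fields, declared with four different types. */
struct bf_char  { char a:1;  char b:1;  };
struct bf_short { short a:1; short b:1; };
struct bf_int   { int a:1;   int b:1;   };
struct bf_long  { long a:1;  long b:1;  };

/* Two 1-bit fields followed by a regular char member. */
struct bf_short_z { short a:1; short b:1; char z; };

/* Offset of the regular member z placed after the bitfields. */
size_t offset_of_z(void)
{
    size_t off = offsetof(struct bf_short_z, z);
    assert(off < sizeof(struct bf_short_z));
    return off;
}

/* Print the struct sizes for the current compiler and options. */
void print_named_bitfield_sizes(void)
{
    printf("char  a:1, b:1 -> sizeof = %zu\n", sizeof(struct bf_char));
    printf("short a:1, b:1 -> sizeof = %zu\n", sizeof(struct bf_short));
    printf("int   a:1, b:1 -> sizeof = %zu\n", sizeof(struct bf_int));
    printf("long  a:1, b:1 -> sizeof = %zu\n", sizeof(struct bf_long));
    printf("short a:1, b:1, char z -> sizeof = %zu, offset of z = %zu\n",
           sizeof(struct bf_short_z), offset_of_z());
}
```

If offset_of_z() is smaller than sizeof(short), then z has been merged into the storage unit of the preceding bitfields, as described above.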

[To be continued]

vinriviere commented 11 months ago

A few remarkable facts, tested with m68k-elf-gcc:

1) A bitfield can also be merged with previous field, if there is enough room. Example:

        char a;
        short z:1;

Here, size=2. Layout is 8-bit a, 1-bit z, 7-bit filler. This layout respects the rules: z is after a, and z resides inside a short (as bitfield type matters). It's funny to see that the short bitfield can share the same short as previous char a, as long as they don't overlap. One could have believed that the size of that struct would be 3 or 4 bytes, while it's actually 2 bytes thanks to bitfield merging ability. Same behaviour with m68k-atari-mint and m68k-linux.

2) Zero bit takes more space than one bit

        short a:1;
        short z:1;
        short b:1;

Here size=2, because bitfield merge occurs.

But here:

        short a:1;
        short :0;
        short b:1;

size=4. This is because, as stated at the top of this issue, the strange syntax "short :0" isn't a bitfield definition: it actually forces the alignment of next field. So this prevents field merging between a and b.
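Both facts can be checked with a few lines of C. This is a sketch with made-up names (merged_prev, three_bits, split_bits and the two functions); the sizes quoted above are the ones I observed, and other compilers may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Fact 1: a bitfield placed after a regular char member. */
struct merged_prev { char a; short z:1; };

/* Fact 2: three 1-bit fields, then the same with a zero-width separator. */
struct three_bits { short a:1; short z:1; short b:1; };
struct split_bits { short a:1; short :0; short b:1; };

/* Extra bytes caused by replacing the 1-bit field z with "short :0". */
size_t zero_width_cost(void)
{
    assert(sizeof(struct split_bits) >= sizeof(struct three_bits));
    return sizeof(struct split_bits) - sizeof(struct three_bits);
}

/* Check that writing one bitfield leaves its neighbours untouched.
   Unnamed bitfields are skipped by initializers, hence {0, 0} below. */
int fields_independent(void)
{
    struct three_bits t = {0, 0, 0};
    struct split_bits s = {0, 0};

    t.a = -1;  /* sets the single bit of a */
    s.b = -1;  /* sets the single bit of b */
    return t.a != 0 && t.z == 0 && t.b == 0 && s.a == 0 && s.b != 0;
}
```

A non-zero zero_width_cost() confirms that the zero-width declaration prevents a and b from sharing the same short.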

vinriviere commented 11 months ago

Summary:

It seems that PCC_BITFIELD_TYPE_MATTERS=1 is now the standard way to configure bitfields. This brings more flexibility, as the programmer can explicitly specify the width of bitfield types. Anyway, bitfield implementation isn't well specified by the C standard, so any serious code shouldn't rely on a particular layout.

So I'm going to configure my mintelf target with PCC_BITFIELD_TYPE_MATTERS=1, like most other contemporary targets. With the new mintelf target, I wanted to make a fresh start with modern GCC settings. This setting goes in that direction. I will also unset STRUCTURE_SIZE_BOUNDARY, accordingly. Concretely, I won't change the current mint.h as those settings are implicitly used by default.

I understand that those settings go against @th-otto's wishes, as they differ from m68k-atari-mint traditional behaviour. This isn't a big deal: as this is Free Software, anyone is free to configure his toolchains as he wants, with different settings.

The only annoying thing is the default settings here in the freemint/m68k-atari-mint-gcc repository. I worked on my own on the new PRG/ELF format last summer, then I offered that work to the FreeMiNT organization on 11/08/2023. After that, we had many discussions, and together we greatly improved the whole mintelf toolchain. The fact is I want to use modern struct and bitfield settings for my own toolchain. I understand that this might be a different choice than the official FreeMiNT settings. So I'm considering forking the gcc/binutils repositories to my own private space to continue experimental work on the mintelf toolchain, without fear of causing trouble to @th-otto, the whole FreeMiNT ecosystem, or anyone else.

mikrosk commented 11 months ago

Just to expand on the last paragraph: freemint/m68k-atari-mint-gcc has always been based on your work. When you passed me the torch of taking care of the repository back in 2018, my goal has always been the same since then: minimalistic patches, safe changes and adding only features essential to the compiler's usage on our platform.

Thorsten, on the other hand, liked to experiment with more recent versions, backporting a.out to them, building stuff for mingw and other environments, so naturally his patchset was much bigger. But that caused no issues: we did exactly what was right, cherry-picking the minimalistic changes to improve gcc 7.x and existing side by side. Case in point: while I kept building ScummVM with m68k-atari-mint-gcc 7.5 from freemint, KeithS was building his (official) ScummVM with Thorsten's m68k-atari-mint-gcc 9.x.

And with this new mintelf target, I don't plan to change anything about this approach. You are of course free to create as many forks as you like, but in the end this repository will follow your development while keeping an eye on Thorsten's latest cool stuff worth cherry-picking. Whether Thorsten decides to slightly change a parameter here or there doesn't really matter: there's no expectation of being able to mix binaries from different toolchains (there never has been).

th-otto commented 11 months ago

Yes, I totally agree. But for that to work, our compilers must use the same API. The slightest difference would require a different set of libraries, and i don't think that vincent will provide libraries like pnglib, zlib, tiff etc.

vinriviere commented 11 months ago

I'm considering forking the gcc/binutils repositories to my own private space

Done: https://github.com/vinriviere/m68k-atari-mint-gcc https://github.com/vinriviere/m68k-atari-mint-binutils-gdb

I will build my next binaries from there. So I can continue experiments, without fear of breaking someone else's work. Anyway, I don't plan to do much more work on gcc, as it now works as expected.

Of course, this doesn't prevent us from continuing discussions here at freemint/m68k-atari-mint-gcc/issues, when relevant.

i don't think that vincent will provide libraries like pnglib, zlib, tiff etc.

I do provide some libraries with my cross-tools. Actually, whenever I provide some MiNT software, I add the libraries it depends on to the cross-tools packages. For example, I provide openssl with the cross-tools to allow building openssh. Also zlib to build zip/unzip, etc. The main idea is that when people want to rebuild software, they just have to install the cross-tools and build, without worrying about libraries. Of course this can only work blindly when the required libraries are already provided. As a counter-example, I don't provide png/tiff libraries as I never provided programs requiring them.

vinriviere commented 10 months ago

Summary: Most modern targets use PCC_BITFIELD_TYPE_MATTERS=1, and I didn't find any good reason to do otherwise. So I kept that default in my own GCC fork: https://github.com/vinriviere/m68k-atari-mint-gcc

I propose to close this issue. We can always reopen it if new information appears.