ARM-software / abi-aa

Application Binary Interface for the Arm® Architecture

Define `_BitInt` ABI #175

Closed jsm28 closed 1 year ago

jsm28 commented 1 year ago

C23 (most recent public draft: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf ) defines "bit-precise integer types": types _BitInt(N) and unsigned _BitInt(N) with a given width. The Arm ABI (both AAPCS32 and AAPCS64) needs to define the ABI for these types; see the x86_64 ABI https://gitlab.com/x86-psABIs/x86-64-ABI for an example.

This means specifying the size, alignment and representation for objects of those types (including whether there are any requirements on the values of padding bits in the in-memory representation), and the interface for argument passing and return (including, again, any requirements on padding bits - both padding bits up to the size of an object of that type, and any further padding beyond that within the size of a register or stack slot used for argument passing or return).

jakubjelinek commented 1 year ago

Do you want to use 64-bit limbs even on 32-bit ARM? I guess that could be a challenge e.g. on the libgcc side, e.g. my https://gcc.gnu.org/PR102989 WIP patch uses umul_ppmm for limbbits * limbbits -> 2xlimbbits multiplication, which 32-bit arm in longlong.h only defines for W_TYPE_SIZE == 32. And, for big endian, will you use in memory big-endian ordering of the limbs (first limb the most significant) or little endian (first limb the least significant, but bits within the limb big-endian)?

mmalcomson commented 1 year ago

@jakubjelinek As it stands yes, we were thinking of using 64-bit limbs for 32-bit ARM (and 128-bit limbs for 64-bit). The idea being that this way _BitInt(64) would match int64_t on AArch32 and _BitInt(128) would match __int128 on AArch64.

W.r.t. the memory ordering we're currently suggesting the use of big-endian memory ordering of the limbs for big-endian systems, but AFAIR we don't have a strong rationale for this decision, so if you have any feedback here that would be welcome.

FWIW the current proposal (along with the rationale document to explain our decisions) can be seen here https://github.com/ARM-software/abi-aa/pull/191.

jakubjelinek commented 1 year ago

Can 64-bit ARM do 128-bit x 128-bit -> 256-bit multiplications or 256-bit / 128-bit -> 128-bit divisions or what is the rationale for such large limbs? Of course, under the hood there could be ABI limb and optimization limb which would be smaller than the ABI one, but then what is the advantage, just making the types larger?

jakubjelinek commented 1 year ago

If the reason is to have _BitInt(128) argument passing/return value compatible with __int128, then you can just say so in the list of exceptions for the smaller sizes; it doesn't need to imply the size of the limb for even larger sizes. Shall _BitInt(257) have size `3*sizeof(__int128)` or just `5*sizeof(long long)`? And alignment can be yet another thing that could be independent from that.

mmalcomson commented 1 year ago

Yes, the reasoning so far was focused on ensuring the _BitInt(128) passing and return values were compatible with __int128. The alignment and limb size had been suggested in order to make the description simple rather than to satisfy any fundamental need.

Having a smaller limb size while still maintaining that property seems reasonable on first blush (i.e. without having put much thought into it). Will look into it (probably asking you a few questions along the way) and update the PR with a rationale addressing this (whether for or against).

To ensure I've understood your point correctly: is it right to say that the main benefit you see from using smaller limbs is that the implementation of multiplication for large sizes would perform its intermediate multiplications on limb-size chunks rather than half-limb-size chunks, and hence the implementation would be simpler? (Edit: I guess the simplification is around having all architectures do logically "the same thing" of working a limb at a time -- is that right?)

mmalcomson commented 1 year ago

@jakubjelinek my current thoughts are not to change, based on the following reasoning:

* We really want `_BitInt(128)` to match `__int128` as we think it would be a footgun otherwise (especially since the "quad word" is an ABI-level data type and mapping these C level types to different fundamental data types seems like it could cause problems).

* If `_BitInt(128)` alignment is 16 bytes (to match `__int128`), I think having `_BitInt(N>128)` have lesser alignment would cause confusion to programmers.

Does this seem reasonable to you?

jakubjelinek commented 1 year ago

> @jakubjelinek my current thoughts are not to change, based on the following reasoning:

> * We really want `_BitInt(128)` to match `__int128` as we think it would be a footgun otherwise (especially since the "quad word" is an ABI-level data type and mapping these C level types to different fundamental data types seems like it could cause problems).

Programmers will need to be prepared for that already; the x86-64 psABI behaves like that. While `sizeof(__int128) == sizeof(_BitInt(128))`, `alignof(__int128) > alignof(_BitInt(128))`. That also means `__int128` and `_BitInt(128)` are passed there the same way when passed in registers, but not necessarily when passed on the stack:

```c
__int128 foo (int, int, int, int, int, int, int, __int128 x) { return x; }
_BitInt(128) bar (int, int, int, int, int, int, int, _BitInt(128) x) { return x; }
```

results in different code (at least in GCC; in clang, foo is apparently compiled like bar, which I think means clang doesn't follow the psABI there - https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/low-level-sys-info.tex#L601 ).

> * If `_BitInt(128)` alignment is 16 bytes (to match `__int128`), I think having `_BitInt(N>128)` have lesser alignment would cause confusion to programmers.

Why? `_BitInt(128)` and `_BitInt(129)` are distinct types; users shouldn't make assumptions about the alignments or sizes of those types unless they know the corresponding ABI.

Anyway, regarding the GCC implementation: as long as _BitInt uses the same endian ordering for the limbs as for the bits inside those limbs, we could have two separate limb modes, one used for alignment and sizing and another used for the actual implementation of arithmetic on the type, perhaps including the libgcc implementation of multiplication/division. If the endianness differs, those two limb modes would obviously need to be the same.

nsz-arm commented 1 year ago
> * If `_BitInt(128)` alignment is 16 bytes (to match `__int128`), I think having `_BitInt(N>128)` have lesser alignment would cause confusion to programmers.
>
> Why? _BitInt(128) and _BitInt(129) types are distinct types, user shouldn't make assumptions on the alignments or sizes of those types unless he/she knows the corresponding ABI.

there were several cases where a small change to linux uapi structs (e.g. using bits from a previously reserved field) broke the abi because the alignment requirement changed unexpectedly. so a weird alignment requirement can definitely cause problems. but i don't know whether that's worse than mismatching _BitInt(128) and __int128_t alignment.

nsz-arm commented 1 year ago

it seems released versions of clang already implement _BitInt up to N=128, and it has 8-byte alignment on both aarch64 and arm. so it is probably better to document the existing practice instead of doing something different that breaks the ABI when -std=c23 is used. https://godbolt.org/z/4aTq5Ezeq

mmalcomson commented 1 year ago

@jakubjelinek Just FYI I've recently pushed an update to the bitint rationale document in the relevant PR (https://github.com/ARM-software/abi-aa/pull/191 )

The point you raised about x86-64 doing something different, meaning programmers will have to be ready for `_BitInt(128)` differing from `__int128`, is a good one, so I added it to the rationale and adjusted the document to note that the decision is a close call. That said, we're still leaning towards 128-bit alignment for `_BitInt(128)`, as it seems to fit better with the rest of our ABI and allows single-copy atomicity for LSE2 LDP and STP.


N.b. for completeness of documentation in the ticket -- after Szabolcs mentioned that clang has released a compiler using 8-byte alignment for such types, I double-checked with the LLVM folks: their AArch64 _BitInt ABI is explicitly called out as unstable, so that shouldn't be a problem.