dotnet / ClangSharp

Clang bindings for .NET written in C#
MIT License

`uint64_t` mapped to `nuint` on Linux (but `ulong` on Windows) #574

Open qmfrederik opened 2 weeks ago

qmfrederik commented 2 weeks ago

Consider the following file:

#include <stdint.h>

struct rdp_bitmap
{
    uint64_t key64;             /* 26 */
};

The following code is generated on Windows (via clangsharppinvokegenerator -f .\graphics.h -n FreeRDP3 -o graphics.cs):

namespace FreeRDP3
{
    public partial struct rdp_bitmap
    {
        [NativeTypeName("uint64_t")]
        public ulong key64;
    }
}

whereas on Linux, the following code is generated (via ClangSharpPInvokeGenerator -f ./graphics.h -I /usr/lib/clang/17/include -n FreeRDP3 -o graphics.cs):

namespace FreeRDP3
{
    public partial struct rdp_bitmap
    {
        [NativeTypeName("uint64_t")]
        public nuint key64;
    }
}

The Linux mapping to nuint is incorrect, since uint64_t is, by definition, 64 bits wide and architecture-independent.

tannergooding commented 2 weeks ago

The Linux mapping to nuint is incorrect, since uint64_t is, by definition, 64 bits wide and architecture-independent.

Notably, it's actually quite a bit more complicated than this. For example, uint64_t isn't guaranteed to exist and thus is not architecture- or implementation-independent. Additionally, while it is exactly 64 bits wide when it does exist, that doesn't guarantee the alignment, packing, or other characteristics of the type, which similarly make it not architecture-independent.
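The distinction between width and layout can be checked directly. The following is a minimal sketch, not anything ClangSharp itself does: the struct `probe` and the helper functions are hypothetical names introduced for illustration. It asserts the 64-bit width at compile time and then reports the alignment the current ABI applies to a `uint64_t` struct member, which is the part that can differ between targets (for example, 8 on x86-64 System V, but 4 on i386 System V).

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint64_t, UINT64_MAX */
#include <stdio.h>

/* uint64_t is optional: UINT64_MAX is only defined when the type exists. */
#ifndef UINT64_MAX
#error "This implementation does not provide uint64_t"
#endif

/* The width is fixed whenever the type exists. */
static_assert(sizeof(uint64_t) * CHAR_BIT == 64,
              "uint64_t must be exactly 64 bits wide");

/* Hypothetical probe struct: the offset of `value` exposes the alignment
   the ABI applies to uint64_t when it is a struct member. */
struct probe {
    char tag;
    uint64_t value;
};

/* Returns the ABI-dependent member alignment of uint64_t. */
size_t uint64_member_alignment(void) {
    return offsetof(struct probe, value);
}

/* Prints the size and layout facts for the current target. */
void print_uint64_layout(void) {
    printf("sizeof(uint64_t)     = %zu\n", sizeof(uint64_t));
    printf("member alignment     = %zu\n", uint64_member_alignment());
    printf("sizeof(struct probe) = %zu\n", sizeof(struct probe));
}
```

Calling `print_uint64_layout()` from `main` on each target of interest shows whether the generated C# struct layout can be shared between them.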

I expect the generation happens this way because the underlying header likely ends up doing typedef unsigned long uint64_t; (possibly through several intermediate typedefs, as is typical on Linux). Such a definition will break the heuristics that ClangSharp has built in and will need to be manually overridden via --with-remapping or similar.

This breaks the heuristics because it is "technically incorrect" according to the official ABI specification and breaks certain considerations that may exist for languages like C++ (specifically around overload resolution and the like). As per the System V Application Binary Interface AMD64 Architecture Processor Supplement (With LP64 and ILP32 Programming Models):

Thus, for x86 and x64 computers, a typedef "should" be using long long/unsigned long long for any type that is always 64 bits (such as int64_t/uint64_t). Likewise, it should be using int/unsigned int for any type that is always 32 bits (such as int32_t/uint32_t), and long/unsigned long for any type that is pointer-sized (such as size_t/ssize_t/intptr_t/uintptr_t). The exact types used for other architectures may vary based on their respective ABIs.

This consideration becomes very important in order to allow overloading and differentiating between int32_t, intptr_t, and int64_t across all computers, to allow features like RTTI or similar to function as expected, to allow name mangling to work, etc.
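To see which built-in type sits behind each stdint typedef on a given toolchain, C11 `_Generic` can serve as a rough stand-in for C++ overload resolution, since it also treats long and long long as distinct types even when both are 64 bits wide. This is only an illustrative sketch; the macro `BUILTIN_NAME` and the helper functions are hypothetical names, not part of ClangSharp or any header.

```c
#include <stdint.h>
#include <stdio.h>

/* Map an expression to the name of the built-in integer type behind it.
   long and long long are distinct types even at the same width, which is
   why both may appear in one _Generic association list. */
#define BUILTIN_NAME(x) _Generic((x),          \
    int:                "int",                 \
    unsigned int:       "unsigned int",        \
    long:               "long",                \
    unsigned long:      "unsigned long",       \
    long long:          "long long",           \
    unsigned long long: "unsigned long long",  \
    default:            "other")

const char *int32_builtin(void)  { return BUILTIN_NAME((int32_t)0); }
const char *intptr_builtin(void) { return BUILTIN_NAME((intptr_t)0); }
const char *int64_builtin(void)  { return BUILTIN_NAME((int64_t)0); }
const char *uint64_builtin(void) { return BUILTIN_NAME((uint64_t)0); }

/* Prints the built-in type each typedef resolves to on this toolchain. */
void print_stdint_builtins(void) {
    printf("int32_t  -> %s\n", int32_builtin());
    printf("intptr_t -> %s\n", intptr_builtin());
    printf("int64_t  -> %s\n", int64_builtin());
    printf("uint64_t -> %s\n", uint64_builtin());
}
```

If intptr_t and int64_t report the same built-in type, as would be expected on the Linux setup described in this issue, a tool that only sees the resolved type cannot tell the two typedefs apart.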


ClangSharp could potentially add a built-in hook to intercept int64_t and similar and assume they are the stdint types (and not some other user-defined type), but that will still miss other typedefs that have similar problems and may require the user to manually override as well.

qmfrederik commented 2 weeks ago

Thanks @tannergooding for the detailed explanation. With clang on Linux, uint64_t is defined in stdint.h like this:

typedef __UINT64_TYPE__ uint64_t;

and the __UINT64_TYPE__ macro expands to:

long unsigned int

as determined like this:

$ clang -cc1 -dM -E /usr/lib/clang/17/include/stdint.h | grep __UINT64_TYPE__
/usr/lib/clang/17/include/stdint.h:20:24: warning: #include_next in primary source file; will search from start of include path [-Winclude-next-outside-header]
   20 | #if __STDC_HOSTED__ && __has_include_next(<stdint.h>)
      |                        ^
1 warning generated.
#define __UINT64_TYPE__ long unsigned int
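The same expansion can also be read from inside a translation unit instead of from the preprocessor dump. A small sketch, assuming a compiler that predefines `__UINT64_TYPE__` (clang and GCC do; the fallback text covers compilers that do not). The names `STR`, `XSTR`, `UINT64_TYPE_TEXT`, and the two functions are hypothetical helpers introduced here.

```c
#include <stdio.h>

/* Two-level stringification: XSTR expands its argument first,
   then STR turns the expanded tokens into a string literal. */
#define STR(x)  #x
#define XSTR(x) STR(x)

#ifdef __UINT64_TYPE__
#define UINT64_TYPE_TEXT XSTR(__UINT64_TYPE__)
#else
#define UINT64_TYPE_TEXT "(__UINT64_TYPE__ is not predefined by this compiler)"
#endif

/* Returns the token sequence the compiler uses for its 64-bit unsigned type. */
const char *uint64_type_text(void) {
    return UINT64_TYPE_TEXT;
}

/* Prints the expansion for the current compiler and target. */
void print_uint64_type(void) {
    printf("__UINT64_TYPE__ expands to: %s\n", uint64_type_text());
}
```

Calling `print_uint64_type()` on both Windows and Linux builds gives a quick side-by-side check of the underlying type that the generator ends up seeing.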

So that leaves the option of using --remap "uint64_t=ulong" which I can confirm works.

I would like to ask you to consider adding a default remapping for this type, since I believe ulong is the .NET type closest to uint64_t, which is defined as "an unsigned integer type with width 64 and no padding bits". That said, adding the remapping ourselves will help us, too.