bottlenoselabs / c2cs

Generate C# bindings from a C header.
MIT License
250 stars 18 forks source link

Anonymous fields not used #160

Open Larkooo opened 11 months ago

Larkooo commented 11 months ago

the anonymous field dont seem to be used when generating a union struct

lithiumtoast commented 11 months ago

Hey @Larkooo, could you provide an example?

Larkooo commented 11 months ago

Yes, our union C structs look like

typedef struct {
 enum tag;
 union {
  struct struct_;
  enum enum_;
  ...
 }
} ty;

first of all, it seems like the field offsets are wrong, they are all 0 when for eg. struct and enum should be of fieldoffset(8)

and the generated fields dont use the anonymous fields - which seems to be the issue for why the offsets are wrong i believe. (the generated metadata from castffi seems to be ok tho)

Larkooo commented 11 months ago

dojo.h if you were wondering, this is the header file we're having an issue with. i have a scoped fork of c2cs so i fixed it for our use case but best would be to look into the anonymous fields issue

lithiumtoast commented 11 months ago

When looking into it, with the new version of castffi you can do the following for your extract config so there is only one file:

{
    "inputFilePath": "./dojo.h",
    "targetPlatforms": {
        "windows": {
            "x86_64-pc-windows-msvc": {},
            "aarch64-pc-windows-msvc": {}
        },
        "macos": {
            "aarch64-apple-darwin": {},
            "x86_64-apple-darwin": {}
        },
        "linux": {
            "x86_64-unknown-linux-gnu": {},
            "aarch64-unknown-linux-gnu": {}
        }
    }
}

Other than that I'm looking at Primitive an example and something indeed seems to be wrong.

C# code:

        [CNode(Kind = "Struct")]
        [StructLayout(LayoutKind.Explicit, Size = 40, Pack = 8)]
        public struct Primitive
        {
            [FieldOffset(0)] // size = 4
            public Primitive_Tag tag;

            [FieldOffset(0)] // size = 1
            public byte u8;

            [FieldOffset(0)] // size = 2
            public ushort u16;

            [FieldOffset(0)] // size = 4
            public uint u32;

            [FieldOffset(0)] // size = 8
            public ulong u64;

            [FieldOffset(0)] // size = 16
            public fixed byte _u128[16]; // uint8_t[16]

            public readonly Span<byte> u128
            {
                get
                {
                    fixed (Primitive* @this = &this)
                    {
                        var pointer = &@this->_u128[0];
                        var span = new Span<byte>(pointer, 16);
                        return span;
                    }
                }
            }

            [FieldOffset(0)] // size = 32
            public fixed byte _u256[32]; // uint64_t[4]

            public readonly Span<ulong> u256
            {
                get
                {
                    fixed (Primitive* @this = &this)
                    {
                        var pointer = &@this->_u256[0];
                        var span = new Span<ulong>(pointer, 4);
                        return span;
                    }
                }
            }

            [FieldOffset(0)] // size = 4
            public uint u_size;

            [FieldOffset(0)] // size = 1
            public CBool p_bool;

            [FieldOffset(0)] // size = 32
            public FieldElement felt252;

            [FieldOffset(0)] // size = 32
            public FieldElement class_hash;

            [FieldOffset(0)] // size = 32
            public FieldElement contract_address;
        }

C code:

typedef struct Primitive {
  Primitive_Tag tag;
  struct {
    union {
      struct {
        uint8_t u8;
      };
      struct {
        uint16_t u16;
      };
      struct {
        uint32_t u32;
      };
      struct {
        uint64_t u64;
      };
      struct {
        uint8_t u128[16];
      };
      struct {
        uint64_t u256[4];
      };
      struct {
        uint32_t u_size;
      };
      struct {
        bool p_bool;
      };
      struct {
        struct FieldElement felt252;
      };
      struct {
        struct FieldElement class_hash;
      };
      struct {
        struct FieldElement contract_address;
      };
    };
  };
} Primitive;

I would expect it to be like:

            [FieldOffset(0)] // size = 4
            public Primitive_Tag tag;

            [FieldOffset(4)] // size = 1
            public byte u8;

            [FieldOffset(4)] // size = 2
            public ushort u16;

            [FieldOffset(4)] // size = 4
            public uint u32;

            [FieldOffset(4)] // size = 8
            public ulong u64;
lithiumtoast commented 11 months ago

I think there are improvements to be made for both castffi and c2cs in this case. I never experienced such usage of union and struct in C before. I can add some tests to capture such usage.

Larkooo commented 11 months ago

Yes, as I said I have fixed that issue in my fork of c2cs so I would be open to contributing to castffi/c2cs if I know where to look at and if there are written tests for those use cases, as it is quite difficult to get through the C parsing part of the code. As for the FieldOffset(s) I think that it should be 8 as the struct packing is set to 8 bytes.

lithiumtoast commented 11 months ago

I think that it should be 8 as the struct packing is set to 8 bytes.

Yeah I suppose so.

If you put up a MR that be great.

The code was basically manually tested against versus C libraries that I come across. Your C code is interesting because I have seen anonymous unions being used as such but never have I seen someone use anonymous structs like that. I have added tests but they don't cover anonymous structs and anonymous unions like this yet.

c2cs should be pretty easy to deal with. castffi is probably the hardest part. Both have identical test setup by using some C code for tests.

The related C code that "handles" a struct would be here for castffi: https://github.com/bottlenoselabs/CAstFfi/blob/main/src/cs/production/CAstFfi.Tool/Features/Extract/Domain/Explore/Handlers/StructExplorer.cs#L31. There is a similar one for unions: https://github.com/bottlenoselabs/CAstFfi/blob/main/src/cs/production/CAstFfi.Tool/Features/Extract/Domain/Explore/Handlers/UnionExplorer.cs#L31.

What the harder part is determining how to identify a cursor so it can be "handled". Basically how it works for castffi is that using libclang, "cursors" in the abstract syntax tree can be "visited" recursively. Once visited they are enqueued into a double ended queue data structure for processing at a later stage (separated into 3 parts as variables, functions, and types) as "info nodes". The names are cached so they are not added again in case a cursor is seen multiple times. Once every "top level" cursor is visited then they are "explored" (i.e. processed). Note that as cursors are processed their "descendants" are "visited" leading to a recursive pattern until there are not more cursors to "visit" or "explore".

Anonymous fields have a type and that anonymous type is given a name. The code is here https://github.com/bottlenoselabs/CAstFfi/blob/main/src/cs/production/CAstFfi.Tool/Features/Extract/Domain/Explore/ExploreContext.cs#L73 then here https://github.com/bottlenoselabs/CAstFfi/blob/main/src/cs/production/CAstFfi.Tool/Features/Extract/Domain/Explore/ExploreContext.cs#L196. This happens so the name can be later looked up for the extracted data.

Expected test data would be added to https://github.com/bottlenoselabs/CAstFfi/tree/main/src/cs/tests/CAstFfi.Tests/Data/Values with a new folder for Structs and Unions per target platform. Changing the regenerateFiles to true in the constructor here https://github.com/bottlenoselabs/CAstFfi/blob/main/src/cs/tests/CAstFfi.Tests/TestCCode.cs#L100 can be used to generate the expected test data. There already exists a method GetRecord and TryGetRecord which can be used for tests: https://github.com/bottlenoselabs/CAstFfi/blob/main/src/cs/tests/CAstFfi.Tests/TestCCodeAbstractSyntaxTree.cs#L134. Though perhaps it should be renamed for Struct and Union.