Open flibitijibibo opened 4 months ago
Hello, I would like to apply for this bounty on a part-time basis.
Introduction
Back in 2020/2021 I was trying to create C# bindings for a specific C library called sokol. I had the same problem outlined here: changes upstream resulted in a lot of manual work to fix and update the C# bindings. So I experimented with automating the process by using libclang to parse the C header and generate the C# bindings. See this sokol GitHub issue for my original approach.
My findings have been put into the following projects as tools with a lot of learnings and documentation:
I have tried using the tooling on various C libraries with limited early testing success, including: SDL, FAudio, FNA3D, Theorafile, imgui, sokol, libuv, flecs, etc.
You can find an example of the generated bindings at https://github.com/bottlenoselabs/SDL-cs/blob/main/src/cs/production/SDL/Generated/SDL.gen.cs. This file is updated through GitHub Actions pipelines: Dependabot opens a new PR when upstream has been updated.
Problems
As others have described in the linked SDL GitHub issue, using libclang has some issues, most notably with C function-like macros. I have documented some general rules for making C code suitable for bindgen to C# and other languages: https://github.com/bottlenoselabs/CAstFfi?tab=readme-ov-file#limitations-is-the-c-library-ffi-ready.
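To get a feel for how many function-like macros a header contains before running a bindgen over it, a quick scan can help. The following is a minimal sketch, not part of any of the tools mentioned here; the sample header and the `EXAMPLE_*` names are made up for illustration, and macros whose definitions span multiple lines are only detected by their first line.

```python
import re

# A macro is function-like when "(" follows the name with no whitespace
# in between. "#define NAME (1 | 2)" is therefore an object-like macro.
FUNCTION_LIKE_MACRO = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)\(", re.MULTILINE
)


def find_function_like_macros(header_text: str) -> list[str]:
    """Return the names of function-like macros defined in a C header."""
    return FUNCTION_LIKE_MACRO.findall(header_text)


# Hypothetical header content, used only to exercise the scan.
SAMPLE_HEADER = """
#define EXAMPLE_VERSION 3
#define EXAMPLE_MAX(a, b) (((a) > (b)) ? (a) : (b))
#  define EXAMPLE_FOURCC(a, b, c, d) ((a) | ((b) << 8) | ((c) << 16) | ((d) << 24))
#define EXAMPLE_FLAGS (1 | 2)
"""

if __name__ == "__main__":
    for name in find_function_like_macros(SAMPLE_HEADER):
        print(name)
```

Each name reported this way is a candidate for either an upstream change or a handwritten C# equivalent.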
The best outcome is that changes are made upstream in SDL to make it bindgen friendly. The worst case is that some workarounds would be needed for generating C# bindings, which would be unique to FNA.
The good news is that SDL2#, and thus FNA, is not using the full feature set of SDL. For the purposes of this bounty, a subset of the larger problem could be solved by focusing on generating only the C# bindings that are used by FNA. The larger problem could be tackled as a separate milestone after additional feedback.
Questions
Proposed Deadline
I would be working on this part-time.
Two Weeks of March 18th to March 31st
lithiumtoast to demonstrate the tooling as a test run replacement for SDL2#.
CAstFfi project (renamed c2ffi) with tests to solve some bugs as described here: https://github.com/bottlenoselabs/c2cs/issues/160.
Week of April 1st to April 7th
This bit from the README should be able to answer both:
All source code written is yours to keep. This is NOT a work for hire; you will retain full copyright ownership of the code you write. All I am asking is that we get the right to use/publish that code under the license used by the project. For example, if a project is released under zlib, we get to use that code under zlib until we decide to change the license, at which point we will ask you for permission to do so first. If you decide that code you wrote is useful for some proprietary project and it makes you a zillion dollars, fine by me!
The rest sounds okay, so will mark as In Progress.
I'm interested in taking on this.
I've recently been experimenting with generating SDL3 bindings from header files. I've used a combination of `ClangSharpPInvokeGenerator`, C# Source Generators and manually written `.cs` and ClangSharp configuration files.
* ClangSharp is really good at creating "bit-perfect" bindings, including things like unmanaged function pointers
* Sourcegen is used to automatically generate friendly overloads for `char *` functions, using `string` / `ReadOnlySpan<byte>` (for UTF8 literals)
* Manually written `.cs` files are required for macro functions, and typedefs
  * typedefs could realistically be automatically generated with a simple python script or similar
I have some questions about the C# side of things:
* What are your requirements for C# and .NET versions?
  * The SDL2# project is structured in a weird way and has `#if NET6_0_OR_GREATER` etc. sprinkled throughout. I'd like to avoid that if possible.
* Are you considering releasing the bindings as a NuGet package? That way projects other than FNA could use them.
* How important are friendly C# function definitions (e.g. `in`, `out`, `ref` params instead of raw pointers)?
  * I presume `char *` functions are really important to have friendly definitions to prevent memory leaks
I'm also happy to help with creating demo SDL3 applications to check that the generated bindings are simple to use and correct. Possibly even cross-check the bindings with header files.
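As a rough illustration of the "simple python script" idea for typedefs, here is a minimal sketch. It assumes that simple integer typedefs are wrapped in a single-field C# struct, which is only one possible design and not what any of the projects in this thread are confirmed to do. The `Example_*` names and the sample header are hypothetical; only the `Uint32`-style type names are taken from SDL.

```python
import re

# Assumed mapping from SDL's fixed-width integer type names to C# primitives.
C_TO_CSHARP = {
    "Uint8": "byte", "Sint8": "sbyte",
    "Uint16": "ushort", "Sint16": "short",
    "Uint32": "uint", "Sint32": "int",
    "Uint64": "ulong", "Sint64": "long",
}

# Matches only the simplest form: "typedef <type> <alias>;"
SIMPLE_TYPEDEF = re.compile(r"^\s*typedef\s+(\w+)\s+(\w+)\s*;", re.MULTILINE)


def generate_typedef_structs(header_text: str) -> str:
    """Emit one single-field C# struct per simple integer typedef."""
    blocks = []
    for c_type, alias in SIMPLE_TYPEDEF.findall(header_text):
        cs_type = C_TO_CSHARP.get(c_type)
        if cs_type is None:
            continue  # Not a known integer type; leave it to other tooling.
        blocks.append(
            f"public struct {alias}\n{{\n    public {cs_type} Value;\n}}\n"
        )
    return "\n".join(blocks)


# Hypothetical header content for demonstration.
SAMPLE_HEADER = """
typedef Uint32 Example_DeviceID;
typedef struct Example_Window Example_Window;
typedef Sint64 Example_Timestamp;
"""

if __name__ == "__main__":
    print(generate_typedef_structs(SAMPLE_HEADER))
```

Opaque struct typedefs and function pointer typedefs are deliberately skipped here, since a regex is not a reliable way to parse them.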
Note that @lithiumtoast may have already started, so definitely coordinate with them if you both want to work on this at the same time.
To answer the questions (all of which are good!):
`public` like SDL2) - we like the style of SDL2# but basically nobody else does, so the hope is that we can have our bindings our way and the C# community can use the same definitions to make something more to their liking on NuGet.
`out`, otherwise we use `ref`
`byte[]` and `byte*`, with the UTF8 and memory helpers at the top. .NET 4 unfortunately still needs this but maybe the modern .NET config can use Utf8Marshaler.
SDL3.cs will need to compile on VS2010, so .NET 4.0 is the minimum spec - however since it's generated we can go harder with the newer .NET features if it makes stuff run faster.
I'm not sure I understand what you mean by this. Even if the code is generated, VS2010 will still have to compile it.
Unless you mean that there'll be two versions of each function, one for .NET 4.0 compatibility, and one using modern syntax and features. And preprocessor directives could be used to compile only the function relevant for the tooling used.
Hey @Susko3, if you want to share this bounty that's OK with me. It would be great to get some help.
ClangSharp: I was originally using ClangSharp but ended up outgrowing the project. I wanted to add additional things, make the user experience better, etc. I found out later, after talking to some other folks in the space, that I really only wanted to target cross-platform C libraries. I did not care about generating bindings for Windows APIs or C++ libraries, which is a major use case for ClangSharp internally at Microsoft. I decided to roll my own so I could make changes as I wanted and not have to rely on getting PRs in upstream or forking ClangSharp. This is a common thread for Silk.NET and others that leverage libclang for cross-platform APIs.
SourceGenerators: I have tried source generators and found them useful only in specific contexts, where you have some C# code you want to write inside your end application project and want to generate code from that code. I did not find source generators useful for creating C# bindings when the C API is fixed via source control. Furthermore, I discovered that using libclang on Windows gives different results in specific situations than using libclang on Linux or macOS, because of how compilers are implemented and how macro objects are used for conditional compilation. This makes source generators not really the right tool for the job, because the input to the source generator depends on the machine it runs on. Instead, to capture all the information, the bindgen needs to run on all the different development environments (Windows, macOS, Linux, etc.), then merge the results into a platform-agnostic machine readable file. The C# code is then generated from that machine readable file.
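The extract-per-platform-then-merge step can be sketched roughly as follows. This is a minimal illustration under assumed data shapes, not the actual format or logic of CAstFfi/c2ffi: each platform's extraction is taken to be a plain dictionary from symbol name to declaration, and the platform names and `example_*` symbols are hypothetical.

```python
def merge_platform_ffi(per_platform: dict) -> dict:
    """Merge per-platform extractions into one platform-agnostic mapping.

    per_platform maps a platform name to {symbol name: declaration}.
    The result maps each symbol name to a list of variants, where every
    variant records a declaration and the platforms that agree on it.
    """
    merged = {}
    for platform in sorted(per_platform):
        for name, declaration in per_platform[platform].items():
            variants = merged.setdefault(name, [])
            for variant in variants:
                if variant["declaration"] == declaration:
                    variant["platforms"].append(platform)
                    break
            else:
                # No existing variant matches: this platform differs.
                variants.append(
                    {"declaration": declaration, "platforms": [platform]}
                )
    return merged


# Hypothetical extraction results. On 64-bit targets, C "long" is
# 4 bytes on Windows and 8 bytes on Linux and macOS.
EXTRACTED = {
    "windows": {
        "example_long_t": {"kind": "typedef", "size": 4},
        "example_init": {"kind": "function", "returns": "int"},
    },
    "linux": {
        "example_long_t": {"kind": "typedef", "size": 8},
        "example_init": {"kind": "function", "returns": "int"},
    },
    "macos": {
        "example_long_t": {"kind": "typedef", "size": 8},
        "example_init": {"kind": "function", "returns": "int"},
    },
}

if __name__ == "__main__":
    import json
    print(json.dumps(merge_platform_ffi(EXTRACTED), indent=2))
```

A symbol with a single variant is safe to generate once; a symbol with several variants is where the C# generator has to decide how to handle the platform difference.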
Posting here with an update.
I re-wrote CAstFfi and renamed it to c2ffi: https://github.com/bottlenoselabs/c2ffi. I did this while setting up end-to-end tests in GitHub Actions to make sure everything works correctly (so far) on all desktop platforms (Windows, macOS, Ubuntu) when using libclang. What's changed from CAstFfi is simpler logic, various bug fixes, and an updated data model for the machine readable FFI which follows libclang more closely.
My plan is to update https://github.com/bottlenoselabs/c2cs to use c2ffi this week. Then finish creating that example repository for bindgen of SDL. I expect to have that repository ready for a first round of feedback on April 9th.
Hello, I have things ready for some feedback. Apologies for the timeline being a bit later than what I mentioned earlier.
You can find the generated C# code from the latest SDL main commit here: https://github.com/lithiumtoast/SDL-cs/blob/main/src/cs/production/SDL/Generated/SDL.gen.cs
The GitHub Actions workflow run is here; the machine readable files are available in its artifacts: https://github.com/lithiumtoast/SDL-cs/actions/runs/8680613309
If there are any questions about how it works I would be happy to explain.
At this time it doesn't include SDL_image and friends. However, it should be sufficient to have some discussion and ask questions.
I'll get us started:
Is there a specific branch of the SDL repository to use for bindgen? I'm assuming main is fine for SDL3.
I see that SDL2# uses tabs. The currently generated C# code is using spaces. Do you want the generated code to be using tabs?
The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen output to remove the dependency. What do you think? Should the generated C# code have zero dependencies?
Some header files from SDL have been ignored for the purposes of the bindgen. You can see which ones here: https://github.com/lithiumtoast/SDL-cs/blob/94ca7d6d863aad5169d0bac86afc1cf432728a69/bindgen/config-extract.json#L12. Is this assumption okay? Should the generated C# code cover all of SDL? Or is there only a subset that should be covered for the purposes of C#?
On that note, it's unfortunate that SDL_stdinc.h has so many functions that are not useful in C# while SDL_Free is in that header. You can see in the extract config mentioned above that a lot of functions and macro objects are ignored from SDL_stdinc.h. Perhaps the case could be made to move the memory related stuff to its own header? Is it possible to make changes to the SDL repository for the purposes of bindgen?
Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that there are some significant differences in interoperability, where modern .NET has made some changes, some of which are fairly good for performance:
C# function pointers for callbacks from C. In the .NET 4 generated version, C# delegates will have to be used for callbacks. This already makes the two versions of the C# bindings significantly different. Is that okay?
Disabled runtime marshalling. For example, System.Boolean is properly mapped to bool in C with runtime marshalling disabled (see CBool in the C2CS.Runtime for when it is enabled by default). The only "undesired" consequence is that ref and out are not supported, so unsafe pointers will need to be used in C#. Is this taking it too far? I suppose that if it is disabled, the P/Invoke functions in C# could have some generated "wrappers" as a compromise. Should runtime marshalling be disabled in the .NET 8+ version?
UTF8 string literals. I have generally been handling char * as a CString in C2CS.Runtime. I see that in SDL2#, like you mentioned, the approach was to just have wrapper methods to use System.String in C#. I think it would be better if the .NET 8+ version of the bindgen used UTF8 string literals wherever possible.
- The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen output to remove the dependency. What do you think? Should the generated C# code have zero dependencies?
Aside from the "nuget or not" question, it would be ideal from a security/supply-chain perspective if the NuGet is one that's built and distributed by the SDL organization and not an unknown third party.
- Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that there are some significant differences in interoperability, where modern .NET has made some changes, some of which are fairly good for performance:
If you're not aware, you can use the NET8_0_OR_GREATER preprocessor define in C# if you want to generate a single file that supports both runtimes. I don't know if that will be the ideal approach, though.
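A generator that takes the single-file route could wrap each target-specific declaration in the preprocessor symbol. The sketch below shows one hypothetical shape for a callback type: a C# function pointer on .NET 8+ and a delegate otherwise. The function name `emit_callback`, the wrapper struct, and the `Example_LogCallback` type are assumptions for illustration, not output of any tool in this thread.

```python
def emit_callback(name: str, return_type: str, param_types: list[str]) -> str:
    """Return C# source declaring one callback type for both runtime targets."""
    # Function pointer type arguments list the parameters first and the
    # return type last.
    fp_args = ", ".join(list(param_types) + [return_type])
    # Delegate parameters need names; arg0, arg1, ... are placeholders.
    params = ", ".join(f"{t} arg{i}" for i, t in enumerate(param_types))
    lines = [
        "#if NET8_0_OR_GREATER",
        f"public unsafe struct {name}",
        "{",
        f"    public delegate* unmanaged[Cdecl]<{fp_args}> Pointer;",
        "}",
        "#else",
        "[UnmanagedFunctionPointer(CallingConvention.Cdecl)]",
        f"public unsafe delegate {return_type} {name}({params});",
        "#endif",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_callback("Example_LogCallback", "void", ["int", "byte*"]))
```

Note that calling code would still see two different kinds of type under the same name, which is the difference raised earlier in the thread; the `#if` only keeps both variants in one file.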
Will definitely do a deep dive tomorrow, but to answer what I can ASAP...
Is there a specific branch of the SDL repository to use for bindgen? I'm assuming main is fine for SDL3.
Yep, we'll be using the main branch until SDL4 starts.
I see that SDL2# uses tabs. The currently generated C# code is using spaces. Do you want the generated code to be using tabs?
It'd be nice to have, but since it's autogenerated anyway it's not too big a deal what the exact style is.
The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen output to remove the dependency. What do you think? Should the generated C# code have zero dependencies?
I'd say remove them if possible; we were able to do all our bindings without dependencies up until now so it should be possible to keep that going.
Some header files from SDL have been ignored for the purposes of the bindgen. You can see which ones here: https://github.com/lithiumtoast/SDL-cs/blob/94ca7d6d863aad5169d0bac86afc1cf432728a69/bindgen/config-extract.json#L12. Is this assumption okay? Should the generated C# code cover all of SDL? Or is there only a subset that should be covered for the purposes of C#?
Audio and IOStream might get used (it's rare but some real use cases have come up before)
On that note, it's unfortunate that SDL_stdinc.h has so many functions that are not useful in C# while SDL_Free is in that header. You can see in the extract config mentioned above that a lot of functions and macro objects are ignored from SDL_stdinc.h. Perhaps the case could be made to move the memory related stuff to its own header? Is it possible to make changes to the SDL repository for the purposes of bindgen?
Sam/Ryan might consider something like SDL_math.h, stdint, etc, but worst case we can ignore this header and generate the allocator functions ourselves.
Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that there are some significant differences in interoperability, where modern .NET has made some changes, some of which are fairly good for performance:
This is fine with me - as long as the code using the functions isn't any different it's okay for the generated stuff to do more optimal things internally (we have a handful of these in SDL2.cs). If it's the case that it would break ABI to have two different versions, the ABI should prefer the old way over the new way (someone making their own bindings with this generator could optimize for themselves if they wanted to though!).
Please take a look at the generated code mentioned at the beginning. Is there anything else not mentioned that would need to be changed?
Will take a look soon!
If you're not aware, you can use the NET8_0_OR_GREATER preprocessor define in C# if you want to generate a single file that supports both runtimes. I don't know if that will be the ideal approach, though.
Yup, I am doing that in AssemblyAttributes.gen.cs, a supporting assembly attributes file that accompanies the main generated C# file.
Speaking of which, the usage of partial on the static class makes it possible for the generated C# code to be split across multiple files if that makes things easier. For example, it would be possible to have one generated C# file to match each C header, or to have one C# file for each struct, enum, etc.
worst case we can ignore this header and generate the allocator functions ourselves
This could be a good example of using another C# file with partial to write supporting "extra" functions that should be rolled into the static class.
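The one-file-per-header split could look roughly like this on the generator side. This is a minimal sketch under assumptions: declarations are taken to arrive as (header path, C# source) pairs, the output file naming is made up, and the `example_*` headers and `Example_*` members are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_by_header(declarations, class_name="SDL"):
    """Group generated C# members by source header.

    declarations is an iterable of (header_path, csharp_source) pairs.
    Returns {output file name: file content}, where every file reopens
    the same static partial class.
    """
    grouped = defaultdict(list)
    for header, source in declarations:
        grouped[header].append(source)

    files = {}
    for header, sources in grouped.items():
        stem = PurePosixPath(header).stem
        # Indent each member by one level inside the class body.
        body = "\n".join(
            "    " + line if line else ""
            for source in sources
            for line in source.splitlines()
        )
        files[f"{stem}.gen.cs"] = (
            f"public static unsafe partial class {class_name}\n"
            f"{{\n{body}\n}}\n"
        )
    return files


# Hypothetical generated members, keyed by the header they came from.
DECLARATIONS = [
    ("include/example_timer.h",
     '[DllImport("example")]\npublic static extern ulong Example_GetTicks();'),
    ("include/example_video.h",
     "public struct Example_Rect { public int X; public int Y; }"),
    ("include/example_timer.h",
     '[DllImport("example")]\npublic static extern void Example_Delay(uint ms);'),
]

if __name__ == "__main__":
    for file_name, content in sorted(split_by_header(DECLARATIONS).items()):
        print(f"// {file_name}")
        print(content)
```

A handwritten file that declares the same partial class can then sit next to the generated ones and add the "extra" functions without touching generated output.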
Update: April 29th 2024
https://github.com/lithiumtoast/SDL-cs
bottlenoselabs.C2CS.Runtime to simply Bindgen.Runtime though so there is no branding.
Remaining work:
../Generated/net40/*.cs vs ../Generated/net80/*.cs.
Update: May 8th 2024
Apologies for the delay as I caught COVID and was not functioning most of the time.
https://github.com/lithiumtoast/SDL-cs
Span<T> is pretty grim in .NET Framework 4.0. A polyfill for Span<T> via the NuGet package System.Memory is not even an option since it's only supported in .NET Framework 4.5+. I have to fall back to using arrays with a method/property for the edge case of fixed buffer fields inside structs. However, all structs are still 100% blittable.
Remaining work:
c2ffi. For example, SDL_Scancode is not being generated correctly and is currently empty in the generated C# files.
ref and out for function parameters.
SDL_image and friends.
A bit late to the party but here's my attempt repo, which uses Python with tree sitter to parse the sources and generate the bindings. Currently it needs libsdl-org/SDL#9907 to work properly or else in/out parameters cannot be distinguished properly.
I tried to stay as close as I could to the design of SDL2# and use old C# features as well, but I don't have an old compiler to find the minimum C# version needed.
Feel free to ask about any questions/concerns related to the code.
Update: Here's some sample code that shows how to use the generated API: gist
Introductory Information:
SDL3 is the upcoming major release of SDL that breaks the ABI for the first time in over 10 years, in favor of dramatically improving and redesigning core APIs. It is currently being used by Source 2 internally, but will likely be adopted very quickly by the countless projects currently using SDL2, including FNA.
SDL2# is the C# binding for SDL2, originally built for FNA shortly before SDL 2.0 was officially tagged. The binding has been handwritten and hand-maintained for the entirety of its lifespan. This worked well for most of that time, but in recent years it's gotten increasingly difficult to keep up with upstream SDL changes.
The Project:
Since many different languages want to have SDL bindings, there is now a proposal for SDL3 to maintain machine-readable API files that can easily generate bindings for numerous languages:
https://github.com/libsdl-org/SDL/issues/6337
This bounty specifically targets C#; the goal is to make machine readable definitions and then make a program that generates SDL3.cs, which should look and feel exactly like SDL2#.
The machine-readable files will be hosted in the SDL repo; ideally we'll be able to use the SDL headers directly, but if we have to generate something similar to dynapi that should be fine too.
The C# generator script and the generated bindings will be hosted in the FNA repository directly, with SDL itself being the new submodule rather than a new SDL3-CS repo.
Prerequisites:
Experience with scripting and working with both C headers and machine-readable data formats will help a lot; the language isn't locked down but existing scripts for SDL use Python and Perl. Knowing C# will probably help but is not as important as the other prerequisites.
Example Games:
XNA/FNA games will be making use of this by generating SDL3 bindings that look and act similar to SDL2#, and will be the basis for the SDL3_FNAPlatform, which will live side-by-side with SDL2_FNAPlatform for maximum compatibility with the existing catalog of games.
How Much Can flibit Help?
I'm not great with automated bindings generation, but having worked on SDL2# for a decade I should be able to explain any possible quirks that C# (for example) might have that will affect the API definitions. There are others working on other languages that will probably want to test and give feedback as well.
Budget/Timeline
If done as a part-time gig, I would expect this to take about a month, with the majority of the work being at the first and last steps (the first being getting something generated at all, and the last being upstreaming it for SDL 3.0). There is $4000 USD allocated for this project.