flibitijibibo / flibitBounties

Pile of programming bounties for things flibit can't do right now

Create language binding generation tools for SDL3 #6

Open flibitijibibo opened 4 months ago

flibitijibibo commented 4 months ago

Introductory Information:

SDL3 is the upcoming major release of SDL that breaks the ABI for the first time in over 10 years, in favor of dramatically improving and redesigning core APIs. It is currently being used by Source 2 internally, but will likely be adopted very quickly by the countless projects currently using SDL2, including FNA.

SDL2# is the C# binding for SDL2, originally built for FNA shortly before SDL 2.0 was officially tagged. The binding has been handwritten and hand-maintained for the entirety of its lifespan. This worked well for most of that time, but in recent years it's gotten increasingly difficult to keep up with upstream SDL changes.

The Project:

Since many different languages want to have SDL bindings, there is now a proposal for SDL3 to maintain machine-readable API files that can easily generate bindings for numerous languages:

https://github.com/libsdl-org/SDL/issues/6337

This bounty specifically targets C#; the goal is to make machine readable definitions and then make a program that generates SDL3.cs, which should look and feel exactly like SDL2#.

The machine-readable files will be hosted in the SDL repo; ideally we'll be able to use the SDL headers directly, but if we have to generate something similar to dynapi that should be fine too.
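As a rough illustration of the idea above (machine-readable definitions in, C# declarations out), here is a minimal Python sketch. The dictionary layout, the field names, and the `nativeLibName` constant are assumptions made for this example, not the format SDL will ship, and the `SDL_Init` signature is shown for illustration only and may not match the current headers.

```python
# Hypothetical machine-readable description of one C function.
# The field names ("name", "returns", "params") are illustrative only.
SDL_INIT = {
    "name": "SDL_Init",
    "returns": "int",
    "params": [{"name": "flags", "type": "Uint32"}],
}

# Partial C-to-C# type map; a real generator would need many more entries.
C_TO_CS = {
    "void": "void",
    "int": "int",
    "Uint32": "uint",
    "Uint8": "byte",
    "float": "float",
}


def emit_function(func, native_lib="nativeLibName"):
    """Return a C# P/Invoke declaration for one function description."""
    params = ", ".join(
        f"{C_TO_CS[p['type']]} {p['name']}" for p in func["params"]
    )
    return (
        f"[DllImport({native_lib}, CallingConvention = CallingConvention.Cdecl)]\n"
        f"public static extern {C_TO_CS[func['returns']]} {func['name']}({params});"
    )


declaration = emit_function(SDL_INIT)
print(declaration)
```

Pointer parameters, strings, structs, and callbacks are where the real work lies; this sketch only covers plain scalar arguments.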

The C# generator script and the generated bindings will be hosted in the FNA repository directly, with SDL itself being the new submodule rather than a new SDL3-CS repo.

Prerequisites:

Experience with scripting and working with both C headers and machine-readable data formats will help a lot; the language isn't locked down but existing scripts for SDL use Python and Perl. Knowing C# will probably help but is not as important as the other prerequisites.

Example Games:

XNA/FNA games will be making use of this by generating SDL3 bindings that look and act similar to SDL2#, and will be the basis for the SDL3_FNAPlatform, which will live side-by-side with SDL2_FNAPlatform for maximum compatibility with the existing catalog of games.

How Much Can flibit Help?

I'm not great with automated bindings generation, but having worked on SDL2# for a decade I should be able to explain any possible quirks that C# (for example) might have that will affect the API definitions. There are others working on other languages that will probably want to test and give feedback as well.

Budget/Timeline

If done as a part-time gig, I would expect this to take about a month, with the majority of the work being at the first and last steps (the first being getting something generated at all, and the last being upstreaming it for SDL 3.0). There is $4000 USD allocated for this project.

lithiumtoast commented 4 months ago

Hello, I would like to put myself forward for this bounty, working part-time.

Introduction

Back in 2020/2021 I was trying to create C# bindings for a specific C library called sokol. I had the same problem outlined here: changes upstream resulted in a lot of manual work to fix and update the C# bindings. So I experimented with automating the process by using libclang to parse the C header and generate the C# bindings. See this sokol GitHub issue for my original approach.

My findings have been put into the following projects as tools with a lot of learnings and documentation:

I have tried the tooling on various C libraries, with some success in early, limited testing, including: SDL, FAudio, FNA3D, Theorafile, imgui, sokol, libuv, flecs, etc.

You can find an example of the generated bindings from https://github.com/bottlenoselabs/SDL-cs/blob/main/src/cs/production/SDL/Generated/SDL.gen.cs. This file gets updated by using Github Actions pipelines via Dependabot to open a new PR when upstream has been updated.

Problems

As others have described in the linked SDL GitHub issue, using libclang has some issues. Most notably C function-like macros. I have documented some general rules for the C code to make it suitable for bindgen to C# and other languages: https://github.com/bottlenoselabs/CAstFfi?tab=readme-ov-file#limitations-is-the-c-library-ffi-ready.

The best outcome is that changes are made upstream to make SDL bindgen-friendly. The worst case is that some workarounds would be needed to generate the C# bindings, and those workarounds would be unique to FNA.

The good news is that SDL2#, and thus FNA, is not using the full feature set of SDL. For the purposes of this bounty, a subset of the larger problem could be solved by generating only the C# bindings that FNA uses. The larger problem could be tackled as a separate milestone after additional feedback.

Questions

Proposed Deadline

I would be working on this part-time.

Two Weeks of March 18th to March 31st

Week of April 1st to April 7th

flibitijibibo commented 4 months ago

This bit from the README should be able to answer both:

All source code written is yours to keep. This is NOT a work for hire; you will retain full copyright ownership of the code you write. All I am asking is that we get the right to use/publish that code under the license used by the project. For example, if a project is released under zlib, we get to use that code under zlib until we decide to change the license, at which point we will ask you for permission to do so first. If you decide that code you wrote is useful for some proprietary project and it makes you a zillion dollars, fine by me!

The rest sounds okay, so will mark as In Progress.

Susko3 commented 4 months ago

I'm interested in taking on this.

I've recently been experimenting with generating SDL3 bindings from header files. I've used a combination of ClangSharpPInvokeGenerator, C# Source Generators and manually written .cs and ClangSharp configuration files.

I have some questions about the C# side of things:

I'm also happy to help with creating demo SDL3 applications to check that the generated bindings are simple to use and correct. Possibly even cross-check the bindings with header files.

flibitijibibo commented 4 months ago

I'm interested in taking on this.

I've recently been experimenting with generating SDL3 bindings from header files. I've used a combination of ClangSharpPInvokeGenerator, C# Source Generators and manually written .cs and ClangSharp configuration files.

* ClangSharp is really good at creating "bit-perfect" bindings, including things like unmanaged function pointers

* Sourcegen is used to automatically generate friendly overloads for `char *` functions, using `string` / `ReadOnlySpan<byte>` (for UTF8 literals)

* Manually written `.cs` files are required for macro functions, and typedefs

  * typedefs could realistically be automatically generated with a simple python script or similar
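To make the "simple python script" suggestion concrete, here is one possible sketch. It only recognises scalar typedefs of the form `typedef Uint32 Name;`, the `Example_` names are invented, and wrapping each alias in a one-field struct is just one of several options, not something the thread settled on.

```python
import re

# Matches simple scalar typedefs such as "typedef Uint32 Example_ID;".
# Struct, enum, and function-pointer typedefs are deliberately not handled.
TYPEDEF_RE = re.compile(r"^\s*typedef\s+(\w+)\s+(\w+)\s*;", re.MULTILINE)

# Partial map from SDL's fixed-width C types to C# types.
C_TO_CS = {"Uint32": "uint", "Sint32": "int", "Uint64": "ulong", "Uint16": "ushort"}


def find_typedefs(header_text):
    """Return (alias, C# underlying type) pairs for known scalar typedefs."""
    found = []
    for underlying, alias in TYPEDEF_RE.findall(header_text):
        if underlying in C_TO_CS:
            found.append((alias, C_TO_CS[underlying]))
    return found


def emit_wrapper_struct(alias, cs_type):
    """Emit a one-field C# struct so the alias stays a distinct type."""
    return f"public struct {alias}\n{{\n\tpublic {cs_type} Value;\n}}"


# Invented header text, used only to exercise the script.
SAMPLE_HEADER = """
typedef Uint32 Example_WindowID;
typedef struct Example_Window Example_Window;
typedef Uint64 Example_Ticks;
"""

typedefs = find_typedefs(SAMPLE_HEADER)
for alias, cs_type in typedefs:
    print(emit_wrapper_struct(alias, cs_type))
```

A regex is fragile against comments, macros, and multi-line declarations, so a parser-backed approach (libclang, tree-sitter) would likely be needed for anything beyond this simple case.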

I have some questions about the C# side of things:

* What are your requirements for C# and .NET versions?

  * The SDL2# project is structured in a weird way and has `#if NET6_0_OR_GREATER` etc. sprinkled throughout. I'd like to avoid that if possible.

* Are you considering releasing the bindings as a nuget package? That way projects other than FNA could use them.

* How important are friendly C# function definitions (eg. `in`, `out`, `ref` params instead of raw pointers)?

  * I presume `char *` functions are really important to have friendly definitions to prevent memory leaks

I'm also happy to help with creating demo SDL3 applications to check that the generated bindings are simple to use and correct. Possibly even cross-check the bindings with header files.

Note that @lithiumtoast may have already started, so definitely coordinate with them if you both want to work on this at the same time.

To answer the questions (all of which are good!):

Susko3 commented 4 months ago

SDL3.cs will need to compile on VS2010, so .NET 4.0 is the minimum spec - however since it's generated we can go harder with the newer .NET features if it makes stuff run faster.

I'm not sure I understand what you mean by this. Even if the code is generated, VS2010 will still have to compile it.

Unless you mean that there'll be two versions of each function, one for .NET 4.0 compatibility, and one using modern syntax and features. And preprocessor directives could be used to compile only the function relevant for the tooling used.

lithiumtoast commented 4 months ago

Hey @Susko3, if you want to share this bounty that's OK with me. It would be great to get some help.

lithiumtoast commented 3 months ago

Posting here with an update.

I rewrote CAstFfi and renamed it to https://github.com/bottlenoselabs/c2ffi. I did this while setting up end-to-end tests in GitHub Actions to make sure everything works correctly (so far) on all desktop platforms (Windows, macOS, Ubuntu) when using libclang. What's changed from CAstFfi is simpler logic, various bug fixes, and an updated data model for the machine-readable FFI that follows libclang's own model more closely.

My plan is to update https://github.com/bottlenoselabs/c2cs to use c2ffi this week. Then finish creating that example repository for bindgen of SDL. I expect to have that repository ready for first round of feedback on April 9th.

lithiumtoast commented 3 months ago

Hello, I have things ready for some feedback. Apologies for the timeline being a bit later than what I mentioned earlier.

You can find the C# code generated from the latest SDL main commit here: https://github.com/lithiumtoast/SDL-cs/blob/main/src/cs/production/SDL/Generated/SDL.gen.cs The GitHub Actions workflow run is here, and the machine-readable files are available as its artifacts: https://github.com/lithiumtoast/SDL-cs/actions/runs/8680613309 If there are any questions about how it works I would be happy to explain.

At this time it doesn't include SDL_image and friends. However, it should be sufficient to have some discussion and ask questions.

I'll get us started:

  1. Is there a specific branch of the SDL repository to use for bindgen? I'm assuming main is fine for SDL3.

  2. I see that SDL2# uses tabs. The currently generated C# code is using spaces. Do you want the generated code to be using tabs?

  3. The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen to remove the dependency. What do you think? Should the generated C# code have zero dependencies?

  4. Some header files from SDL have been ignored for the purposes of the bindgen. You can see which ones here: https://github.com/lithiumtoast/SDL-cs/blob/94ca7d6d863aad5169d0bac86afc1cf432728a69/bindgen/config-extract.json#L12. Is this assumption okay? Should the generated C# code cover all of SDL? Or is there only a subset that should be covered for the purposes of C#?

  5. On that note, it's unfortunate that SDL_stdinc.h has so many functions that are not useful in C#, yet SDL_free is in that header. You can see in the extract config mentioned above that a lot of functions and macro objects are ignored from SDL_stdinc.h. Perhaps the case could be made to move the memory-related stuff to its own header? Is it possible to make changes to the SDL repository for the purposes of bindgen?

  6. Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that modern .NET has made some significant interoperability changes, some of which are fairly good for performance:

    • C# function pointers for callbacks from C. In the .NET 4 generated version, C# delegates will have to be used for callbacks. This already makes the two versions of the C# bindings significantly different. Is that okay?

    • Disabled runtime marshalling. For example, System.Boolean is properly mapped to bool in C with runtime marshalling disabled (see CBool in the C2CS.Runtime for when it is enabled by default). The only "undesired" consequence is that ref and out are not supported, so unsafe pointers will need to be used in C#. Is this taking it too far? I suppose that if it is disabled, the P/Invoke functions in C# could have some generated "wrappers" as a compromise. Should runtime marshalling be disabled in the .NET 8+ version?

    • UTF8 string literals. I have generally been handling char * as a CString in C2CS.Runtime. I see that in SDL2#, like you mentioned, the approach was to just have wrapper methods that use System.String in C#. I think it would be better if the .NET 8+ version of the bindgen used UTF8 string literals wherever possible.

  7. Please take a look at the generated code mentioned at the beginning. Is there anything else not mentioned that would need to be changed?

kg commented 3 months ago

  3. The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen to remove the dependency. What do you think? Should the generated C# code have zero dependencies?

Aside from the "nuget or not" question, it would be ideal from a security/supply-chain perspective if the NuGet is one that's built and distributed by the SDL organization and not an unknown third party.

  6. Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that modern .NET has made some significant interoperability changes, some of which are fairly good for performance:

If you're not aware, you can use the NET8_0_OR_GREATER preprocessor define in C# if you want to generate a single file that supports both runtimes. I don't know if that will be the ideal approach, though.
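A generator could emit that single-file layout mechanically. The Python sketch below wraps two alternative C# snippets in `#if NET8_0_OR_GREATER` / `#else` / `#endif`. The `Example_SetCallback` and `Example_Callback` names and the `nativeLibName` constant are hypothetical, and the function-pointer versus delegate split is only one plausible difference between the two targets.

```python
def emit_conditional(symbol, modern, legacy, indent="\t"):
    """Wrap two alternative C# snippets in #if / #else / #endif."""
    lines = [f"#if {symbol}"]
    lines += [indent + line for line in modern.splitlines()]
    lines.append("#else")
    lines += [indent + line for line in legacy.splitlines()]
    lines.append("#endif")
    return "\n".join(lines)


# Modern target: pass the callback as a C# function pointer (hypothetical API).
MODERN = (
    "[DllImport(nativeLibName, CallingConvention = CallingConvention.Cdecl)]\n"
    "public static extern unsafe void Example_SetCallback("
    "delegate* unmanaged[Cdecl]<IntPtr, void> callback);"
)

# Legacy target: fall back to a delegate type for the same callback.
LEGACY = (
    "[UnmanagedFunctionPointer(CallingConvention.Cdecl)]\n"
    "public delegate void Example_Callback(IntPtr userdata);\n"
    "[DllImport(nativeLibName, CallingConvention = CallingConvention.Cdecl)]\n"
    "public static extern void Example_SetCallback(Example_Callback callback);"
)

block = emit_conditional("NET8_0_OR_GREATER", MODERN, LEGACY)
print(block)
```

Whether this is preferable to two separate output files probably depends on how many declarations differ; if most of the file ends up inside conditionals, separate files may be easier to read.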

flibitijibibo commented 3 months ago

Will definitely do a deep dive tomorrow, but to answer what I can ASAP...

Is there a specific branch of the SDL repository to use for bindgen? I'm assuming main is fine for SDL3.

Yep, we'll be using the main branch until SDL4 starts.

I see that SDL2# uses tabs. The currently generated C# code is using spaces. Do you want the generated code to be using tabs?

It'd be nice to have, but since it's autogenerated anyway it's not too big a deal what the exact style is.

The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen to remove the dependency. What do you think? Should the generated C# code have zero dependencies?

I'd say remove them if possible; we were able to do all our bindings without dependencies up until now so it should be possible to keep that going.

Some header files from SDL have been ignored for the purposes of the bindgen. You can see which ones here: https://github.com/lithiumtoast/SDL-cs/blob/94ca7d6d863aad5169d0bac86afc1cf432728a69/bindgen/config-extract.json#L12. Is this assumption okay? Should the generated C# code cover all of SDL? Or is there only a subset that should be covered for the purposes of C#?

Audio and IOStream might get used (it's rare but some real use cases have come up before)

On that note, it's unfortunate that SDL_stdinc.h has so many functions that are not useful in C#, yet SDL_free is in that header. You can see in the extract config mentioned above that a lot of functions and macro objects are ignored from SDL_stdinc.h. Perhaps the case could be made to move the memory-related stuff to its own header? Is it possible to make changes to the SDL repository for the purposes of bindgen?

Sam/Ryan might consider something like SDL_math.h, stdint, etc, but worst case we can ignore this header and generate the allocator functions ourselves.
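The "ignore the header but keep the allocator functions" fallback could look roughly like this on the generator side. The record layout and the two config structures are assumptions for this sketch and do not mirror the actual config-extract.json schema.

```python
# Hypothetical extracted declarations; a real tool would read these from the
# machine-readable files rather than hard-coding them.
DECLARATIONS = [
    {"name": "SDL_malloc", "header": "SDL_stdinc.h"},
    {"name": "SDL_free", "header": "SDL_stdinc.h"},
    {"name": "SDL_strlen", "header": "SDL_stdinc.h"},
    {"name": "SDL_CreateWindow", "header": "SDL_video.h"},
]

# Headers skipped wholesale, plus per-header exceptions that are still kept.
IGNORED_HEADERS = {"SDL_stdinc.h"}
ALLOW_LIST = {"SDL_stdinc.h": {"SDL_malloc", "SDL_free"}}


def filter_declarations(decls, ignored_headers, allow_list):
    """Drop declarations from ignored headers unless explicitly allowed."""
    kept = []
    for decl in decls:
        if decl["header"] in ignored_headers:
            # The whole header is skipped unless this name is allow-listed.
            if decl["name"] in allow_list.get(decl["header"], ()):
                kept.append(decl)
            continue
        kept.append(decl)
    return kept


kept_names = [d["name"] for d in filter_declarations(
    DECLARATIONS, IGNORED_HEADERS, ALLOW_LIST)]
print(kept_names)
```

An allow-list keeps the generated surface small by default, at the cost of needing a manual update whenever another function from an ignored header turns out to be useful.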

Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that modern .NET has made some significant interoperability changes, some of which are fairly good for performance:

This is fine with me - as long as the code using the functions isn't any different it's okay for the generated stuff to do more optimal things internally (we have a handful of these in SDL2.cs). If it's the case that it would break ABI to have two different versions, the ABI should prefer the old way over the new way (someone making their own bindings with this generator could optimize for themselves if they wanted to though!).

Please take a look at the generated code mentioned at the beginning. Is there anything else not mentioned that would need to be changed?

Will take a look soon!

lithiumtoast commented 3 months ago

If you're not aware, you can use the NET8_0_OR_GREATER preprocessor define in C# if you want to generate a single file that supports both runtimes. I don't know if that will be the ideal approach, though.

Yup, I am doing that in AssemblyAttributes.gen.cs, a supporting assembly-attributes file that accompanies the main generated C# file.

Speaking of which: the usage of partial on the static class makes it possible for the generated C# code to be split across multiple files if that makes things easier. For example, it would be possible to have one generated C# file to match each C header. Or, to have one C# file for each struct, enum, etc.

worst case we can ignore this header and generate the allocator functions ourselves

This could be a good example of using another C# file with partial to write supporting "extra" functions that should be rolled into the static class.
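A sketch of the one-file-per-header split, under the assumption that each extracted declaration records its source header and already carries its generated C# text. The `.gen.cs` naming follows the file names mentioned in this thread; the record fields and `Example_` declarations are invented.

```python
from collections import defaultdict


def split_by_header(decls, class_name="SDL"):
    """Group generated members by C header and wrap each group in a
    'public static partial class' so the files compile into one class."""
    grouped = defaultdict(list)
    for decl in decls:
        grouped[decl["header"]].append(decl["code"])

    files = {}
    for header, members in grouped.items():
        stem = header.rsplit(".", 1)[0]  # "SDL_video.h" -> "SDL_video"
        body = "\n".join("\t" + member for member in members)
        files[f"{stem}.gen.cs"] = (
            f"public static partial class {class_name}\n{{\n{body}\n}}\n"
        )
    return files


# Invented declarations, used only to exercise the function.
SAMPLE = [
    {"header": "SDL_video.h", "code": "public static extern void Example_A();"},
    {"header": "SDL_audio.h", "code": "public static extern void Example_B();"},
    {"header": "SDL_video.h", "code": "public static extern void Example_C();"},
]

files = split_by_header(SAMPLE)
for name in sorted(files):
    print(name)
```

Handwritten extras, such as the allocator functions mentioned above, could then live in their own file using the same partial class declaration.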

lithiumtoast commented 2 months ago

Update: April 29th 2024

https://github.com/lithiumtoast/SDL-cs

Remaining work:

lithiumtoast commented 2 months ago

Update: May 8th 2024

Apologies for the delay as I caught COVID and was not functioning most of the time.

https://github.com/lithiumtoast/SDL-cs

Remaining work:

TerensTare commented 1 month ago

A bit late to the party, but here's my attempt repo, which uses Python with tree-sitter to parse the sources and generate the bindings. Currently it needs libsdl-org/SDL#9907 to work properly; without it, in and out parameters cannot be told apart.

I tried to stay as close as I could to the design of SDL2# and use old C# features as well, but I don't have an old compiler to find the minimum C# version needed.

Feel free to ask about any questions/concerns related to the code.

Update: Here's some sample code that shows how to use the generated API: gist