godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.08k stars 69 forks source link

Expose a zero allocation API to Godot C# bindings #7842

Open reduz opened 10 months ago

reduz commented 10 months ago

Describe the project you are working on

Godot C# bindings

Describe the problem or limitation you are having in your project

For the past weeks, I've been discussing with several Unity users intending to move to Godot C# regarding dealing with the C# garbage collector.

The most common complaint I hear from users is that, in Unity, allocations can trigger unexpected GC spikes into the game.

In Godot, we target to make all of the high performance APIs (those that intended to be called every frame) not allocate any memory, so theoretically the GC should not be a problem. Additionally, Godot starting from 4.0, uses the Microsoft CoreCLR version of .net, which also supposedly has a better garbage collector than Unity.

But in all, after several discussions with Unity users, neither is enough reassurance for them, and they would really feel safer if Godot exposed a zero allocation API.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

The idea of this proposal is that Godot exposes zero allocation versions of many functions in the C# API, that users can use if they desire.

Technically, this could be done from the binding generator itself, without breaking compatibility, and without doing any modification to Godot itself.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

WARNING I am not familiar with C#, so take this as pseudocode.

Imagine you have two functions exposed as to C#:

void MyClass.SetArray( Vector2[] array);
Vector2[] MyClass.GetArray();

This works and is pretty and intuitive. However, it has two problems:

The idea is to add NoAlloc versions, which can be generated directly by the binder automatically when required:

void MyClass.SetArrayNoAlloc( Godot.Collections.PackedVector2Array array);
void MyClass.GetArrayNoAlloc(ref Godot.Collections.PackedVector2Array ret_vec2_array);

This way, we solve two problems:

As a result, the Godot API can be used entirely from C# in a way so, if this area is performance critical somehow, we ensure no memory copies, safe native access to Godot internal data and ability to control what goes to the GC.

The idea is to provide these specialized call syntaxes for functions using packed arrays, arrays and dictionaries.

If this enhancement will not be used often, can it be worked around with a few lines of script?

N/A

Is there a reason why this should be core and not an add-on in the asset library?

N/A

xzbobzx commented 10 months ago

I like this part:

"Technically, this could be done from the binding generator itself, without breaking compatibility, and without doing any modification to Godot itself."

This proposal would give users more control over their game's performance while the tradeoff on Godot's part is essentially zero. (Sure, some dev time to get it set up.)

It sounds like a lot of pros for hardly any cons. Am I reading it correctly like that?

reduz commented 10 months ago

@xzbobzx Yeah I think so.

wareya commented 10 months ago

Would this automatic re-binding work for APIs that have variant Arrays in them, like ArrayMesh's add_surface_from_arrays? Or would add_surface_from_arrays continue to take a Godot Array with C# arrays inside it with no alternative? (To note, some of the entries in the Array passed to add_surface_from_arrays are of determined-at-runtime types - the custom-format arrays like ARRAY_CUSTOM0 etc).

reduz commented 10 months ago

@wareya I think the main problem here is more along the lines of the raycast function rather than a GC one. If you want to build and update geometry every frame, Godot has API in C++ for this (as example the one used by softbodies), but its not exposed to the binder due to limitations of it. Solving that issue should solve this one too.

Nomad1 commented 10 months ago

Are there any actual confirmed cases of GC stalls caused by this? Don’t get me wrong, but getting a huge array of vectors from the engine is not a common task, it goes to Gen1 type GC queue and shouldn’t affect performance anyway. Submitting arrays to the engine is an another story but still can be improved easily with pointers - pass a pointer to the array instead of doing stackalloc and copying the contents in manage score but copy it in CPP part instead. My concerns is that Godot.Collections are not so common and flexible as built in CLR types and with it we are moving farther from vanilla .Net to a custom language dialect. P.S. Surely, if you can implement a NoAlloc version in a few clicks, we can give it a try and I might be wrong and it will be even handlier.

wareya commented 10 months ago

@reduz I don't rebuild geometry every frame all the time when standing still with the world fully loaded, but I do while the world is loading and also when the player travels into new chunks. I do it on a separate thread, streaming everything in as fast as possible, so while it doesn't cause a lower framerate or any stuttering, a performance improvement in removing the extra copy would be noticeable to players as being able to turn up the render distance higher and distant lands becoming visible more quickly. Despite being an unusual edge case, it makes the extra copy when converting the C# arrays into Godot packed arrays worth worrying about to me for a reason other than framerate or loading time impact.

reduz commented 10 months ago

@Nomad1 Godot internally does not use C# arrays, so you have to pick. You use the pretty dialect and stand copies and GC issues, or you use the Godot dialect and ensure maximum performance.

I am pretty sure that not all your code will need to use the Godot dialect, so it should be reserved for cases where you absolutely want the performance.

gknedo commented 10 months ago

Nice! I'm following this discussion and this proposal has the potential to touch some hearts.

I think it's out of the scope of this discussion, but would be nice to see the boundaries of this zero allocation API. Having a list of the mapped functions. So the work to be done would be clear.

And I think it's not the moment to think about the details of the implementation, but what do you think of using the same structure of the current API with a different namespace? Something like

void NoAlloc.MyClass.SetArray( Godot.Collections.PackedVector2Array array);
void NoAlloc.MyClass.GetArray(ref Godot.Collections.PackedVector2Array ret_vec2_array);
tkgalk commented 10 months ago

Great stuff, very much in favour, and I really appreciate that you heard the feedback and opened the proposal. Proposal coming from Juan will have a different reach than one made by a random "Unity-refugee".

Obvious next step would be a C# profiler in Editor, but that's a separate thing and I'm sure will be tackled in the future (for now the JetBrains one has to be enough).

Nomad1 commented 10 months ago

@Nomad1 Godot internally does not use C# arrays, so you have to pick. You use the pretty dialect and stand copies and GC issues, or you use the Godot dialect and ensure maximum performance.

C# Vector2[] in memory is stored exactly the same as C++ Vector2*. We can pass it to the native code for free, with zero copying or using Godot.Collections.Array. There won't be any issues unless you take the pointer and hold it longer than needed. And even if you intend to make a hard copy of that data in C++ code, it will be out of GC scope. So I'd say it's generally better than using your own collection types and losing things like Span<>, BlockCopy, natively optimized iteration, etc.

reduz commented 10 months ago

@Nomad1

C# Vector2[] in memory is stored exactly the same as C++ Vector2*. We can pass it to the native code for free

Except for some specific cases where Godot does take pointers (in very high performance APIs such as RenderingDevice::buffer_update as example), in far most cases it uses reference counted arrays because its far easier and safer to pass around information this way, so even if Unity can pass a raw pointer, this is not useful to Godot.

Hence, the way to do these things efficiently in C# in Godot is to use the Godot collections.

fodinabor commented 10 months ago

Is the array always big enough or do you need to provide a hook in the api to communicate its too small, i.e.

bool MyClass.GetArrayNoAlloc(ref Godot.Collections.PackedVector2Array ret_vec2_array, ref int size);
// or if PackedVector2Array knows its size:
int MyClass.GetArrayNoAlloc(ref Godot.Collections.PackedVector2Array ret_vec2_array);
SamPruden commented 10 months ago

This is very cool! I'm glad to see this being embraced!

As the person who started all of this mess for you (sorry) I'd be happy to work on the design and maybe implementation side of this if it would help. I've got quite a few ideas about how to do this style of API nicely.

The basic idea here is very similar to what I had imagined, which is lovely - particularly where it's handled automatically in the binding.

I'll put some more detailed proposals down sometime soon if they're welcome.

Nomad1 commented 10 months ago

@reduz

C# Vector2[] in memory is stored exactly the same as C++ Vector2*. We can pass it to the native code for free

Except for some specific cases where Godot does take pointers (in very high performance APIs such as RenderingDevice::buffer_update as example), in far most cases it uses reference counted arrays because its far easier and safer to pass around information this way, so even if Unity can pass a raw pointer, this is not useful to Godot.

Hence, the way to do these things efficiently in C# in Godot is to use the Godot collections.

Well, my approach was never about adopting the raw pointers internally, but about adding C++ overloads that can make a Godot.Collections.Array from data pointer instead of numerous calls to monsters like godotsharp_packed_vector2_array_new_mem_copy. That's how we can achieve performant interop without extra pressure on GC or filling the code with collections created in C++ and passed back and forth between unmanaged and managed parts. However, I understand that it will be nowhere near your solution by the amount of changes required.

SamPruden commented 10 months ago

I agree with @Nomad1 that this is a good opportunity to get some additional performance wins (in both C# and GDE) by allowing some internal allocations to be optionally skipped in favour of external preallocated buffers.

For these things, it actually should be possible to use the C# collection types.

wareya commented 10 months ago

Well, my approach was never about adopting the raw pointers internally, but about adding C++ overloads that can make a Godot.Collections.Array from data pointer instead of [...]

Godot's array types are dynamic arrays, not static arrays. They need to be able to resize themselves, which means that the pointers inside them have to come from the same allocator that they themselves use. This is a UB minefield and probably wouldn't even be safe for C# code.

(They would need to come from the same allocator even if they were static arrays, since they have to be able to destruct themselves when their last reference expires, but the possible workarounds for that are less dangerous than with full reallocation.)

Nomad1 commented 10 months ago

Well, my approach was never about adopting the raw pointers internally, but about adding C++ overloads that can make a Godot.Collections.Array from data pointer instead of [...]

Godot's array types are dynamic arrays, not static arrays. They need to be able to resize themselves, which means that the pointers inside them have to come from the same allocator that they themselves use. This is a UB minefield and probably wouldn't even be safe for C# code.

Those are created and resized in C++ code with a consistent allocator: https://github.com/godotengine/godot/blob/master/modules/mono/glue/GodotSharp/GodotSharp/Core/Array.cs So it's safe and nice, but not super-fast since every Add or Count call is an Interop. I'd love to avoid those classes in ultra performance-critical parts, i.e. when you are animating the water on the CPU and sending a bunch of vertices every frame. However as I said before, I don't think that it would make a big difference anyway. Will need to do some profiling first.

wareya commented 10 months ago

So it's safe and nice, but not super-fast since every Add or Count call is an Interop. I'd love to avoid those classes in ultra performance-critical parts, i.e. when you are animating the water on the CPU and sending a bunch of vertices every frame. However as I said before, I don't think that it would make a big difference anyway. Will need to do some profiling first.

I feel like something similar to this could be achieved by reading the raw pointer out of the array and making sure not to destroy or modify it array until you're done using the raw pointer, rather than by trying to construct a godot array out of a raw pointer.

Nomad1 commented 10 months ago

So it's safe and nice, but not super-fast since every Add or Count call is an Interop. I'd love to avoid those classes in ultra performance-critical parts, i.e. when you are animating the water on the CPU and sending a bunch of vertices every frame. However as I said before, I don't think that it would make a big difference anyway. Will need to do some profiling first.

I feel like something similar to this could be achieved by reading the raw pointer out of the array and making sure not to destroy or modify it array until you're done using the raw pointer, rather than by trying to construct a godot array out of a raw pointer.

I doubt it's possible. We can only guarantee that the raw pointer is immutable during the single call and within fixed(Vector2 * ptr = &array) { ... }. But that data should be copied to some other collection anyway. Right now it's done in the managed code in the GodotSharp assembly, I recommend doing it in the unmanaged code, Juan proposes to use unmanaged collections from the start. All three approaches have flaws but Juan's one is easily achievable.

reduz commented 10 months ago

@wareya @Nomad1 Consider that what I am describing in this proposal is not possible yet because the Godot collection types are not properly exposed in C#. They need to be rewritten to work as I describe using the new extension API in Godot.

The is because C# support in Godot is quite old (from the Godot 3 days) while Godot 4 uses a new extension API that is far more efficient and supersedes all the glue code that Ignacio had to write from scratch for mono.

Ideally, in the coming months (after getting current C# to work on mobile), C# support in Godot will most likely be rewritten to use the new Godot 4 extension API (this will make performance 100% optimal, allow rewriting any part of Godot in C#, even renderers, add classes as native classes, unify the .net editor with regular editor, etc) and this will be implemented there.

lupin-de-mid commented 10 months ago

Consider use Span as return type. That could be nice wrapper around native arrays

Repiteo commented 10 months ago

I don't know if NoAlloc needs to have any special designation in the name; would a native collection as an argument be enough to make that functionality implicit?

wareya commented 10 months ago

I don't know if NoAlloc needs to have any special designation in the name; would a native collection as an argument be enough to make that functionality implicit?

Godot's API gets exposed to languages that don't have argument-list-based overloads, and you might want the API to be consistent between them and languages that do. I'm not sure if NoAlloc is the best way to do it, but it makes at least a little sense.

Repiteo commented 10 months ago

Godot's API gets exposed to languages that don't have argument-list-based overloads, and you might want the API to be consistent between them and languages that do. I'm not sure if NoAlloc is the best way to do it, but it makes at least a little sense.

Oh yeah, in the native binding format it makes sense to have a designation. I was thinking more in the context of C# specifically, and if it could suffice with a uniform name on the surface

AwayB commented 10 months ago

Are there any actual confirmed cases of GC stalls caused by this? Don’t get me wrong, but getting a huge array of vectors from the engine is not a common task, it goes to Gen1 type GC queue and shouldn’t affect performance anyway. Submitting arrays to the engine is an another story but still can be improved easily with pointers - pass a pointer to the array instead of doing stackalloc and copying the contents in manage score but copy it in CPP part instead. My concerns is that Godot.Collections are not so common and flexible as built in CLR types and with it we are moving farther from vanilla .Net to a custom language dialect.

I do not believe that these GC stalls happen, or are a problem at all.

I believe that this proposal stems from the recent social media pressure. Since 2 weeks ago, there's been a very judgmental tone online from a few of the recent "Unity refugees", about Godot's supposed "poor performance". The demands they've been giving can roughly be summarised as:

While this mentality would make sense in Unity, where the codebase was handled by Unity and was never the community's responsibility, and it only supported C# for coding, it doesn't at all in Godot, where the community has to maintain a large and complex codebase on a limited budget, and where the usability/performance ladder goes:

In the past days, attempts to explain these design aspects of Godot to these demanding Unity users has been unfruitful. The main point of failure in these attempts to explain is sadly that, well, they don't care. While some Unity former users have made amazing work with Godot's features, and one absolute monster has transferred the entirety of his game's logic in C# from Unity to Godot in 14 hours, for some Unity devs, it's been a full public judgement of Godot's codebase before talking to anyone from Godot, or even asking why are things made the way they are.

Articles have been written and have taken an extremely small part of the engine and made extensive judgements with hyperbolic words such as "terrified, limited potential, dramatically and systematically worse performance". These articles have the same validity as me going to Unreal and picking the worst, most unoptimised part of their stack and claiming that its "incredibly unoptimised"...yet a very large number of people have taken these articles as a legitimate study of Godot's performance in depth and ran with it.

Besides the quick, and mostly false, judgements, the bigger problem is that the claimed pain point of these people, performance, is, at the risk of triggering them, of secondary importance for most of gamedev, and only one of the main goals of Godot's design. As I said above, there are 3 languages that Godot lets you develop with, and each responds to a different kind of need in gamedev:

This division of labor is because gamedev isn't about slogging through code optimisation like you're making YAPSQL but about a slew of different trades working together (artists, animators, game designers, programmers, render/physics specialists, tool engineers, etc). GDscript is meant to help artists, animators, designers, iterate as fast as possible, with a simpler, readable syntax, running on a VM, with tight integration, hot reloading and extremely practical iteration. C# is to open up generic purpose, high performance code to a very large ecosystem of libraries and tools, for programmers to add any logic that their game could need. C++ is to get the best performance possible, in the most performance critical areas of the engine, for render, physics, or top performance code. And GDExtension, even without modifying the C++ source code, allows to extend into almost any language, so you can even get your own flavor of code.

The very design of the engine is centered, all the way down to its multiple languages, to cater to all these needs as best as possible. Each layer fit for purpose, with performance and usability being yin and yang. This in my opinion, is Godot's greatest strength. It's what sets it apart from any other engine. Godot intelligently balances out performance, usability, productivity, amount of targets, and manages to remain greatly sustainable and multifunction.

However, and without mentioning any one individual in particular, this "star feature" of Godot was completely ignored in favor of a highly ideological, inapplicable vision of programming and...well, in favor of Unity. The given justification of "performance", which is waved around like a flag, falls apart quickly when their complaints are about non-performance critical parts of Godot, or when you ask them to directly use the engine's C++. The former is rejected on ideological grounds: "everything should be high performance". Even if it needlessly complexifies and damages the engine's code clarity. The latter on convenience grounds: "I don't know C++". Asking for the greatest C# performance and NoGC policies when Godot's C++ and GDExtension are right in their noses is, to say the least, out of touch.

In fact, they are so out of touch that we're now to the point that rather than them dealing with C++'s manual memory management, we're now being pressured into considering a proposal where we have to build a whole new copy of the C# bindings to walk around C#'s GC, instead of telling them to use the base engine language that already has none. Because C# was their main horse in Unity, and so they demand that it remains it now, and Godot is supposed to reimplement a custom version of something that already exists.

Whether you look at this situation from a usability, productivity, code clarity, or performance standpoint, I do not believe that this proposal responds to any needs of Godot, but merely satisfies the demands from a group of people who intend for Godot to patchwork itself into replacing Unity for them.

I do not believe that this extra layer of complexity to C# is in the interest of Godot, or fitting to Godot's brilliant design. If truly, a C# layer with strong performance isn't good enough, then C++ is right open to a user. Or GDExtension. Instead of flatly stating why Godot is what it is, we're conceding to social media pressure from a vocal minority, and considering denting Godot's excellent design, for people who did not and do not care to understand Godot, did not care to try it for the vast majority of them, and who want:

Even if it brings:

This isn't tribalism, nor is it personal. It is a raison d'être. Godot is designed in the best way that I can think of. Unity was not. This proposal is encouraging that we start abandoning Godot's greatest strength to appeal to Unity users who did not even care enough to try Godot, let alone adapt to it, and instead, come with quick judgements and a demanding stance. The numerous other users that have engaged positively haven't been close to this demanding, and instead lay praise on Godot regularly, whether for its usability or its performance or design.

Godot has always been open to bettering itself. I don't believe that this is about bettering it. This is departing from its clean design to satisfy people who do not even try to understand Godot and judge it "wrong" for not being Unity.

I believe that it is a very bad idea to support this. What this will effectively do is set a precedent where, instead of imposing Godot's design, we accept to harm the design in the name of appealing to Unity users who will certainly keep pushing further for Godot to become even more like Unity. Complaint after complaint, we will have to argue Godot's design against Unity's design, which will only fracture the community.

While there is certainly some good to take from Unity, dismissing the core design of Godot is not something we should encourage or accept. It'll lead to the original Godot users pushing for Godot, while the new ones will push for Unity. When the Unity users who started out by writing judgmental articles without knowing Godot well are encouraged with "Unitifying" it, we're marching against Godot's long term interest.

I believe that we should accept that in the implosion of Unity, we have gathered a massive amount of new users, support, and love, and that some people will not even care to look at Godot for what it is, and should be politely told that it's probably not the engine for them. And if they feel like this is a justification to throw shade on it and make judgmental claims without having even tried it, well, they were doing that from the start. Caving in is not going to stop that attitude. If anything, after this request, there will be another, and another, and more and more, and that any rejection will be another round of public complaints and judgements, until we cede to them again.

Is this proposal made in the interest of Godot? Is it following the proper division of labor and design principles of Godot? I think not.

Is it meant to concede to this vocal minority of Unity devs who have not engaged with Godot, but merely demanded that it change to suit them better? Will this minority be encouraged to demand even more and keep reshaping Godot to fit the preferences they have for Unity? I think so.

Please consider this along with the technical aspects of this proposal.

GabrielRavier commented 10 months ago

@AwayB Ok, so let us assume for a minute that the presuppositions on the technical merits of the proposal that you are making here (that those merits are near to non-existent) are true (not that I agree with these presuppositions, but it should make things simpler).

...So what if this proposal is "meant to concede to this vocal minority of Unity devs who have not engaged with Godot". Why is it supposed to matter whether there is much a particularly large technical merit to the proposal ? From what I've read in this thread, the technical cost of the proposal is near-nil. Is that such a big cost to take in exchange for getting a massive amount of Unity devs over to Godot ? In other words, it seems to me like even the smallest technical merits for the proposal seem to me like they would be enough to accept it, given how little it costs to accept compared to how much good it can make for the Godot project, even if only to see if it is worthwhile in practice. Is the mere existence of a zero allocation C# API next to the other bindings going to make the rest of them worse ?

The only arguments I can see in your post that could counter this would be that either this will magically send Godot down some hyper slippery slope that will instantly destroy the engine, or that the engine being popular is actually a bad thing. Neither seem plausible to me. In particular, could you detail how this is supposed to be "dismissing the core design of Godot" ?

Also, one last thing. I'd like to point out one quote from what you've said:

I do not believe that this proposal responds to any needs of Godot, but merely satisfies the demands from a group of people who intend for Godot to patchwork itself into replacing Unity for them.

What does the "needs of Godot" refer to here ? Is Godot some kind of abstract entity completely separate from what its users want ? Because that sentence just sounds like "I don't think Godot should answer to the needs of its users" to me.

Visne commented 10 months ago

The only real counter-argument I could find was this:

we're now being pressured into considering a proposal where we have to build a whole new copy of the C# bindings to walk around C#'s GC, instead of telling them to use the base engine language that already has none

But they themselves admitted that C# allows for faster iteration than C++, and I think we can all agree that it is easier to use as well. Obviously developers using Godot should be allowed to pick C# over the other two languages, and whether they come from Unity or elsewhere is irrelevant to the amount of support they would get from the engine.

And while there is undeniably an influx of Unity developers, the alleged "social media pressure" from these developers is far outweighed by the doubling of monthly Godot funding that came as a result of all of this. I assume that is more than enough to develop and maintain the relatively small part of the codebase that would be required to implement this proposal.

Repiteo commented 10 months ago

Technically, this could be done from the binding generator itself, without breaking compatibility, and without doing any modification to Godot itself

Ideally, in the coming months (after getting current C# to work on mobile), C# support in Godot will most likely be rewritten to use the new Godot 4 extension API (this will make performance 100% optimal, allow rewriting any part of Godot in C#, even renderers, add classes as native classes, unify the .net editor with regular editor, etc) and this will be implemented there

Both quotes from reduz, the first being from the OP. This isn't a matter of overhauling a working system to appease a vocal minority, it's recognizing something that can be implemented in a rewrite that's going to happen anyway. The C# system as it currently exists is inherently legacy, there's nothing antagonistic in saying that

Nomad1 commented 10 months ago

@AwayB

Is this proposal made in the interest of Godot? Is it following the proper division of labor and design principles of Godot? I think not.

Actually, I want to second this statement, despite several negative emotions that some people put underneath that post and the obvious prejudice against Unity newcomers (among whom I consider myself to be, since my previous engine was Unity, ignoring the fact that I used it only for rendering). GC stalls might be a problem or might be not. But I really doubt that it's that critical to the engine auditory as a whole and I'm sure that effort could be put elsewhere with much more benefit to the community. Of course, it would be nice if something could be done pretty much for free (= a few lines change in the binding generator), but if it takes more than a day, I'd vote to skip it. And yes, there are bigger challenges on the horizon, not to speak of well-known issues like a lack of mobile platforms.

Nomad1 commented 10 months ago

Technically, this could be done from the binding generator itself, without breaking compatibility, and without doing any modification to Godot itself

Ideally, in the coming months (after getting current C# to work on mobile), C# support in Godot will most likely be rewritten to use the new Godot 4 extension API (this will make performance 100% optimal, allow rewriting any part of Godot in C#, even renderers, add classes as native classes, unify the .net editor with regular editor, etc) and this will be implemented there

Both quotes from reduz, the first being from the OP. This isn't a matter of overhauling a working system to appease a vocal minority, it's recognizing something that can be implemented in a rewrite that's going to happen anyway. The C# system as it currently exists is inherently legacy, there's nothing antagonistic in saying that

reduz stated that he is now under constant pressure from the newcomer community and social media. The fact of this proposal might be a sign that he is already thinking about diverting the energy to follow the vox populi instead of his own roadmap of engine development. I don't think that's a good sign. I was on this road once and it's paved with good intentions but leads to some other place.

Repiteo commented 10 months ago

Okay but, this ISN'T a diversion in direction. It's a proposal. A proposal that someone can choose to act on, on their own time, without any external pressure. The roadmap hasn't changed one bit, and I don't think a proposal alone could change a roadmap

Visne commented 10 months ago

Indeed, can we focus on actual technical arguments for or against this proposal, instead of theorizing whether this proposal was made because of "public pressure"? It is completely irrelevant to the merit of the proposal anyways. All it will end up doing is getting this thread locked to contributors, which will do none of us any favors.

tannergooding commented 9 months ago

Preface

Just to preface, I work on the .NET Libraries team.

I own several areas in the BCL including buffers, memory, numerics, and hardware intrinsics. So I'm deeply familiar with and interested in areas that can provide the best performance and generally in all the types that get used in writing a great game engine.

That being said, everything below is simply my own standalone viewpoint/opinion. It doesn't necessarily represent a shared view of others on the .NET team or of my employer, etc. I don't expect everyone to agree and would just like to provide some perspective.

Allocations aren't, strictly speaking, the problem

It's important to note that allocations are themselves not the problem. Nor is the fact that an allocation is GC tracked. Performing an allocation, with any decent memory manager, is nearly free. That's true in C, C++, Rust, Java, C#, etc.

The actual cost typically comes in the form of:

  1. Freeing that allocation
  2. Long term fragmentation that can build up over time

Moving your allocation to the stack so that its implicitly freed as part of exiting a method scope isn't magically faster. It's just simplified by virtue of not allowing arbitrary frees. Likewise, it has it's own problems in that you can only allocate a small amount per method as otherwise you risk causing a stack overflow.

Likewise, moving your allocation to native in the form of APIs like NativeMemory.Alloc (which is a thin wrapper over malloc) doesn't fix things either. Not only do you still have to manage the memory yourself, which can be error prone, but you're still incurring essentially the same cost in those allocations. You then also have to worry about any fragmentation that can build up as part of repeatedly allocating/freeing data and also incurring the cost of the free directly. Cross threading concerns, etc. -- There are many C/C++ engines that refuse to use features like shared_pointer or common RAII setups specifically because the cost of freeing so many objects can be detrimental, particularly because they happen inline.

The primary solution to this is simple: "recycle" or otherwise "group" your allocations. This is typically achieved via something like an arena allocator or by something like a memory pooling. There are also many engines that also facilitate this via per frame buffers that basically just get allocated once and then recycled every frame, so that way your aren't doing a bunch of per frame allocations and then throwing them away, leading to an increase in garbage every single render.

How is this handled in modern .NET?

Now, with all that being said, being able to avoid allocations when unnecessary or when something does get used on a hot path can be incredibly beneficial. It is extremely hard to statically determine when a given framework API will be cold vs hot and so it's often better to slightly err on the side of performance.

At the same time, you don't want to complicate your algorithms and spend time optimizing things that are only ever going to be cold paths. You also don't necessarily want to optimize things that simply allow users to write bad code more easily. It's not only the responsibility of the engine to be fast, safe, and robust. It is also the responsibility of the game author to do the same and to profile, optimize based on identified hot spots, to report these findings back to the engine if its on their end, etc.

To help facilitate these needs, Modern .NET has been adding such support to the BCL since .NET Core 2.1 and has been driving that via the general memory abstraction known as System.Span (and System.ReadOnlySpan) and other APIs such as System.Buffers.ArrayPool<T>. We've also provided APIs such as System.Runtime.InteropServices.MemoryMarshal (which allows you to get the backing fields of a Span directly), System.Runtime.InteropServices.CollectionsMarshal (which allows you to get a Span<T> to the backing array of a List<T> for example), System.Runtime.CompilerServices.Unsafe (which allows you to bypass language semantics), System.Runtime.InteropServices.NativeMemory (which gives you thin wrappers over malloc, free, aligned_malloc, memcpy, memset, and so on), and System.Runtime.Intrinsics (which allows you to directly write platform specific or cross platform SIMD code), and so on. We have a ton of general support that allows users to achieve success in their own way.

Span and ReadOnlySpan are the real center pieces here and provide a way to abstract over linear memory. This allows an API to provide a singular overload that can safely support any memory type, including:

This then allows the API to be trivially called from other .NET APIs regardless of how they want their memory to be managed. Thus allowing zero allocation and allocating APIs to benefit from the same singular algorithm and for the consumer of the API to choose whatever works best for them.

When something is an "input", this is incredibly simple as you can just take ReadOnlySpan. The only downside is that because it abstracts over multiple memory types, you can't trivially cache that in a field (so it also can't be used with async) and would need a higher level abstraction (such as System.Memory<T> or System.ReadOnlyMemory<T>) to support that.

When something is an "output", it can get a little bit more complicated, as rather than returning a Span<T> or a T[]; you often take a Span<T> destination as an additional parameter, so the end user can decide how to allocate such a buffer. However, this still isn't a very large stretch for most APIs and ends up working nicely.

Likewise, if you absolutely must allocate a temporary buffer (such as its too big for the stack). The best thing is to use ArrayPool<T>.Rent(...) and ArrayPool<T>.Return(...) which ensures existing allocations can get pooled and not impact the GC over the course of repeated uses. There is a default ArrayPool<T>.Shared that is available, but it is also fairly trivial to roll your own by inheriting ArrayPool<T> and give specialized semantics if neccessary for your domain

So what could Godot do?

Godot would likely benefit the most from taking a look at the APIs it provides and selectively exposing ReadOnlySpan<T> or Span<T> overloads so that they can work with more types of memory. Especially where the memory is not need to be captured by the method, this allows users to choose what is right for them.

It may benefit from porting some algorithms (especially numeric support) directly to C#/.NET so that it can avoid pinning, avoid GC transitions when having to switch from managed to native, or vice versa. This also allows the JIT to lightup with Dynamic PGO, with hardware specific code generation, and more.

It may benefit from providing some unsafe APIs that allow you to get a Span<T> over the backing memory for their collection types (such as PackedVector2Array). This likewise lets users default to the safe/easy to use types but gives them the ability to go zero allocation or zero copying where possible.

It may benefit from providing a dedicated "per frame" array pool, that automatically reclaims any rented arrays at the end of each frame. Other types of pooling or specialized allocators can also help.

It would not likely benefit from exposing APIs that directly took ref T, int count or the like. That is more dangerous and error prone. Span<T> provides the same functionality in a nicer package.

It would not likely benefit from forcing zero allocations everywhere. Nor should it aim to be zero GC tracked allocations in favor of purely doing native allocations. This is just trading one overhead for another and doesn't actually help anyone.

It would not likely benefit from trying to provide Span<T> overloads everywhere. Not all APIs are hot and sometimes an API being on a hot path is representative of the end user doing something "wrong" or "inefficient" themselves.


I'm always happy to be pinged somewhere and to answer questions or provide input.

But it's ultimately up to the Godot team what they want to do. They already have a great product and they are continuing to make it greater over time.

People jumping the gun and assuming that because Unity has a problem that Godot has or will have the same doesn't help anyone. Every product has its ups and downs and, realistically, most of the times the downsides don't matter too much. People have made amazing games with Unity, with Godot, with Unreal, with C#, with Java, etc.

Some of the best selling games of all time are well known to have basically been horrible masses of inefficient spaghetti (Minecraft and Runescape are two great examples that have had their dev's openly talk about it). Even major AAA titles from established studios can have massively different perf characteristics or even robustness (see Skyrim vs DOOM Eternal).

The general point being that, at the end of the day, your game's success isn't going to come down to something like whether or not your game engine of choice has an extra allocation on it's main loop. It's not going to matter whether it runs at 30, 45, 60, or 70+ fps. It's not even going to matter if it has the most pristine looking graphics. More often than not, it's going to come down to much more important factors focused around the general gameplay, story, mechanics, etc (do players find it fun and engaging).

Godot provides all the tools for you to get the job done, and if we work with and provide constructive feedback on what does or does not work well; we can help turn an already amazing OSS product even better. That includes finding out where the real, and not just hypothetical, usability issues lie and opening requests for features that we enjoy from other places and helped make our own jobs easier.

Nomad1 commented 9 months ago

Indeed, can we focus on actual technical arguments for or against this proposal, instead of theorizing whether this proposal was made because of "public pressure"? It is completely irrelevant to merit of the proposal anyways. All it will end up doing is getting this thread locked to contributors, which will do none of us any favors.

Oh, I thought the technical part ended here: https://github.com/godotengine/godot-proposals/issues/7842#issuecomment-1732651680 As I said before, from the game developer's point of view it's better to use native C# collections since they support fast iteration and Linq if needed, Buffer.BlockCopy, Span<> and other things. And it's possible to use them without much GC pressure by sending the array pointers to the managed code. Although it demands another set of C++ methods that accept raw pointers, meaning a lot of extra work. We can use Juan's proposal as well to modify the hot path parts. It's not really a .Net friendly way but it will do the job. Or we can leave it as is.

I propose to do some performance tests to see how much CPU time we can save with all the options.

Repiteo commented 9 months ago

Yes, getting back on topic...

From where I sit, it'd be purely beneficial to have a NoAlloc binding, simply because that would grant an unambiguous means to measure the GC impact (or lack thereof). While the arguments against needing such an implementation are on the basis that the existing systems are optimized as-is, these concerns are valid insofar as there isn't any readily-accessible way to refute that beyond taking the team's word for it. Normally that's probably not enough to work off of, but if the proposed solution is as trivial as it sounds, then I don't see any harm in it. Worst-case scenario, we will have real-world confirmation that it doesn't affect anything & subsequently remove it, giving us a concrete example to point to so we don't need to revisit this in the future

Nomad1 commented 9 months ago

It would not likely benefit from exposing APIs that directly took ref T, int count or the like

Can you please clarify: using native .Net arrays and passing them to the API as pointers, so they will be copied to Godot collections later, won't have any significant performance impact over the Godot collection creation in the managed code via Interop?

csubagio commented 9 months ago

Thanks for dropping by @tannergooding It's good to have a .Net expert weigh in, just need the Godot experts to merge the thinking and apply it to Godot's circumstances.

Whatever path/s ends up being pursued though, this is the kind of thing that's begging for comprehensive test suites to demonstrate conclusively the benefits of one approach over the other. Who's going to volunteer to write something representative :) ?

jams3223 commented 9 months ago

@tannergooding  It's an honor to have you work on this project. You arrived at the perfect time to help people get a better understanding of the situation and even contribute a big part of it. It will help get things done in a cleaner way.

AwayB commented 9 months ago

Indeed, can we focus on actual technical arguments for or against this proposal, instead of theorizing whether this proposal was made because of "public pressure"? It is completely irrelevant to merit of the proposal anyways. All it will end up doing is getting this thread locked to contributors, which will do none of us any favors.

Oh, I thought the technical part ended here: #7842 (comment) As I said before, from the game developer's point of view it's better to use native C# collections since they support fast iteration and Linq if needed, Buffer.BlockCopy, Span<> and other things. And it's possible to use them without much GC pressure by sending the array pointers to the managed code. Although it demands another set of C++ methods that accept raw pointers, meaning a lot of extra work. We can use Juan's proposal as well to modify the hot path parts. It's not really a .Net friendly way but it will do the job. Or we can leave it as is.

I propose to do some performance tests to see how much CPU time we can save with all the options.

Although I have nothing to add on the proposal since I've packed my earlier message with all the design and technical reasons as to why this is not a thing to encourage, I support your last statement fully. Any improvement efforts that would be made towards noGCing anything should start with a bunch of performance tests.

Especially applicable ones to real game development. No theoretical benchmarks of creating/freeing 10000 of the same object, but rather a randomised sequence of instantiating and deleting 10000 objects of different sizes, randomised lifetimes, etc.

tannergooding commented 9 months ago

Can you please clarify: using native .Net arrays and passing them to the API as pointers, so they will be copied to Godot collections later, won't have any significant performance impact over the Godot collection creation in the managed code via Interop?

Copying is (typically at least) more expensive than allocating (most often because you're touching twice as much memory). It is less expensive than defragmentation, which is a "stop the world" event.

Using Span<T> would allow you to safely and efficiently touch the backing allocation directly and thus avoid the copy and any secondary allocation. It would also avoid the need to have any additional abstraction for some types of access. -- It's not always safe for a growable collection, like List<T> for example, as the returned span could become desynchronized with the collection if the underlying allocation is replaced (such as due to a resize or grow operation).

Thanks for dropping by @tannergooding It's good to have a .Net expert weigh in, just need the Godot experts to merge the thinking and apply it to Godot's circumstances.

Glad to be here :)

Whatever path/s ends up being pursued though, this is the kind of thing that's begging for comprehensive test suites to demonstrate conclusively the benefits of one approach over the other. Who's going to volunteer to write something representative :) ?

Writing such comprehensive tests is very hard. What works well for one case, can often be terrible for another. These kind of metrics can also change drastically across different types of hardware and models.

It's always going to be a set of tradeoffs and finding the right areas to invest. I expect it to be very similar to how we have to weigh such things in the BCL and decide the right places to focus on features vs perf vs usability vs ...

@tannergooding It's an honor to have you work on this project.

Notably I don't work on the project, just contributing some time to a discussion 😄

I'm always available to answer questions for areas I care about though, especially as it relates to the areas I manage for the BCL or for how we might approach a given scenario for the BCL.

lupin-de-mid commented 9 months ago

Consider use Span as return type. That could be nice wrapper around native arrays. That could be returned from api call, without allocation on C# part

i wrote this comment yesterday, when here was only few of others. Now we have far better comments with big explanation of modern C# api

CedNaru commented 9 months ago

As part of the team working on the Godot Kotlin module, I only ask that if some changes are made to the internals of Godot, they should be done in a way that other languages can easily use instead of being optimized specifically for C# design. I know many are expecting a lot from the C# binding, specially now that we got a lot of newscomers from Unity, but let's not give it features only meant for it. (I have no doubt it was never the intent, but I still wanted to add my voice to the matter).

xzbobzx commented 9 months ago

@CedNaru The way I understood the issue was that because GDScript fundamentally didn't support these types of bindings, C# didn't get them either. (Lack of struct support, etc etc) (If that understanding is wrong, ignore my argument)

So if you'd want all languages to be able to use all the same API calls, you'd either need to weigh C# down with GDScript's limitations, or update GDScript to support C#'s features.

And while I super agree with the latter, the former option is the entire reason we're talking about this in the first place.

Personally, I see no sense in not allowing C# to support certain things because GDScript doesn't either. "If I can't have it then you can't either!" Doesn't sound very fair towards C#.

CedNaru commented 9 months ago

It's not a matter of GDScript holding back C#. I'm not saying that all language bindings should have exactly the same feature sets, that's just not possible. And if we were to go that way, GDScript has many features already you simply can't implement in other languages (because of how GDScript is so tightly coupled with the rest of the engine).

I'm simply saying that if some new features are added to the API binding layer of Godot (the one used by all languages through modules or gdextension), it should be implemented in such a way the feature is an opt-in for many different languages when they have the capacity to use it, and not depend on any C# specific code. Of course, if everything that is talked about here can be purely done on the binding generator level, then there is no issue but in the case new functions must be exposed to gdextension, ClassDB, Script and such, I want to add the warning that it should be as agnostic as possible.

It's a complicated balance that the Godot core team always managed to keep in a relatively good state, but as this proposal is likely going to be implemented by someone coming from Unity. I just wanted to remind them that we got many, many languages out there other than GDScript and C# that could make use of theses changes as well. Of course, as the binding we manage is using the JVM, a GCed environment, I wouldn't want to miss the opportunity to use any new kind of API call that would be implemented following this proposal as I am facing the same kind of issues as C#.

forrestthewoods commented 9 months ago

Performing an allocation, with any decent memory manager, is nearly free. That's true in C, C++, Rust, Java, C#, etc.

I kinda don't disagree in spirit. But this is a pretty false and misleading statement imho.

Last year I wrote a blog post called Benchmarking Malloc with Doom 3 that went pretty damn deep down this rabbit hole.

My tldr conclusion:

So like yes, most malloc are pretty close to free! Except sometimes at random they are extremely not free. Yes you will hit random 0.5 millisecond calls to malloc in the real world. And yes that includes modern allocators like rpmalloc and mimalloc.

Hitting the allocator is kinda bad. Also almost all modern games hit the allocator. It's not great, but it's not immediately catastrophic. Unreal is full of slow virtual function calls. Not great, but the flexibility is ultimately worth the hit. That's what always happens in a general purpose game engine.

I'm not deathly allergic to allocations. But if it's, ahem, "free" to provide a no-alloc path that's a very good thing! I try to avoid "death by ten thousand paper cuts". A thousand paper cuts I can deal with. A small engineering team can fix a thousand little perf cuts in a couple of months. But ten thousand is too much. That's worth nipping in the bud if you can. IMHO.

reduz commented 9 months ago

Some clarifications for many comments here:

Effort

Implementing this does not take considerable effort. The only limitation is that the Godot collections need to be exposed better and give access to things like Span, but as far as providing those functions as mentioned in the OP, an hour of work at best.

Diverting of resources

I am tired of hearing this argument time and time and time again. People think Godot works like a company and this is not the case. We don't operate under the rule that we have people working for us that we tell them what to do.

Godot has hundreds to thousands of contributors who would be happy to lend a hand in these things as a project. Each has a preference to work on different parts of the engine, and we normally don't ask them to work on others. The difficult part is more agreeing and reaching consensus on what has to be done, but once this consensus is reached either there are volunteers (for simpler tasks) or it can be funded via sponsoring if its something that has enough demand (would not be this case, this is a simpler task).

Godot is NOT designed around GDScript

Godot uses an universal binder API for storage, serialization, networking, interospection, editing and language bindings. As such the types it exports are restricted to this API because adding new types is extremely costly. If you think it's easy, you are mistaken. I won't explain why here, get familiar with the Godot codebase.

In short, large part of why Godot is so small and efficient has to do with this. Its not a simple technical decision to explain in a paragraph here, you really have to become familiar with how Godot works to understand.

This binder API exists since long before GDScript. GDScript is just a custom scripting language designed around this API, so I suggest stop using the argument that you can't add more complex types due to it. You can't because them being restricted brings more benefits that limitations.

Additionally, the binder API does support custom structs to expose to non-dynamic languages (such as GodotCPP or C#), but they are reserved to provide optimized versions of standard API functions and used only where it makes sense.

Using Span

I explained this a couple of times, but I guess too many messages here to read for everyone. Godot uses its own native collections (refcounted arrays) in the majority of the API. As such it is not possible for Godot to provide an API based on Span to C# under any circumstance when on the Godot API side you have an array. Please understand this, It's not technically doable in any way, stop asking for it.

It is only possible to use Span in maybe a dozen of high performance APIs exposed by the binder, but as a general rule this will not be the case.

Using Span in Godot arrays

This is the right solution in this case. As I mentioned before, Godot will give you a PackedArray, or expect one from you in most cases. The idea is that you can access its content for reading or writing efficiently via dedicated Span interfaces to it.

Again understand that Godot is not a C# engine, and on the C++ side it uses all relatively safe code for security and stability (very little pointer usage, instead all arrays that do bounds checking in debug builds), only relying on pointers to arrays for very very critical APIs.

I apologize for the harsh tone, but my intention here is that you don't waste time discussing paths that won't lead anywhere, while concentrating in what is possible to action on.

Frontrider commented 9 months ago

I think I'd also add that C# in gaming most likely only has momentum/weight because Unity chose it then became big. (especially as CedNaru pulled out one of the alternatives that can replace the feature we need effectively 1:1)

edit: C# is not a bad language, but it is not here because it is the ultimate solution.

PavelCibulka commented 9 months ago

I think I'd also add that C# in gaming most likely only has momentum/weight because Unity chose it then became big. (especially as CedNaru pulled out one of the alternatives that can replace the feature we need effectively 1:1)

edit: C# is not a bad language, but it is not here because it is the ultimate solution.

C# changed a lot starting by release of .Net 6 in 2021. I would choose C++ in past but C# has better performance now. How is it possible? There has been masive improvements in .Net 7 and .Net 8 to JIT - everything what AOT does - but fine tuned to machine CPU, converting code to vector instructions etc.

But biggest bennefit comes to Modding. You load configuration at the start of the game into "static readonly" variables. JIT will modify and optimize code according to this settings in MOD. For JIT "static readonly" variable is equivalent of CONST in AOT language. You would need to release source code and let modders recompile the mod in C++.

BTW I'm not C# dev from Unity. I'm JAVA dev with experiance in C++ and Assembler (and lately C#).

Frontrider commented 9 months ago

C# changed a lot starting by release of .Net 6 in 2021. I would choose C++ in past but C# has better performance now. How is it possible? There has been masive improvements in .Net 7 and .Net 8 to JIT - everything what AOT does - but fine tuned to machine CPU, converting code to vector instructions etc.

But biggest bennefit comes to Modding. You load configuration at the start of the game into "static readonly" variables. JIT will modify and optimize code according to this settings in MOD. For JIT "static readonly" variable is equivalent of CONST in AOT language. You would need to release source code and let modders recompile the mod in C++.

BTW I'm not C# dev from Unity. I'm JAVA dev with experiance in C++ and Assembler (and lately C#).

And my statement is still pretty much correct, as none of those features are C# exclusive, and are also very true for Java if you mention that. ;) (look at minecraft modding) Additional reason to not to have Godot's api cater to languages that are bound to it.

I'll repeat that what I was saying is that c# is not here because it's the final ultimate tool that nothing can beat, it is here because Unity.

PavelCibulka commented 9 months ago

That is not true. Java JIT is very basic. I don't think there is another language with as powerfull JIT as .NET. (But I can compare only Java and C#)