Concurrent Multithreading Functions

mistletoe commented 9 months ago

Is your feature request related to a problem?

Problem: GMS is basically single-threaded, in 2023. The problem's obvious: CPUs aren't getting faster like they used to, but end-users are going to want GMS to be competitive with fully-multithreaded game engines which take full advantage of multi-core CPUs. This is a major competitive threat to YYG's business model, and for us developers, it's a serious problem, if we're building games beyond a very limited scope. Here's a brief suggestion for a way to address this with the minimum amount of pain and suffering at YYG.

Describe the solution you'd like

My experience with multithreading is in AI development for realtime game development. I have in the past built a multithreaded AI in Java, using the Lombok library, responsible for running several hundred actors, and I am aware of most of the major pitfalls associated with multithreaded code.

Essentially, the problem for GMS is dealing with concurrent modification errors. If this system isn't built robustly, non-experts will not be able to use this feature. I'd like to propose a system that will allow non-experts to build and use threads without understanding the principles of lock states, sleep, etc.

Basically, users would write a new type of function: a threadedTask(). This would have some surface resemblance to Unity's Job System, but with some important alterations to make it more suitable for GMS's audience. https://docs.unity3d.com/Manual/JobSystem.html

threadedTask() would operate much like a GMS function() call, but would:

Start a new thread.
Be able to check the main thread's state in memory, but not modify it until it terminates.
Because of the modification requirements, a threadedTask() would not guarantee frame-time accuracy or perfect state accuracy in general When threadedTask() threads returned, anything they have done which would modify the main thread's state would be queued, and execute during BeginStep (or right before that, preferably). a threadedTask() that, for whatever reasons, exceeds one frame in timing would therefor execute on the second frame after being called.
To handle the modification requirements, the GMS engine would:

A. Put all threadedTasks() into a thread manager, with a poolsize defined by the end-user (or less, if fewer CPU cores are on the device). B. When a threadedTask() returns, it will automatically enqueue all attempts to change state. C. Enqueued results would then be checked by type and safely allowed to change state only in the main thread. Errors would be logged if attempted to modify in a way that doesn't work (for example, attempting to destroy an Instance who has already been destroyed on the previous frame) but wouldn't cause halting errors.

This would be largely be done by something like this pseudocode:

threadedTask( Instance instance_we_want_to_change ){ instance_we_want_to_change.variable_we_want_to_change = real; instance_destroy(instance_we_want_to_change); }

This would place two enqueued commands into the threadedTask system in GMS, running in a FIFO order.

First enqueued command would:

Check that the Instance still exists (since that's not guaranteed).
Alter the Instance variable if yes.

If the first check fails, the second isn't performed, the threadedTask returns false in the main thread, and nothing in the queue should be executed. All commands touching Instances should be handled with a existence check before any instance variables or other instance states are changed.

If it fails, the developer would see that it returned false, what it was attempting to do, and that it cancelled the queue for this threadedTask(), so (unlike a lot of multithreaded systems) debugging would be relatively straightforward, rather than a mystery. Or, if the issue was simply, "this Instance was Destroyed before the thread returned" then it's not a big deal anyhow.

So, no matter what the end-user puts into a threadedTask, they're simply not able to directly alter GMS's core state at all until it returns. The multithreaded code may query the state at the time it requests it in memory, but is not guaranteed that it hasn't changed before queue has executed.

Limitations.

threadedTask() cannot:

Guarantee accuracy.
Guarantee timing.
Alter any Instance or Global directly.
Create any new Instance variables (not that you should be doing that during normal Functions, either but... yeah)
Initialize or create any permanent data structures or Globals directly. For example, if you want to make a Struct for a threadedTask() to fill with data, you need to make the Struct before calling it, not internally.
Move, alter or otherwise edit the Physics World state directly.
Create Particles, Instances, etc. directly.

So, dear YYG people... you're probably looking at the above and going, "What a huge hassle, the above code would perform worse than not threading it, and it wouldn't guarantee to even operate properly, because state may have changed? Why on Earth do you want this?".

Simply put, for much more complex mathematical or data operations, where I can totally live with all of the caveats in exchange for greater speed on average.

For example, let's say I want to do frequent A* pathfinding on 1000 AI actors in a large game world. Sure, maybe 1 in 10000 calls results in a failed command in the queue because the state changed on the last frame, but it's fine, a handler can be written to deal with that. Most major engines handle AI like this already.

Or I want to do complex simulations on 100K objects and I don't want the main GMS thread to even know it's going on.

Or I want to run a complex simulation on sectors of a gameworld including Very Complex Physics (See: Noita).

Or maybe it's for convenience: running a thread for a user-based scripting language to modify how my game works, etc., etc.- stuff that I would really rather not have clogging up the CPU dedicated to GMS. Obviously, this not something every GMS user would need, but I strongly suspect that enough people are doing things like A* that it would have a clear and obvious benefit.

Describe alternatives you've considered

The only "alternative" is writing a DLL.

This just doesn't work, because GMS simply does not let outside DLL's access anything useful about the internals of GMS's state, period.

All we can do is send it numbers, and it'll send back a result. You can't tell it to "do line_collision on everything in the Room" or anything really useful, at all. Instances are treated as reals; there's no way to directly access state, utilize callbacks, etc., etc. It's unworkable, and making it workable would basically mean surrendering GML entirely and forcing us to write our games in C# or something. While that'd be fine for most of the serious users, it takes GMS too far away from its newbie-friendly base audience, imo.

Additional context

If you're bored and want to see the code I've written for handling this kind of thing before (again, in Java, not C++) I'll be happy to share it with you and discuss how it functions. I think that my description of how this should work is comprehensive enough, though. Basically, writing it like this, as a strict FIFO queue that only executes when it returns... it may hit a failure at any time, but that's OK, the main engine won't crash due to Concurrent Modification Exceptions.

I've encountered more than one coder who hasn't done this kind of thing resisting the idea of "code that cannot be guaranteed" in the past, and I think they largely just don't get it; in a realtime game, it's OK for certain operations to fail or occasionally produce "inaccurate results", especially in the context of AI-related tasks, where updates happen frequently and minor errors that only happen once in a few thousand frames and don't last very long are pretty unlikely to be noticed by end-users.

The benefits are obvious: even with the time lost to thread maintenance, enqueueing etc., for major operations that consume notable amounts of a frame or more, it'll be a net win. Imagine people being able to code up games with both more sophisticated AI, impressive simulation, even visual stuff.

FoxyOfJungle commented 9 months ago

Multi threading is super necessary. On mobile platforms the situation is worse, since most cell phones have processors with 2Ghz threads below... :/

mistletoe commented 9 months ago

Yeah, that's part of why this is a big deal. Same with console ports of anything intensive, etc. We're in a world where MT code is basically mandatory, and I'd really hate for Unity to eat YYG's audience when in some ways, GMS is so much better for 2D gamedev.

I know that this is a Major Feature Request (along w/ what I've asked for re: Box2D) but I honestly feel these are initiatives that would provide a great deal of value for GMS going forward. That's why I've tried my best to provide a decent pseudocode sketch of the implementation; I know that with multithreading in general, the problem is largely figuring out how best to handle lock states and deal with concurrent-modification issues immediately so that the system's pretty bulletproof. Lombok for Java handles lock states gracefully, but does not handle concurrent-modification errors, so in my AI I wrote, the way it basically works is:

AI has that will be used to make in the main thread.
When the multithreaded AI runs, it can query the game-world's state variables, but may not alter them at all. For example, it can get a list of "every enemy ship in the world" to, say, do a fast-square distance check against, but it may not add new enemy ships nor remove any, because that would change the main thread's state.
When the AI's code completes, it returns a simple array that's translated into commands in the main thread. Therefore nothing in the game-world's state will change via the multithreaded code during its execution, but only when it returns, avoiding concurrency problems.
Each AI is assigned to a thread pool, so it can run on all the CPU's processors or be limited to just a few. Experiments showed me that using more than 2-3 processors wasn't gaining any more speed than I was losing to thread maintenance, but circumstances would vary considerably for projects. Obviously, this is a detail of implementation I didn't touch here; if YYG's even reading this, they're going to need to decide how this works and whether end-users (gamedevs) can even touch that.
There are other ways to implement this idea. For example, all GML could run MT, with state changes being limited to after everything else runs. Every Instance could be thrown into the thread pool under the hood, with the game's state locked and frozen until they're all dispatched and return. Maintaining concurrency would be a nightmare, but hey, it'd work, in theory.

But would that actually be faster? No, most likely. Problems start rearing their heads immediately. Where does that leave, say, running GML during drawing events, which are currently decoupled from Step entirely, and are (if you're doing it right) generally very light operations?

As my example shows, just because something's multithreaded doesn't make it magically faster. To be faster, the operations involved need to be pretty heavy, and it'd be best to let developers judge that. So I came up with this idea of a threadedTask() to address this; it doesn't require that whatever's running multithreaded be run in any given place, Event, etc., but it at least forces the end-user to think about it and consciously use it.

thennothinghappened commented 7 months ago

When the AI's code completes, it returns a simple array that's translated into commands in the main thread. Therefore nothing in the game-world's state will change via the multithreaded code during its execution, but only when it returns, avoiding concurrency problems.

This reminds me a whole lot of JS's worker threads system where threads simply send messages to the main thread, which get queued up to be processed in the event loop when the main thread is available (and vice versa) which feels like a very safe approach to state in concurrency that I'd totally be on board with (and would easily fit into GM's event system, say an "On Thread Message" async event or something better named)

Also, throw on top of that some of the other stuff that JS does in this regard feels like a potentially very powerful system - e.g., OffscreenCanvas translating to the ability to do surfaces and such off the main thread in the same manor.

mistletoe commented 6 months ago

I'd love to work on Surfaces in another thread, for sure!

ParodyKnaveBob commented 6 months ago

My zero working experience with multithreading makes me a non-expert. $:^ ]

Besides the request for custom functions offloaded from a single CPU, one thing I've wondered for quite a while concerns at least some of GM's async functionality. Unless it already happens under the hood without documentation, would it be reasonable to move some functions such as sprite_add_ext to another thread when available (or at least give the user an option to flag it so)?

rwkay commented 6 months ago

My zero working experience with multithreading makes me a non-expert. $:^ ]

Besides the request for custom functions offloaded from a single CPU, one thing I've wondered for quite a while concerns at least some of GM's async functionality. Unless it already happens under the hood without documentation, would it be reasonable to move some functions such as sprite_add_ext to another thread when available (or at least give the user an option to flag it so)?

These functions are already async all the sprite_add functionality is multi threaded.

YoYoGames / GameMaker-Bugs