godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
89.11k stars 20.2k forks source link

Thread::wait_to_finish throws error when called on a joined/returned thread #71048

Open Jimmio92 opened 1 year ago

Jimmio92 commented 1 year ago

Godot version

4.0 master

System information

Win 10, AMD Radeon RX 590

Issue description

I have a thread to handle UPNP stuff which blocks when called.

When the thread is done, it signals. The main thread awaits this signal, and then calls wait_to_finish -- this throws an error about not being able to join a thread from itself. This means it's being cleaned up already like expected -- yet there is also a warning saying we must call wait_to_finish or it won't be properly cleaned up.

I have solved this problem with my personal fork of Godot by changing Thread and everything in the engine that uses it to use normal naming conventions (wait_to_finish becomes join, like everything else is named), and providing detach. I also moved the "cleanup" code, of which there was none (why was that warning even there?), to the destructor, making the class no longer require the warning about calling wait_to_finish, and also bringing it more in line with what most people expect out of a thread.

My only concern is.... I don't know what I'm doing when it comes to GitHub -- do I make a pull request? The fact that I changed the name of a method that's in.... some 45 files if my text editor didn't lie... means I doubt this will make it in to 4.0, if ever. But there's definitely a problem here, even if my code isn't the solution.

Steps to reproduce

Create a thread; don't call wait_to_finish on it. Warning appears that seems needless.

Create a thread and await a signal saying it's done. After the await, call wait_to_finish. Note it throws an error.

Minimal reproduction project

test_case_thread.zip Open the example.tscn scene and run it with F6. Note the warning about not having called wait_to_finish and also the error about trying to call join on itself.

NOTE: I tried to keep it exactly accurate to what went wrong so it still does do a UPNP discover -- this is not part of the problem, merely there to be the same timed external event.

Chaosus commented 1 year ago

Not sure whether it's a bug, or intended behaviour for threads. Maybe some of @godotengine/core devs may say better.

Jimmio92 commented 1 year ago

Pretty sure it's a bug as there's no way to use threads without a warning/error.

Also, with all the talk about Godot becoming AAA compatible (see https://godotengine.org/article/whats-missing-in-godot-for-aaa/ ).... the change to C++ standard threads is backwards from what they're wanting. Huge performance losses on Windows -- each mutex lock/unlock has to jump ring levels to the kernel and back. Critical Sections can be used as a drop in replacement and they often do not have to jump to the kernel.

Design by committee is always challenging, but we should never backpedal, imo.

vnen commented 1 year ago

The thing is that when you do await discover_done the function will be resumed by the thread that emitted the signal. So the wait_to_finish() call will happen on the thread, which triggers the can't wait to finish on itself error (which means it never really waited). Such error will end the _ready() call/resume, which then will clean up the stack, freeing the Thread object that was created locally, thus triggering the warning (thread destroyed without call to wait_to_finish()). It seems to me everything is working as intended.

You can solve this by emitting the signal in a deferred call, since that will be queued and then called from the main thread. The issue is that there is no emit_deferred() (which I think would an welcome addition), so you need to do something like emit_signal.call_deferred(discover_done.get_name(), true) (the get_name() here just to avoid strings and typos).

For me the only "solution" to this would be adding an emit_deffered() method to Signal to be easier to use on this case.

Jimmio92 commented 1 year ago

Thank you for providing a way to work around my issue.. but I still feel this can be more elegantly solved by just allowing detach to be called on the internal thread object and changing the logic on thread destruct internally as I've done in my personal fork of Godot.. but I'm totally unfamiliar with GitHub and the whole process. (Note I also renamed "wait_to_finish" to "join" to better match... everything else... but I did so without talking to anyone, so I closed the pull request -- I don't know if I even did that correctly or not)

I've been kind of upset about the whole issue -- I love Godot, but having dug into the source here and finding they threw out OS-specific, perfectly good, and more performant code in favor of black-box magic voodoo provided by the C++ standard library when Juan just started mentioning things needing fixed for AAA usage... There's some mixed signals here. Is the engine getting more performant or easier to write? You cannot have both.

RandomShaper commented 1 year ago

While it's sad that std::mutex is not backed by Windows' critical section objects, if you can prove actual performance issues as a result of that, please open a proposal so we re-evaluate the matter. In scripting you are not ayway expected to squeeze the processor at such a low level. Godot also has SpinLock internally, used where it matters.

Regarding the mechanics of threads (self-destruction), naming of methods, etc., you can as well open as proposal. Probably you're not alone in believing wait_to_finish() is a needlessly non-standard name.

Now, I believe @vnen's reply addresses this sufficiently as not a bug.

Jimmio92 commented 1 year ago

I'll write up a separate proposal for the Windows threading stuff; I shouldn't have mentioned it at all here just because it was related to threads... it seems to have distracted from the actual issue.

I personally feel the current Thread implementation can exhibit undefined behavior as it detaches a thread during the deconstruct. This seems fine for an unjoined thread, but a thread deconstructing from a join is already gone -- a detach at this point is likely in error. It also states you must call wait_to_finish, which is not true -- it detaches when deconstructing. These two things IMO are bugs and the reason for this report, a proposal for my changes, and (an attempt at a) pull request to at least partially fix it.

Though, I am now seeing that detached threads are technically supported as it currently stands -- if the destructor is called it detaches -- but I was under the impression it's best to call detach early, not right before/during destruct, though this may be fine and defined behavior (I'm not sure) and thus only half as big of a bug as I thought originally. Still, don't detach when not joinable even if detach doesn't get added to the script side (an if check in the destructor would solve it), though I really feel it should be up to the end-user/dev to call detach/join depending on their needs, not for detach to automagically get called behind the scenes.

Again, sorry for bringing the OS specific stuff up in this; that's not what this bug report is about -- warnings, errors, and lack of detach is all this is about.

RandomShaper commented 1 year ago

Detaching happens to try to avoid a crash as much as possible. If the thread has not been joined, the C++ runtime is free to crash the program. That's why the user is warned they should join (wait_to_finish()). Detaching is not done for supporting fire-and-forget thread dynamics in Godot (which, on the other hand, would require much more than that).