godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.24k stars 21.04k forks source link

Unix based crash handlers are not signal-safe. #82418

Open naelstrof opened 1 year ago

naelstrof commented 1 year ago

Godot version

4.1

System information

Arch linux, AMD Ryzen 7 5800X3D x86_64, Radeon RX 6950 XT

Issue description

Causing a crash on Linux doesn't give you a workable stack trace due to the handler not being signal-safe.

When Godot catches a fatal signal, the signal handler sometimes generates more fatal signals causing a variety of bad things to happen. In general the user will never receive a stack trace, and has the opportunity to lock up. (With me of course being unfortunate enough to have a captured, invisible mouse during the crash.)

Here's a stack trace of a regular unrelated crash, where the signal handler immediately generates another fatal signal.

///////// SECOND SIGSEGV GENERATED HERE, POINTER TO DISOWNED MEMORY ////////////
OS_Unix::execute(String const&, List<String, DefaultAllocator> const&, String*, int*, bool, MutexImpl<std::recursive_mutex>*, bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/drivers/unix/os_unix.cpp:480)
handle_crash(int) (/home/naelstrof/projects/godot-mono/src/godot-mono/platform/linuxbsd/crash_handler_linuxbsd.cpp:104)
___lldb_unnamed_symbol15378 (@___lldb_unnamed_symbol15378:35)
___lldb_unnamed_symbol15368 (@___lldb_unnamed_symbol15368:114)
___lldb_unnamed_symbol3300 (@___lldb_unnamed_symbol3300:4)
///////// FIRST SIGSEGV GENERATED HERE, INVALID DYNAMIC CAST //////////// 
__cxxabiv1::__dynamic_cast(const void *, const __cxxabiv1::__class_type_info *, const __cxxabiv1::__class_type_info *, ptrdiff_t) (@__dynamic_cast:15)
RefCounted* Object::cast_to<RefCounted>(Object*) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/object.h:774)
CSharpLanguage::_instance_binding_reference_callback(void*, void*, unsigned char) (/home/naelstrof/projects/godot-mono/src/godot-mono/modules/mono/csharp_script.cpp:1370)
Object::_instance_binding_reference(bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/object.h:670)
RefCounted::unreference() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.cpp:89)
Ref<InputEvent>::unref() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.h:209)
Ref<InputEvent>::~Ref() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.h:222)
HashSet<Ref<InputEvent>, HashMapHasherDefault, HashMapComparatorDefault<Ref<InputEvent>>>::clear() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/templates/hash_set.h:248)
HashSet<Ref<InputEvent>, HashMapHasherDefault, HashMapComparatorDefault<Ref<InputEvent>>>::~HashSet() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/templates/hash_set.h:465)
Input::~Input() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/input/input.cpp:1623)
void memdelete<Input>(Input*) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/os/memory.h:109)
Main::cleanup(bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/main/main.cpp:3739)
main (/home/naelstrof/projects/godot-mono/src/godot-mono/platform/linuxbsd/godot_linuxbsd.cpp:76)
___lldb_unnamed_symbol3190 (@___lldb_unnamed_symbol3190:29)
__libc_start_main (@__libc_start_main:44)

The POSIX standard on signals lists some pretty extreme restrictions on signal handlers for processes that are "multi-threaded". It's extraordinarily dense to read and understand in context with Godot though.

This stack overflow thread lists a table of stack trace generators, many of them not being signal-safe.

So I'm making an assumption that stack trace generation currently isn't safe-- and that it could be fixed by using an appropriate signal-safe stack trace generator.

I haven't investigated the code path that tries to give gdscript the ability to handle the crash yet. Given how simply accessing OS's execute command causes an unmanaged memory error makes me feel like the crash handler for linux needs to be re-accessed.

Steps to reproduce

Cause a crash in the Linux editor. The implementation of signals on your system may give you different results, though in general you shouldn't get a workable stack-trace.

A minimal project has been included below that crashes on boot.

Minimal reproduction project

CrashDummy.zip

naelstrof commented 1 year ago

There's a problem with my report, I realized my CrashDummy test project isn't demonstrating the issue like I thought it was-- As it doesn't crash during cleanup like my other project. It's actually working great!

It'll take me more time to figure out what's going on. I'll update the issue as I learn more.