godotengine / webrtc-native

The official GDNative WebRTC implementation for non-html exports.
MIT License

Crash when using alongside other GDExtensions #127

Closed Sch1nken closed 6 months ago

Sch1nken commented 6 months ago

Godot version

4.2.stable.arch_linux

Plugin version

1.0.3

System information

Manjaro Laptop, AMD 4800H APU

Issue description

Heya, this is a shot in the dark, since the underlying issue is not fully understood yet (and there are other candidates for this issue; I'm currently investigating and just wanted to see if anyone else has a similar issue).

Since one or two days ago, I've had an issue where my two game instances (here: server and client) can't establish a connection over WebRTC; instead, they silently crash (the windows close, no error message). The only thing I get in the terminal is:

terminate called after throwing an instance of 'std::bad_cast'
  what():  std::bad_cast

which could be anything (or rather, anywhere), since there is no stack trace.

Everything worked fine only ~48 hours ago. Coincidentally, I did a system upgrade around the time it stopped working (I'm pretty sure some package has issues). The current Manjaro update DOES have some issues (a lot of people are reporting problems in the forums), but for me it's been smooth sailing apart from this one Godot WebRTC problem.

Quick overview:

I strongly suspect an issue with a system package or something similar, BUT every other game or app works fine. I've tried a bunch; no issues there. This is why I'm reaching out to check whether anyone else has similar issues.

I'm using my own project (with two instances), but the provided minimal example shows the same behaviour. I already tried redownloading the libs and replacing them in the project.

Other devices on my LAN work without issues (those run Windows, though), so I don't think it's some weird NAT issue or similar.

Steps to reproduce

Minimal reproduction project

https://github.com/godotengine/godot-demo-projects/tree/master/networking/webrtc_signaling

Either with an external server or a local one ("in-game"). Doesn't really matter.

Sch1nken commented 6 months ago

I managed to get a backtrace using gdb.

terminate called after throwing an instance of 'std::bad_cast'
  what():  std::bad_cast

Thread 1 "game.x86_64" received signal SIGABRT, Aborted.
0x00007ffff7d2e83c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7d2e83c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff7cde668 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff7cc64b8 in abort () from /usr/lib/libc.so.6
#3  0x00007fffece2dba9 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#4  0x00007fffed32f4d6 in __cxxabiv1::__terminate(void (*)()) () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#5  0x00007fffed32f541 in std::terminate() () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#6  0x00007fffed32f694 in __cxa_throw () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#7  0x00007fffece2d9ca in __cxa_bad_cast () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#8  0x00007fffed3591cc in std::__cxx11::collate<char> const& std::use_facet<std::__cxx11::collate<char> >(std::locale const&) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#9  0x00007fffece4ea92 in std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>::_M_ready() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#10 0x00007fffece58d65 in void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#11 0x00007fffece5c804 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#12 0x00007fffece5d068 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#13 0x00007fffece5d299 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#14 0x00007fffece5ccf3 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#15 0x00007fffece5d068 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#16 0x00007fffece5d299 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#17 0x00007fffece5ccf3 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#18 0x00007fffece5d068 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#19 0x00007fffece5cfe1 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#20 0x00007fffece5d299 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#21 0x00007fffece5d9ca in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#22 0x00007fffece5deaf in std::enable_if<std::__detail::__is_contiguous_iter<char const*>::value, std::shared_ptr<std::__detail::_NFA<std::__cxx11::regex_traits<char> > const> >::type std::__detail::__compile_nfa<std::__cxx11::regex_traits<char>, char const*>(char const*, char const*, std::__cxx11::regex_traits<char>::locale_type const&, std::regex_constants::syntax_option_type) () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#23 0x00007fffece49ef0 in (anonymous namespace)::parse_url(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) [clone .constprop.0] () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#24 0x00007fffece4a5f5 in rtc::IceServer::IceServer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#25 0x00007fffece37217 in godot_webrtc::WebRTCLibPeerConnection::_parse_ice_server(rtc::Configuration&, godot::Dictionary) () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#26 0x00007fffece37af8 in godot_webrtc::WebRTCLibPeerConnection::_initialize(godot::Dictionary const&) () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#27 0x00007fffece3bba0 in godot::WebRTCPeerConnectionExtension::register_virtuals<godot_webrtc::WebRTCLibPeerConnection, godot::WebRTCPeerConnectionExtension>()::{lambda(void*, void const* const*, void*)#4}::_FUN(void*, void const* const*, void*) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#28 0x0000000000dd8394 in ?? ()
#29 0x0000000000dda600 in ?? ()
#30 0x000000000067744d in ?? ()
#31 0x00000000005699b1 in ?? ()
#32 0x0000000002b4e04c in ?? ()
#33 0x00000000029dc19f in ?? ()
#34 0x000000000067c28a in ?? ()
#35 0x00000000005699b1 in ?? ()
#36 0x0000000002b4e04c in ?? ()
#37 0x0000000002946d21 in ?? ()
#38 0x0000000002b85c91 in ?? ()
#39 0x000000000067c28a in ?? ()
#40 0x00000000005699b1 in ?? ()
#41 0x0000000002b4e04c in ?? ()
#42 0x00000000029dc19f in ?? ()
#43 0x000000000066c9a3 in ?? ()
#44 0x00000000005699b1 in ?? ()
#45 0x000000000106c37f in ?? ()
#46 0x0000000002b3bf50 in ?? ()
#47 0x0000000001081f54 in ?? ()
#48 0x000000000109ee8b in ?? ()
#49 0x000000000109f72f in ?? ()
#50 0x00000000004bb8a8 in ?? ()
#51 0x000000000041a713 in ?? ()
#52 0x00007ffff7cc7cd0 in ?? () from /usr/lib/libc.so.6
#53 0x00007ffff7cc7d8a in __libc_start_main () from /usr/lib/libc.so.6
#54 0x000000000042190a in ?? ()

Here is one I compiled myself

Thread 1 "game.x86_64" received signal SIGSEGV, Segmentation fault.
0x00007fffed356a74 in std::codecvt<char16_t, char, __mbstate_t>::do_unshift(__mbstate_t&, char*, char*, char*&) const ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
(gdb) bt
#0  0x00007fffed356a74 in std::codecvt<char16_t, char, __mbstate_t>::do_unshift(__mbstate_t&, char*, char*, char*&) const ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#1  0x00007fffed3c5d1b in std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<long>(long) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#2  0x00007fffece601fe in plog::TxtFormatterImpl<false>::format[abi:cxx11](plog::Record const&) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#3  0x00007fffece6129f in plog::ColorConsoleAppender<plog::TxtFormatter>::write(plog::Record const&) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#4  0x00007fffece888c5 in rtc::impl::Init::doInit() () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#5  0x00007fffece89a94 in rtc::impl::Init::token() () from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#6  0x00007fffece997a9 in rtc::impl::PeerConnection::PeerConnection(rtc::Configuration) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#7  0x00007fffece6362e in rtc::PeerConnection::PeerConnection(rtc::Configuration) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#8  0x00007fffece20fa1 in void std::_Construct<rtc::PeerConnection, rtc::Configuration&>(rtc::PeerConnection*, rtc::Configuration&) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#9  0x00007fffece1e09d in godot_webrtc::WebRTCLibPeerConnection::_create_pc(rtc::Configuration&) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#10 0x00007fffece1e3d5 in godot_webrtc::WebRTCLibPeerConnection::_initialize(godot::Dictionary const&) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#11 0x00007fffece1b4e7 in godot_webrtc::WebRTCLibPeerConnection::_init() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#12 0x00007fffece1bfcd in godot_webrtc::WebRTCLibPeerConnection::WebRTCLibPeerConnection() ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#13 0x00007fffece21f7d in godot_webrtc::WebRTCLibPeerConnection::create(void*) ()
   from /home/schinken/Dev/PropHuntLinux/libwebrtc_native.linux.template_debug.x86_64.so
#14 0x0000000000df9a10 in ?? ()
#15 0x000000000053290a in ?? ()
#16 0x00000000005744c1 in ?? ()
#17 0x000000000067744d in ?? ()
#18 0x00000000005699b1 in ?? ()
#19 0x0000000002b4e04c in ?? ()
#20 0x00000000029dc19f in ?? ()
#21 0x000000000067c28a in ?? ()
#22 0x00000000005699b1 in ?? ()
#23 0x0000000002b4e04c in ?? ()
#24 0x0000000002946d21 in ?? ()
#25 0x0000000002b85c91 in ?? ()
#26 0x000000000067c28a in ?? ()
#27 0x00000000005699b1 in ?? ()
#28 0x0000000002b4e04c in ?? ()
#29 0x00000000029dc19f in ?? ()
#30 0x000000000066c9a3 in ?? ()
#31 0x00000000005699b1 in ?? ()
#32 0x000000000106c37f in ?? ()
#33 0x0000000002b3bf50 in ?? ()
#34 0x0000000001081f54 in ?? ()
#35 0x000000000109ee8b in ?? ()
#36 0x000000000109f72f in ?? ()
#37 0x00000000004bb8a8 in ?? ()
#38 0x000000000041a713 in ?? ()
#39 0x00007ffff7cc7cd0 in ?? () from /usr/lib/libc.so.6
#40 0x00007ffff7cc7d8a in __libc_start_main () from /usr/lib/libc.so.6
#41 0x000000000042190a in ?? ()

Edit 2: Maybe another data point: I compiled libdatachannel myself and used the included server and client examples. These seem to work without issues...

I'll try again to compile webrtc-native and see if that changes anything. Maybe it's really a glibc related issue.

Na-r commented 6 months ago

I've also been running into this issue, on Arch. Compiling it myself hasn't fixed it, I still get a bad cast error whenever initializing a WebRTCPeerConnection. Have you had any other luck?

Sch1nken commented 6 months ago

Not yet, but it's good to know that other people have the same issue.

I'm planning on booting a live USB (maybe Ubuntu, just to check another distro) to see what happens. I tried rolling back several packages (including the kernel itself), with no luck. However, I did not try rolling back glibc (didn't want to break stuff...), so maybe that's it.

I'll investigate further. For now, I'm happy that it's not some freak issue on my end but might be related to something Arch-specific.

Sch1nken commented 6 months ago

Just tested Manjaro KDE Plasma (23.0.4) from a live USB. Same symptoms, same crash.

Trying Ubuntu 22.04.3 now. Edit: Same crash.

Just for reference, but I think we can rule that one out now: Ubuntu ships glibc 2.35, Manjaro glibc 2.38.

So the next layer up seems to be plog or libdatachannel...

I just don't understand why it would suddenly not work anymore...

Sch1nken commented 6 months ago

Okay, I made some progress.

Using Godot 4.1.3 (.stable.official) does NOT crash on Linux, so it has something to do with Godot itself: either some GDExtension API change (which, according to https://github.com/godotengine/webrtc-native/issues/126#issuecomment-1825560699, should not be the case) or something inside Godot itself.

I'll try some beta builds and RCs to see when the problems started.

Edit: The story is a bit different now.

All official builds from godotengine.org work, from 4.1.3 up to 4.2.1 stable, including all dev and beta builds, without issues.

But from 4.2 onwards, the Arch build is labelled "4.1.3.stable.arch_linux" instead of "stable.official".

These .arch_linux builds, as well as my self-compiled version (on an Arch-based distro), crash...

@Na-r can you try the official builds? You can download them from godotengine.org. Something seems to be up with the repositories version.

Edit 2: I prematurely contacted the packager for the Arch version. I totally forgot that the Ubuntu version also had issues... and I downloaded that one (as well as the one I tested on the Manjaro live USB) from godotengine.org...

So short recap, these work:

These do NOT work for me:

Na-r commented 6 months ago

Dang, I've only used official builds so far and have been getting the crashes. I'm going to try testing it on a self-compiled version now. I'll leave the backtrace of my latest attempt here, but it seems to be identical to what you were having on the arch repo versions.

Engine version: Godot Engine v4.2.1.stable.official (b09f793f564a6c95dc76acc654b390e68441bd01)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] /usr/lib/libc.so.6(+0x3e710) [0x7f0cfdcdb710] (??:0)
[2] std::__codecvt_utf16_base<char16_t>::do_unshift(__mbstate_t&, char*, char*, char*&) const (??:0)
[3] std::ostream& std::ostream::_M_insert<long>(long) (??:0)
[4] plog::TxtFormatterImpl<false>::format[abi:cxx11](plog::Record const&) (??:0)
[5] plog::ColorConsoleAppender<plog::TxtFormatter>::write(plog::Record const&) (??:0)
[6] rtc::impl::Init::doInit() (??:0)
[7] rtc::impl::Init::token() (??:0)
[8] rtc::impl::PeerConnection::PeerConnection(rtc::Configuration) (??:0)
[9] rtc::PeerConnection::PeerConnection(rtc::Configuration) (??:0)
[10] void std::_Construct<rtc::PeerConnection, rtc::Configuration&>(rtc::PeerConnection*, rtc::Configuration&) (??:0)
[11] godot_webrtc::WebRTCLibPeerConnection::_create_pc(rtc::Configuration&) (??:0)
[12] godot_webrtc::WebRTCLibPeerConnection::_initialize(godot::Dictionary const&) (??:0)

Edit: Still getting the same crash at /usr/lib/libc.so.6(+0x3e710) with a self-compiled master build. I also tried compiling with LLVM and lld, but I'm still hitting the same error.

Edit 2: I also tried the same test with commit b09f793f564a6c95dc76acc654b390e68441bd01, the commit for v4.2.1 stable; same crash.

Sch1nken commented 6 months ago

So it's not that... just to check: what kernel are you on? What glibc version? (ldd --version for glibc, uname -r for the kernel.)

As stated above, my glibc is 2.38 and my kernel is 6.6.7-4 (though I tried others before; there was a patch yesterday(?) or a few days ago).
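
For anyone comparing setups, the two values asked for here can also be collected from Python. This is a small sketch, not part of the plugin or of Godot, and the function names are hypothetical: os.confstr("CS_GNU_LIBC_VERSION") reports the glibc version in use at runtime (which should match what ldd --version prints), and platform.release() returns the same string as uname -r on Linux. On systems without glibc the sketch simply reports None.

```python
import os
import platform


def glibc_version():
    """Return the runtime glibc version (e.g. "2.38"), or None if not on glibc."""
    try:
        # On glibc this returns a string such as "glibc 2.38".
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.confstr (Windows); ValueError/OSError: name unsupported.
        return None
    if not value or not value.startswith("glibc "):
        return None
    return value.split(" ", 1)[1]


def toolchain_report():
    """Collect the details compared in this thread: OS, kernel release, glibc."""
    return {
        "system": platform.system(),   # e.g. "Linux"
        "kernel": platform.release(),  # same string as `uname -r` on Linux
        "glibc": glibc_version(),      # should match `ldd --version`
    }


if __name__ == "__main__":
    for key, value in toolchain_report().items():
        print(f"{key}: {value}")
```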

Edit: Come to think of it, while trying 4.1 I had to downgrade the project (and then upgrade it again when testing the 4.2 versions). Maybe that changed something? My git history shows no real change in the project.godot file, however...

Edit 2: Resetting my project (git reset --hard) brings the crash back... so something changed while I was trying older versions...

Sch1nken commented 6 months ago

@Na-r, wild guess... but what other GDExtensions are you using? It seems that removing WWise (https://github.com/alessandrofama/wwise-godot-integration) fixes the crash for me.

This doesn't make any sense, since the demo project also crashed for me... I don't see any real pattern here.

Edit: Especially since it worked before... but removing the WWise addon gets the WebRTC connection going again...

What's especially weird is that pasting in the addon without enabling it still leads to a crash... Feels like a Godot GDExtension problem.

Edit: More insight thanks to a discussion on the German Godot Discord (thanks, whiteshampoo). In my particular case, the WWise plugin is disabled, but it automatically adds an autoload back to the project to handle WWise-related things.

This in itself is weird and unexpected, and beyond that I don't understand how it would cause WebRTC to crash...

Na-r commented 6 months ago

glibc: ldd (GNU libc) 2.38, kernel: 6.6.7-arch1-1

Whelp, that seems to be it. I'm using a VOIP GDExtension I wrote, and disabling it makes WebRTC stop failing. This definitely seems to be the issue. It looks like a core GDExtension bug, though, since from what I understand there really shouldn't be crosstalk between these extensions.

Sch1nken commented 6 months ago

Yes. I'm currently debugging an issue with the WWise plugin where it gets called even when disabled... I wonder whether this and the general issue here are related to the somewhat new hot-reload feature...

Edit: I believe I've figured out my issue (the WWise autoload being created).

But for the issue at hand (multiple GDExtensions having problems), I still want to create a test case. It seems that either some pointer or reference gets lost, and the game crashes as soon as it first accesses the methods provided by the GDExtension.

I wonder if it has to do with the GDExtension load order (i.e. whether webrtc is loaded before or after voip, or wwise respectively).

But I don't think it makes a difference. It's weird that WWise works without issues, though, so maybe only the webrtc extension has issues...

Sch1nken commented 6 months ago

Confirmed the issue even when using a minimal GDExtension (built following the official tutorial): the WebRTC connection crashes the game...

So it seems to be an issue with Linux + multiple GDExtensions.

Sch1nken commented 6 months ago

A new insight when only using the WebRTC-Native GDExtension:

it works when using my own compiled version of Godot (4.2.1-stable tag) or the official godotengine.org build.

It does NOT work when using the arch repository provided build...

I'm starting to wonder whether it's actually a Godot core GDExtension issue or something only affecting the webrtc-native GDExtension.

Faless commented 6 months ago

I have confirmed the crash, using the released WebRTC plugin and the test extension in godot-cpp.

One thing I noticed is that if I change the load order in .godot/extension_list.cfg from:

res://example.gdextension
res://webrtc/webrtc.gdextension

to:

res://webrtc/webrtc.gdextension
res://example.gdextension

i.e. I make the WebRTC extension load first.

Then it seems to work fine.
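
The reordering described here can be scripted. A minimal sketch, assuming the file contains one res:// path per line as in the excerpt above; the function names are hypothetical and this only automates the manual workaround. Because the file lives in the generated .godot folder, the order may not survive (as suspected later in this thread, it can change after a git push/pull).

```python
from pathlib import Path

WEBRTC_ENTRY = "res://webrtc/webrtc.gdextension"


def load_first(entries, first=WEBRTC_ENTRY):
    """Return the non-empty entries with `first` moved to the front.

    All other entries keep their relative order. If `first` is not listed,
    the cleaned list is returned unchanged.
    """
    cleaned = [line.strip() for line in entries if line.strip()]
    if first not in cleaned:
        return cleaned
    return [first] + [entry for entry in cleaned if entry != first]


def reorder_extension_list(project_dir, first=WEBRTC_ENTRY):
    """Rewrite <project>/.godot/extension_list.cfg so `first` is loaded first."""
    cfg = Path(project_dir) / ".godot" / "extension_list.cfg"
    entries = load_first(cfg.read_text().splitlines(), first)
    cfg.write_text("\n".join(entries) + "\n")
    return entries


if __name__ == "__main__":
    # Run from the project root (the folder containing project.godot).
    print(reorder_extension_list("."))
```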

I've noticed the same thing happens in Godot 4.1.1.

Faless commented 6 months ago

I've traced it to this:

diff --git a/SConstruct b/SConstruct
index 6e9e094..80febb0 100644
--- a/SConstruct
+++ b/SConstruct
@@ -126,7 +126,7 @@ if env["platform"] == "linux":
         LINKFLAGS=[
             "-Wl,--no-undefined",
             "-static-libgcc",
-            "-static-libstdc++",
+            #"-static-libstdc++",
         ]
     )
     # And add some linux dependencies.

i.e. the -static-libstdc++ link flag used in the WebRTC plugin (only on Linux).

Sch1nken commented 6 months ago

Interesting. The order might be the reason why it first worked for me (WebRTC + WWise), but then (after some git push/pull using a Windows client) the order might have changed, resulting in the issue.

Faless commented 6 months ago

Attaching an MRP that shows the crash (includes the webrtc library and the godot-cpp library).

crash.zip

Sch1nken commented 6 months ago

Do you think it is a Godot core issue or a WebRTC GDExtension issue?

From what I can see this is where the GDExtension gets loaded: https://github.com/godotengine/godot/blob/9d1cbab1c432b6f1d66ec939445bec68b6af519e/core/extension/gdextension.cpp#L717

And here is the corresponding implementation: https://github.com/godotengine/godot/blob/9d1cbab1c432b6f1d66ec939445bec68b6af519e/drivers/unix/os_unix.cpp#L640

On the top of this file are some definitions on how to load these libraries: https://github.com/godotengine/godot/blob/9d1cbab1c432b6f1d66ec939445bec68b6af519e/drivers/unix/os_unix.cpp#L89

And according to this post on Stack Overflow, depending on the settings used, some functions could override others? https://stackoverflow.com/a/70967204

I'll try to compile godot with some other settings later today, to see if that changes anything.

mihe commented 6 months ago

I mentioned this to @Faless on Rocket Chat already, but this is likely the same issue as godot-jolt/godot-jolt#373.

Basically, compiling with -static-libstdc++ doesn't change the fact that the standard library symbols are still explicitly exported, so your shared library now exports standard library symbols. For some weird reason these end up getting used by other shared libraries, presumably because Godot itself also compiles with -static-libstdc++ and thus doesn't load the shared library version of the standard library.
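
One way to check whether a given .so is in this situation is to list its dynamic symbol table with GNU nm (nm -D --defined-only) and look for mangled names in namespace std plus C++ runtime entry points such as __cxa_throw, which appears in the backtrace above. The sketch below is a rough heuristic based on Itanium C++ ABI mangling prefixes, not a full demangler; the function name and the exact prefix list are assumptions.

```python
import re
import subprocess
import sys

# Itanium C++ ABI manglings that start inside namespace std:
#   _ZSt...               std::function
#   _ZNSt... / _ZNKSt...  members of std:: classes (K = const, V = volatile, r = restrict)
#   _ZTISt/_ZTSSt/_ZTVSt  typeinfo, typeinfo name and vtable of std:: classes
STD_MANGLED = re.compile(r"^_Z(?:N[KVr]*|T[ISV])?St")
# C++ runtime entry points such as __cxa_throw or __gxx_personality_v0.
CXX_ABI_PREFIXES = ("__cxa_", "__gxx_")


def exported_cxx_runtime_symbols(nm_output):
    """Return defined dynamic symbols that look like libstdc++ / C++ ABI symbols."""
    found = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # undefined symbols (U, w) are printed without an address
        _address, _kind, name = fields
        name = name.split("@", 1)[0]  # strip a version suffix such as @@GLIBCXX_3.4
        if STD_MANGLED.match(name) or name.startswith(CXX_ABI_PREFIXES):
            found.append(name)
    return found


if __name__ == "__main__":
    # Usage: python check_exports.py libwebrtc_native.linux.template_debug.x86_64.so
    result = subprocess.run(
        ["nm", "-D", "--defined-only", sys.argv[1]],
        capture_output=True, text=True, check=True,
    )
    symbols = exported_cxx_runtime_symbols(result.stdout)
    print(f"{len(symbols)} C++ standard library / ABI symbols exported")
    for name in symbols[:20]:
        print("  ", name)
```

A library that only exports its GDExtension entry point should report zero here.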

The way I resolved this was by providing ld with a version script through the -Wl,--version-script flag, which lets you explicitly specify which symbols are to be exported. In the case of this extension, it would look something like this:

{
    global:
        webrtc_extension_init;
    local:
        *;
};
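
For reference, a script like this can be generated from the build rather than maintained by hand. This is a hypothetical Python helper (SConstruct files are Python); webrtc_extension_init is taken from the example above and has to match the entry_symbol declared in the extension's .gdextension file, and the symbols.map file name is an assumption, not necessarily what the plugin ended up using.

```python
from pathlib import Path


def version_script(exported_symbols):
    """Return a GNU ld version script exporting only `exported_symbols`.

    Everything else, including statically linked libstdc++ symbols,
    is made local by the `local: *;` wildcard.
    """
    if not exported_symbols:
        raise ValueError("at least one symbol (the entry point) must stay global")
    lines = ["{", "    global:"]
    lines += [f"        {name};" for name in exported_symbols]
    lines += ["    local:", "        *;", "};", ""]
    return "\n".join(lines)


def write_version_script(path, exported_symbols):
    """Write the script and return the matching linker flag."""
    Path(path).write_text(version_script(exported_symbols))
    # -Wl, forwards the option from the compiler driver (gcc/g++) to ld.
    return f"-Wl,--version-script={path}"


if __name__ == "__main__":
    flag = write_version_script("symbols.map", ["webrtc_extension_init"])
    print(flag)  # -Wl,--version-script=symbols.map
```

In an SConstruct, the returned flag would go into the same LINKFLAGS list as -static-libstdc++.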

Sch1nken commented 6 months ago

So if I understand that correctly, the RTLD_DEEPBIND dlopen flag could solve the problem?

RTLD_DEEPBIND (since glibc 2.3.4) Place the lookup scope of the symbols in this library ahead of the global scope. This means that a self-contained library will use its own symbols in preference to global symbols with the same name contained in libraries that have already been loaded. This flag is not specified in POSIX.1-2001.

Edit: Or is this the reverse situation?

mihe commented 6 months ago

So if I understand that correctly, the RTLD_DEEPBIND dlopen flag could solve the problem?

Without having actually tried it, I get the impression that this wouldn't fix the problem, since it's not really an ordering problem and more of a problem with the symbols being available to other libraries in the first place.

With that said, when looking at the other dlopen flags, these two look very interesting:

RTLD_GLOBAL The symbols defined by this library will be made available for symbol resolution of subsequently loaded libraries.

RTLD_LOCAL This is the converse of RTLD_GLOBAL, and the default if neither flag is specified. Symbols defined in this library are not made available to resolve references in subsequently loaded libraries.

It sounds like adding RTLD_LOCAL to GODOT_DLOPEN_MODE might solve this as well. This of course assumes that no library loaded by Godot ever needs to resolve references in other libraries, which might be a dangerous assumption.

EDIT: Maybe RTLD_LOCAL is already implicitly set actually? I'm struggling to interpret whether the default applies even when passing in other flags.

Sch1nken commented 6 months ago

My reading (though that is just a feeling) is that "neither" refers only to RTLD_LOCAL and RTLD_GLOBAL. So I think RTLD_LOCAL is already in effect, even if other, unrelated flags are set.

Again, later today I'll try to build godot with different flags to see what happens.

Sch1nken commented 6 months ago

Okay, I tried several variations of RTLD_DEEPBIND, RTLD_GLOBAL, RTLD_LOCAL etc.

The default (on my custom Godot build) seems to be 0xA (RTLD_LOCAL | RTLD_DEEPBIND | RTLD_NOW), which seems reasonable and is what I would expect the correct settings to be.
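
To double-check what a numeric mode like 0xA means, its bits can be decoded against glibc's dlopen() flag values, hard-coded below (they are Linux/glibc-specific; other platforms use different numbers, and on Linux Python also exposes them as os.RTLD_NOW, os.RTLD_DEEPBIND, etc.). The helper name is hypothetical. Since RTLD_LOCAL is defined as 0 in glibc, any mode without RTLD_GLOBAL is local, so 0xA is RTLD_NOW | RTLD_DEEPBIND with RTLD_LOCAL implied.

```python
# dlopen() mode bits as defined by glibc on Linux (<bits/dlfcn.h>).
GLIBC_DLOPEN_FLAGS = {
    "RTLD_LAZY": 0x00001,
    "RTLD_NOW": 0x00002,
    "RTLD_NOLOAD": 0x00004,
    "RTLD_DEEPBIND": 0x00008,
    "RTLD_GLOBAL": 0x00100,
    "RTLD_NODELETE": 0x01000,
}
RTLD_LOCAL = 0  # no bit of its own: "local" simply means RTLD_GLOBAL is absent


def decode_dlopen_mode(mode):
    """Return the glibc flag names set in `mode`, noting implicit RTLD_LOCAL."""
    names = [name for name, bit in GLIBC_DLOPEN_FLAGS.items() if mode & bit]
    if not mode & GLIBC_DLOPEN_FLAGS["RTLD_GLOBAL"]:
        names.append("RTLD_LOCAL")  # implied, since RTLD_LOCAL == 0
    known_bits = 0
    for bit in GLIBC_DLOPEN_FLAGS.values():
        known_bits |= bit
    if mode & ~known_bits:
        names.append(f"unknown bits {mode & ~known_bits:#x}")
    return names


if __name__ == "__main__":
    print(" | ".join(decode_dlopen_mode(0xA)))  # RTLD_NOW | RTLD_DEEPBIND | RTLD_LOCAL
```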

Apart from only exporting the GDExtension's entry_symbol, I currently don't see a way (from the Godot side of things) to prevent this issue.

I also thought about loading only the entry_symbol and ignoring all other symbols from the library, but there doesn't seem to be a way to do that: it's either all exported symbols or nothing. Correct me if I'm wrong; I'm not super experienced with this stuff.

Sch1nken commented 6 months ago

Just dumping some more interesting findings:

A lengthy explanation of what happens when linking libraries with the same symbols: https://stackoverflow.com/questions/22004131/is-there-symbol-conflict-when-loading-two-shared-libraries-with-a-same-symbol

A possible solution is also provided: https://stackoverflow.com/a/70673863 (linking to https://gcc.gnu.org/wiki/Visibility)

But this is of course cumbersome: this is why -fvisibility was added. With -fvisibility=hidden, you are telling GCC that every declaration not explicitly marked with a visibility attribute has a hidden visibility. And like in the example above, even for classes marked as visible (exported from the DSO), you may still want to mark e.g. private members as hidden, so that optimal code will be produced when calling them (from within the DSO).

Of course assuming this is the root problem here and I am not too deep into trying to fix stuff in the wrong place :)

h0lley commented 6 months ago

Running into this issue as well with WebRTC + GodotSteam (see the mention just above this comment).

Somewhat interestingly, I am only getting the crash when initializing the WebRTCPeerConnection with a stun server.

Changing the load order as per Faless' comment also prevented the crash. It's a good workaround, although in the long run users shouldn't need to touch .godot manually.

mihe commented 6 months ago

A possible solution is also provided: stackoverflow.com/a/70673863 (linking to gcc.gnu.org/wiki/Visibility)

@Sch1nken As noted in your quote there, -fvisibility=hidden will only affect symbols that aren't explicitly marked with a visibility attribute, which unfortunately the problematic standard library symbols are.

I believe @Faless has already verified that the version script method that I used for Godot Jolt fixes the issue for this extension as well, so that's at least one confirmed fix. It would be nice if there was some way of fixing this from the Godot side of things though.

Faless commented 6 months ago

@Sch1nken, indeed, I tried both -fvisibility=hidden and RTLD_LOCAL (RTLD_DEEPBIND is already set in 4.2); neither fixed the crash.

The only thing fixing it seems to be what @mihe suggests (i.e. using --version-script).

I've opened #131 but I'd like to test it a bit more.

Faless commented 6 months ago

Reopening until we release the updated version.

Faless commented 6 months ago

I've released v1.0.4, and updated the asset library entries (which will need some time to be approved).

Big thank you to everyone involved in this discussion, especially @Sch1nken for the amazing debugging effort, and @mihe for suggesting the final fix! :blue_heart: :partying_face:

Closing as fixed.

Faless commented 6 months ago

Note that after releasing v1.0.4 I realized that official windows builds are made with MinGW which uses GCC and needs the same extra flags to avoid crashes (I had only tested MSVC builds on my end before releasing v1.0.4 :crying_cat_face: ).

I've now released v1.0.5, and updated (again) the asset library entries (which will need some time to be approved).

mihe commented 6 months ago

official windows builds are made with MinGW which uses GCC and needs the same extra flags to avoid crashes

Have you actually verified that this crash occurs on Windows? I was under the impression that this problem had to do with Linux' dynamic linker rather than ld itself.

Faless commented 6 months ago

Have you actually verified that this crash occurs on Windows? I was under the impression that this problem had to do with Linux' dynamic linker rather than ld itself.

That's a very good point, I was lazy :sweat_smile: , and I only verified the crash from Linux using wine + the windows binaries (which I figure, may still be enough to motivate the change).

I will check on native Windows too.

Sch1nken commented 6 months ago

IIRC, native Windows was fine. I temporarily moved to a Windows machine to continue development when I first had the issue.