Cykyrios opened this issue 1 year ago.
I haven't been able to reproduce this so far on Fedora 37 KDE (GeForce RTX 4090 with NVIDIA 525.89.02).
What graphics card model, driver version and desktop environment are you using?
Edit: As of November 2023, I've started to be able to reproduce this issue on the same setup as mentioned above (with Fedora 38 and then 39).
Oh right, I forgot about GPU-related info. I have an AMD 7900 XT, running on the open-source amdgpu drivers with Mesa 22.3.5 (amdgpu version is "kernel"). The desktop is Plasma 5.26.5.
Saw this happen (only once) on a totally different config: RTX 2080 Ti, Arch Linux, i3wm.
I have this too, again. It seems very reminiscent of #69352. It happens quite frequently here. The message varies a bit; the last one I got is:
[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
Godot: v4.0.1.stable.arch_linux
libx11: 1.8.4-1
Arch Linux 64-bit, kernel 6.2.8-zen1
On a laptop with Intel HD Graphics 620
Manjaro, kernel version 6.1.22-1. As for GPUs: AMD RX 6800 XT, but also Intel UHD Graphics 770 (CPU i7-12700). My screens are connected to the integrated graphics because I'm doing some GPU passthrough; this setup used to cause issues during the beta whenever I opened a new window or a submenu. It happened about four times, mostly at random while the editor was idle.
I managed to reproduce it with gdb attached; I've added the backtrace as the attachment gdb.txt.
Forgot to mention Godot version: custom build based on 4.0.2 stable
Seeing the same thing on Manjaro here. I have integrated Intel graphics (Intel i7-1165G7), GNOME on Wayland. The common factor seems to be Arch-based distros?
Full backtrace:
handle_crash: Program crashed with signal 11
Engine version: Godot Engine v4.0.2.stable.mono.official (7a0977ce2c558fe6219f0a14f8bd4d05aea8f019)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so(+0x4a90a4) [0x7f5d0658b0a4] (??:0)
[2] /usr/lib/libc.so.6(+0x38f50) [0x7f5d35225f50] (??:0)
[3] /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so(+0x49296b) [0x7f5d0657496b] (??:0)
[4] /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so(+0x4a8d18) [0x7f5d0658ad18] (??:0)
[5] /usr/lib/libc.so.6(+0x38f50) [0x7f5d35225f50] (??:0)
[6] /usr/lib/libc.so.6(+0x878ec) [0x7f5d352748ec] (??:0)
[7] /usr/lib/libc.so.6(gsignal+0x18) [0x7f5d35225ea8] (??:0)
[8] /usr/lib/libc.so.6(abort+0xd7) [0x7f5d3520f53d] (??:0)
[9] /usr/lib/libc.so.6(+0x2245c) [0x7f5d3520f45c] (??:0)
[10] /usr/lib/libc.so.6(+0x319f6) [0x7f5d3521e9f6] (??:0)
[11] /usr/lib/libX11.so.6(+0x3eb8f) [0x7f5d2d888b8f] (??:0)
[12] /usr/lib/libX11.so.6(+0x41995) [0x7f5d2d88b995] (??:0)
[13] /usr/lib/libX11.so.6(_XEventsQueued+0x62) [0x7f5d2d88e642] (??:0)
[14] /usr/lib/libX11.so.6(XFlush+0x1f) [0x7f5d2d86bc1f] (??:0)
[15] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0x4d62051] (??:0)
[16] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0xe792eb] (??:0)
[17] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0x4217f35] (??:0)
[18] /opt/godot-mono-bin/godot/Godot_v4.0.2-stable_mono_linux.x86_64() [0x4e38160] (??:0)
[19] /usr/lib/libc.so.6(+0x85bb5) [0x7f5d35272bb5] (??:0)
[20] /usr/lib/libc.so.6(+0x107d90) [0x7f5d352f4d90] (??:0)
-- END OF BACKTRACE --
Having the same issue, Manjaro with Hyprland / Wayland here. Also Godot 4.0.1 stable, libx11 v1.8.4-1, intel integrated graphics.
Can confirm on Manjaro with kernel 6.1.23, X11 (no Wayland), libx11 1.8.4-1 as well, Intel integrated graphics, and Godot 4.1 compiled from source (from a fork not far from master; judging from this report the issue is in Godot itself, but I could confirm if necessary). The backtrace is exactly the same as @Eraph's above.
Also, I noticed this is with .NET 7.0.3. If I download the official stable Godot Mono build from godotengine.org (rather than Manjaro's pacman package), it never crashes this way, and that build runs on .NET 6. Not sure if it matters.
Any other info I could provide to help debug this?
Same issue.
godot: 4.0.2.stable.official.7a0977ce2
render: Vulkan API 1.3.230 - Forward Mobile - Using Vulkan Device #0: Intel - Intel(R) HD Graphics 620 (KBL GT2)
os: Gentoo
kernel: 6.1.22
de: Xfce 4.18 / X11
libX11: 1.8.4-r1
It seems that the problem no longer occurs with libX11 1.8.5 (Arch Linux official extra repository).
Update: The problem appears again with libX11 1.8.7.
Just had this happen:
swaywm, Arch Linux, AMD 6700 XT, Godot v4.2.dev.custom_build [f8dbed4d0], libx11 1.8.6-1
[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot.linuxbsd.editor.x86_64: xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
I'm getting the same error with a recent libX11 on a non-Arch distribution:
Godot Engine v4.1.1.stable.custom_build OpenGL API 4.6 (Core Profile) Mesa 23.1.3 - Compatibility - Using Device: AMD - AMD Radeon RX 6600 (navi23, LLVM 15.0.7, DRM 3.52, 6.3.13_1)
Void Linux, XFCE4 / xfwm 4.18.0_1, libX11 1.8.6_1, libxcb 1.16_1
[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
Confirmed on Fedora running KDE with Mesa Intel® Xe Graphics. It would seem there's a regression in libX11 1.8.7, probably related to https://gitlab.freedesktop.org/xorg/lib/libx11/-/issues/170. Downgrading libX11 to 1.8.4 removes the issue.
Can also confirm that a downgrade to libX11 1.8.4 fixed the issue. I had assumed Godot was just somewhat unstable, but now even 4.2 beta 1 works like a charm :-)
I am having similar crashes involving xcb_in.c. They are unpredictable: sometimes they crash the project, sometimes the editor, and sometimes the error only shows up in the logs without a crash.
Ubuntu 23.04
Godot 4.2.dev4.official.549fcce5f
libx11 1.8.4-2ubuntu0.3
libxcb 1.15-1
The error messages are as follows:
[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: ../../src/xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
or
[xcb] Unknown sequence number while awaiting reply
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: ../../src/xcb_io.c:374: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.
or
godot: ../../src/xcb_in.c:757: xcb_request_check: Assertion `!reply' failed.
It also sometimes crashes without an error message.
It keeps happening here all the time, so often that it makes the editor basically unusable. It's really frustrating, to the point that I don't want to work on my project anymore.
[xcb] Unknown sequence number while awaiting reply
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
4.1.3.x86_64: xcb_io.c:374: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.
On Ubuntu 23.04, all of Godot 4.1.1, 4.1.2, 4.1.3 crash about three times per hour. If I downgrade xserver-xorg-core from 2:21.1.7-1ubuntu3.1 to 2:21.1.7-1ubuntu3, then the crashes happen only once every few days.
Edit: fixed the version numbers
Single window mode also helps a lot.
This is becoming more critical.
Fedora 39 does not have an option to downgrade libX11 from libX11-1.8.7-1. That means Godot in any form becomes unusable in Fedora 39 and other current distros.
Please report the issue to https://gitlab.freedesktop.org/xorg/lib/libx11, it should not regress and break existing applications.
Even if we could find a workaround in Godot to not trigger whatever makes libx11 fail, all existing Godot releases and published games would still be broken. So libx11 needs to stop breaking Godot every other patch release.
Reported: https://gitlab.freedesktop.org/xorg/lib/libx11/-/issues/199 I hope we can get a fix before Fedora 40.
Forgive my ignorance here, but given libx11's instability lately, would it make sense to vendor and statically link a properly patched version of libx11 in Godot? Basically, if upstream cannot fix this (and/or keep it from regressing), can Godot take matters into its own hands? It won't fix existing released games, but it might fix new ones, and it would give devs an easy way to re-release a game with the fix. Just my 2c, not sure if this makes sense.
Yep, now I have a reason not to feel bad about catching myself subconsciously spamming CTRL+S.
The milestone shows 4.3. Will the fix last, or do you think an update is going to break it again soon after? I might just downgrade libx11 if it gets really annoying, but I'm kinda too busy to troubleshoot stuff rn if doing so happens to break any of my other packages. ;-;
@Lamby777 Upgrading libx11 works, too. I compiled and installed the latest libx11 from master on Fedora 39 by doing:
https://gitlab.freedesktop.org/xorg/lib/libx11/-/tree/master
./autogen.sh
./configure --prefix=/usr
make
sudo make install
and then rebooted, and I haven't had a crash since.
Just sharing that I'm experiencing this regularly on my Manjaro KDE machine. System details:
Operating System: Manjaro Linux
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.113.0
Qt Version: 5.15.11
Kernel Version: 6.5.13-7-MANJARO (64-bit)
Graphics Platform: X11
Processors: 4 × AMD Athlon(tm) X4 880K Quad Core Processor
Memory: 15.6 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 1070/PCIe/SSE2
Manufacturer: Gigabyte Technology Co., Ltd.
I'm currently working with Godot 4.2-stable, running it from the command line so I get more detailed output.
Error from terminal:
[xcb] Unknown sequence number while processing queue
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
Godot_v4.2-stable_linux.x86_64: xcb_io.c:278: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
zsh: IOT instruction (core dumped) $GODOT4_BIN -e .
System info...
Operating System: Manjaro Linux
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.113.0
Qt Version: 5.15.11
Kernel Version: 6.1.69-1-MANJARO (64-bit)
Graphics Platform: X11
Processors: 12 × 12th Gen Intel® Core™ i5-12500T
Memory: 7.5 GiB of RAM
Graphics Processor: Mesa Intel® UHD Graphics 770
Manufacturer: Dell Inc.
Product Name: OptiPlex 3000
Error in terminal...
Godot Engine v4.2.1.stable.arch_linux - https://godotengine.org
[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot: xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
[1] 2223 IOT instruction (core dumped) godot --editor --verbose --single-window
That's weird, as the latest master commit of libx11 is the 1.8.7 release, which is what Fedora ships.
So there's a difference between the Fedora package for libX11-1.8.7-1.fc39 and the one you compiled locally. It could be a different set of installed dependencies, so that your local build adds or removes a feature, or this patch that Fedora is carrying: https://src.fedoraproject.org/rpms/libX11/blob/rawhide/f/dont-forward-keycode-0.patch, or any of the other custom tweaks in their .spec file, though I don't see much that sounds relevant: https://src.fedoraproject.org/rpms/libX11/blob/rawhide/f/libX11.spec
Either way, I suggest also opening a Fedora bug report, as the upstream libX11 report isn't getting any traction and we now have evidence that a self-compiled libx11 performs differently.
Could CFLAGS be involved, with some undefined behavior (UB) causing this? I.e., when you built libX11 manually from master, I'd assume you didn't use the same flags (possibly just a plain -O2). I don't really keep up with Fedora, but I believe they use LTO nowadays, for one? (Edit: as for autoconf options, they don't pass anything notable that I can see, and the keycode patch sounds harmless too, not that I looked too closely.)
Haven't run into crashes with 1.8.7 myself on Gentoo, albeit I may not use it enough to run into these (I just test godot a bit for packaging, haven't got bug reports either way).
@ionenwks That's a good call. Here are the build flags used on Fedora (as of Fedora 38 on my VM, but it's likely similar on F39).
$ rpm --eval %build_cflags
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
$ rpm --eval %build_cxxflags
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
$ rpm --eval %build_ldflags
-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1
@ionenwks Interestingly, when I run that locally on F39, I see that some flags have multiple spaces between them. Note that -m64 in build_cflags has a lot of extra whitespace around it. I assume that's fine for the compiler, but maybe it's not?
$ rpm --eval %build_cflags
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
$ rpm --eval %build_cxxflags
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
$ rpm --eval %build_ldflags
-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1
@granitrocky That's not a problem for the compiler, and it's expected. It's just an artifact of Fedora's RPM macro usage, where some macros might expand to nothing. Something like %{?conditional_macro} %{?conditional_macro2} %{?conditional_macro3} can definitely produce two consecutive spaces if conditional_macro2 is empty.
Anyway, that's just a nerdy RPM packager tangent, so I'll mark those two comments as off-topic ;)
Just another confirmation: after having crashes every 10 minutes, I've now been using the Godot editor for almost 4 hours without any issues after compiling libx11 myself.
(FWIW, this isn't Red Hat-specific: I just got it on Debian, using libx11 1.8.7-1. I'm also running Wayland.)
The code in xcb_io.c looks like some hairy thing that's trying hard to be thread-safe. But if it's not successful, then this is the sort of error one might expect to see. And certainly different compilation options could affect how aggressively stuff gets reordered, which could affect thread safety. So I thought, why doesn't Godot just put a mutex around the call? And then I looked, and it already had -- in most cases.
But consider the following rule (from xcb_io.c):
A single thread cannot be both the event-reading and the reply-reading thread at the same time.
So we would expect the call to XQueryTree (which, inside libX11, calls _XReply, which seems to be "reply-reading") to also lock the same mutex -- but it doesn't. So it seems possible that there's a race there. XGetWindowProperty is another possible culprit (only the calls in screen_get_usable_rect), and so is XGetInputFocus. I haven't fully audited the code to see whether there are other cases beyond these three.
I have only read the code, so this could be totally bogus. But it seems plausible.
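To make the suggestion concrete: the fix being proposed is that every Godot path that touches the display, whether it reads events (the polling thread) or replies (XQueryTree, XGetWindowProperty, XGetInputFocus on the main thread), goes through one shared application-level mutex. The sketch below does not call libX11; it models the pending-request list with a hypothetical `fake_display` struct and shows two threads that only touch it under the same lock, so the "head must be unchanged" check never fires. All names are illustrative assumptions.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of libX11's pending-request FIFO; not libX11 code. */
struct request { struct request *next; };

struct fake_display {
    pthread_mutex_t lock;        /* stands in for one app-level X11 mutex */
    struct request *head, *tail;
    long violations;             /* times the head changed between snapshot and dequeue */
};

struct job { struct fake_display *d; int iterations; };

static void append_request(struct fake_display *d, struct request *r) {
    r->next = NULL;
    if (d->tail) d->tail->next = r; else d->head = r;
    d->tail = r;
}

/* Same contract as the check in dequeue_pending_request(): `req` must still be the head. */
static void dequeue_request(struct fake_display *d, struct request *req) {
    if (d->head != req) { d->violations++; return; }
    d->head = req->next;
    if (d->head == NULL) d->tail = NULL;
}

/* One caller (think: event thread or main thread). Every access to the shared
 * queue, event-reading or reply-reading, happens under the same mutex. */
static void *caller(void *arg) {
    struct job *j = arg;
    struct request r;
    for (int i = 0; i < j->iterations; i++) {
        pthread_mutex_lock(&j->d->lock);
        append_request(j->d, &r);
        struct request *snapshot = j->d->head;  /* like `req = dpy->xcb->pending_requests` */
        dequeue_request(j->d, snapshot);
        pthread_mutex_unlock(&j->d->lock);
    }
    return NULL;
}

/* Runs two callers concurrently. Returns the violation count, or -1 if the
 * queue was left non-empty; with the shared mutex it should always be 0. */
long run_two_callers(int iterations) {
    struct fake_display d = { .head = NULL, .tail = NULL, .violations = 0 };
    pthread_mutex_init(&d.lock, NULL);
    struct job jobs[2] = { { &d, iterations }, { &d, iterations } };
    pthread_t threads[2];
    for (int i = 0; i < 2; i++) pthread_create(&threads[i], NULL, caller, &jobs[i]);
    for (int i = 0; i < 2; i++) pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&d.lock);
    return d.head == NULL ? d.violations : -1;
}
```

Dropping the lock/unlock pair from either caller would turn this into a data race (undefined behavior), which is exactly the situation the comment suspects for the unguarded reply-reading calls.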
It's been annoying me so much that I finally decided to go looking for this thread again... :P
Sadly, it doesn't work :(
Not that running your own build doesn't work; that I don't know. The compiling part itself doesn't work. I tried ./autogen.sh and it gave an error about xorg macros not being installed, so I installed a package called xorg-util-macros, and now it's complaining that the macro XTRANS_CONNECTION_FLAGS is possibly undefined... Is the macro package I installed just outdated? I just pulled the libx11 source from master, so maybe they changed some macros that haven't reached the Arch repos yet. Is that even the right package to install? It seems to be, since that error's gone, but I don't know.
At least the error message apologizes, which I found somewhat amusing.
I'll add another data point, I guess. I got the same crash on two different machines:
Godot 4.2.1, Fedora 39 (GNOME 45 on Wayland), libX11 1.8.7, i3-12100F, 5700 XT (running Godot under XWayland)
Godot 4.2.1, Arch (GNOME 46 on Xorg), libX11 1.8.9, i7-1185G7, Iris Xe (running Godot under Xorg)
Happened on both machines at around twice per hour. Wasn't able to pick up on anything specifically that caused them. I'll report back if I compile libX11 from source and that fixes anything.
Don't know if it's useful, but here is a new repro:
[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
TheGuild: ../../src/xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
[1] 631819 segmentation fault ./Path/To/Exe
Debian 12, GNOME 43.9, X.Org version 1.22.1.9
If any other info is needed, I can edit this post.
That version is almost five years old. Is it possible that we're looking at some ABI incompatibility here?
That's a good question.
This hypothesis could be tested by someone who can reproduce the issue reliably, by making a custom build with scons use_sowrap=no, which disables the dynamic library wrappers and links the system libraries instead. To compile successfully, you might need to install more dev libraries (the ones listed at https://docs.godotengine.org/en/latest/contributing/development/compiling/compiling_for_linuxbsd.html#distro-specific-one-liners; that page wasn't updated when we switched to dlopen'ing these dependencies by default).
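For readers unfamiliar with the wrappers: with sowrap enabled, the system library is not linked at build time. Instead it is opened at runtime and each symbol is resolved by name, through wrapper code generated from headers vendored in Godot's tree, which is why an ABI mismatch between those headers and the installed libX11 is at least conceivable. Below is a minimal sketch of that runtime-resolution technique, not Godot's generated wrapper code. It assumes a glibc-based Linux system and uses libm's `cos` as a harmless stand-in for an Xlib symbol; the helper name `load_cos` is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Function-pointer type matching the symbol we want to resolve. */
typedef double (*cos_fn)(double);

/* Resolve `cos` from libm at runtime instead of linking against it.
 * On success, returns the function and stores the library handle in
 * *handle_out (the caller must dlclose it). On failure, returns NULL. */
cos_fn load_cos(void **handle_out) {
    void *handle = dlopen("libm.so.6", RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return NULL;              /* library missing: caller can fall back */
    /* POSIX requires that dlsym's result be convertible to a function pointer. */
    cos_fn fn = (cos_fn)dlsym(handle, "cos");
    if (fn == NULL) {
        dlclose(handle);
        return NULL;
    }
    *handle_out = handle;
    return fn;
}
```

Building with use_sowrap=no removes this indirection: the linker binds the symbols against the installed library and its own headers, so a crash that persists in such a build would argue against the vendored headers being the cause. On glibc older than 2.34, a standalone build of this sketch needs `-ldl`.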
Before I read your comment, I tried another track: I replaced the vendored Xlib.h, XKBlib.h and Xutil.h with the ones from my system (Arch Linux, libx11 1.8.9), and re-ran the generator (version cb59cc4fc69a3f05aed6ca6fa998a934788794f4, which is the first one marked as "0.3" in the source) as instructed in the header. The differences are only additions and one replacement of a char* argument by const char* in XkbOpenDisplay. It still crashes.
I can reproduce it fairly reliably at the moment: my game usually crashes within tens of seconds. The editor fares better, but also crashes about once an hour or so.
With use_sowrap=no, it initially seemed a bit better, but after a few minutes it also crashed.
For the record, here are the commands I used to build (it gets simpler without mono):
$ git checkout 4.2.2-stable
$ scons platform=linuxbsd target=editor arch=x86_64 module_mono_enabled=yes use_sowrap=no
$ bin/godot.linuxbsd.editor.x86_64.mono --headless --generate-mono-glue modules/mono/glue
$ ./modules/mono/build_scripts/build_assemblies.py --godot-output-dir=./bin
Summarizing the reports above:
The only difference between 1.8.5 and 1.8.6 is 304a654, which seems unrelated to me. So I'm inclined to assume that there was only one breakage, not two, and 1.8.5 is broken as well.
I tried rebuilding the Arch package from the official PKGBUILD. Even with this, I could not trigger the crash! So for Arch users, this is a local workaround. After reinstalling the official binary package, I got a crash within a minute or two.
The two libX11.so.6.4.0 files are indeed different, but I can't tell if the differences are meaningful. Addresses and ordering differ, but the list of exported symbols is the same. The two libX11-xcb.so.1.0.0 files are the same size (13976 bytes), and the diff is small:
--- official-xcb.hex 2024-05-29 12:23:39.711349095 +0200
+++ mine-xcb.hex 2024-05-29 12:23:45.944791641 +0200
@@ -45,8 +45,8 @@
000002c0: 0300 0000 0000 0000 0100 01c0 0400 0000 ................
000002d0: 0100 0000 0000 0000 0200 01c0 0400 0000 ................
000002e0: 0000 0000 0000 0000 0400 0000 1400 0000 ................
-000002f0: 0300 0000 474e 5500 cf41 1b11 8b82 2d5d ....GNU..A....-]
-00000300: ad43 7b2d 9bad ab79 884a bc70 0000 0000 .C{-...y.J.p....
+000002f0: 0300 0000 474e 5500 7e7a c198 eaf0 26ab ....GNU.~z....&.
+00000300: 1636 87ec 7be5 6f04 a5b9 943b 0000 0000 .6..{.o....;....
00000310: 0200 0000 0500 0000 0100 0000 0600 0000 ................
00000320: 0000 0200 0005 0008 0500 0000 0600 0000 ................
00000330: 6be4 cc2e 3b9a cb9a 0000 0000 0000 0000 k...;...........
@@ -767,10 +767,10 @@
00002fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00002ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003000: 0040 0000 0000 0000 4743 433a 2028 474e .@......GCC: (GN
-00003010: 5529 2031 332e 322e 3120 3230 3233 3038 U) 13.2.1 202308
-00003020: 3031 0000 6c69 6258 3131 2d78 6362 2e73 01..libX11-xcb.s
+00003010: 5529 2031 342e 312e 3120 3230 3234 3035 U) 14.1.1 202405
+00003020: 3037 0000 6c69 6258 3131 2d78 6362 2e73 07..libX11-xcb.s
00003030: 6f2e 312e 302e 302e 6465 6275 6700 0000 o.1.0.0.debug...
-00003040: 164f e2c3 002e 7368 7374 7274 6162 002e .O....shstrtab..
+00003040: fdef bdc9 002e 7368 7374 7274 6162 002e ......shstrtab..
00003050: 6e6f 7465 2e67 6e75 2e70 726f 7065 7274 note.gnu.propert
00003060: 7900 2e6e 6f74 652e 676e 752e 6275 696c y..note.gnu.buil
00003070: 642d 6964 002e 676e 752e 6861 7368 002e d-id..gnu.hash..
This does give a clue: apparently the official binary package was compiled with GCC 13.2.1, whereas I'm using GCC 14.1.1. This explains the differences in libX11.so.6.4.0 as well. But I don't think GCC is to blame here; it's probably just a subtle difference that causes the actual (probably thread-related) bug to manifest or not.
Not being able to reproduce this in my own build, even before adding debug information, makes this thing very hard to debug, but I'll keep trying.
I installed the gcc13 package and used it to compile libX11 again from the official PKGBUILD, modified with CC=gcc-13 CPP=cpp-13 AR=gcc-ar-13 NM=gcc-nm-13 RANLIB=gcc-ranlib-13 before the ./configure command. (Not sure all of these are necessary or even correct; CC is the main one.) Even this didn't let me reproduce the crash.
Something I found in the core dump: at the time of the crash, there were two threads interacting with xcb. The main thread, that aborted:
...
#19 0x00007418b78c1c67 in __assert_fail (
assertion=assertion@entry=0x7418b6d64528 "!xcb_xlib_unknown_req_in_deq",
file=file@entry=0x7418b6d644df "xcb_io.c", line=line@entry=175,
function=function@entry=0x7418b6d77310 <__PRETTY_FUNCTION__.6> "dequeue_pending_request") at assert.c:103
#20 0x00007418b6cfbcef in dequeue_pending_request (dpy=dpy@entry=0x62c40271a710, req=req@entry=0x74187000c270)
at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:175
#21 0x00007418b6cfec95 in poll_for_response (dpy=dpy@entry=0x62c40271a710)
at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:381
#22 0x00007418b6d019b2 in _XEventsQueued (dpy=0x62c40271a710, mode=<optimized out>)
at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:441
#23 0x00007418b6cdecdf in XFlush (dpy=0x62c40271a710) at /usr/src/debug/libx11/libX11-1.8.9/src/Flush.c:39
#24 0x000062c3f636ae5c in DisplayServerX11::_wait_for_events (this=this@entry=0x62c4026fbe50)
at platform/linuxbsd/x11/display_server_x11.cpp:4048
#25 0x000062c3f636d070 in DisplayServerX11::_poll_events (this=0x62c4026fbe50)
at platform/linuxbsd/x11/display_server_x11.cpp:4074
#26 0x000062c3f9fc3e2d in Thread::callback (p_caller_id=<optimized out>, p_settings=...,
p_callback=0x62c3f636d0b0 <DisplayServerX11::_poll_events_thread(void*)>, p_userdata=0x62c4026fbe50)
at core/os/thread.cpp:61
#27 0x000062c3fa8a60e4 in execute_native_thread_routine ()
#28 0x00007418b791fded in start_thread (arg=<optimized out>) at pthread_create.c:447
#29 0x00007418b79a30dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
And a thread that appears to belong to AMD's Vulkan driver:
#0 0x00007418b799539d in __GI___poll (fds=fds@entry=0x74187edffae8, nfds=nfds@entry=1,
timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007418b6ca420b in poll (__timeout=-1, __nfds=1, __fds=0x74187edffae8) at /usr/include/bits/poll2.h:39
#2 _xcb_conn_wait (c=c@entry=0x62c40271b9d0, vector=vector@entry=0x0, count=count@entry=0x0,
cond=<optimized out>) at /usr/src/debug/libxcb/libxcb-1.17.0/src/xcb_conn.c:510
#3 0x00007418b6ca629b in _xcb_conn_wait (count=0x0, vector=0x0, cond=<optimized out>, c=0x62c40271b9d0)
at /usr/src/debug/libxcb/libxcb-1.17.0/src/xcb_conn.c:476
#4 xcb_wait_for_special_event (c=0x62c40271b9d0, se=0x62c402b86190)
at /usr/src/debug/libxcb/libxcb-1.17.0/src/xcb_in.c:806
#5 0x00007418a31c18f0 in ?? () from /usr/lib/amdvlk64.so
#6 0x00007418a31bd495 in ?? () from /usr/lib/amdvlk64.so
#7 0x00007418a31df714 in ?? () from /usr/lib/amdvlk64.so
#8 0x00007418a322ed61 in ?? () from /usr/lib/amdvlk64.so
#9 0x00007418b791fded in start_thread (arg=<optimized out>) at pthread_create.c:447
#10 0x00007418b79a30dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
The latter is blocked in a poll call, so it wasn't actively racing at the time of the crash, but it's an interesting tidbit that might explain why Godot suffers from this bug and other applications don't. I tried the vkcube spinning-cube Vulkan demo. I couldn't get it to crash, but when I killed it with SIGQUIT (Ctrl+\), its core dump showed the same amdvlk backtrace.
Line numbers refer to libx11 1.8.9, although the file src/xcb_io.c hasn't been touched in two years.
On line 319 in poll_for_response(), we set:
req = dpy->xcb->pending_requests;
There is no code that modifies the req pointer in the meantime. Then, if there is actually a pending request and some other conditions hold, the pending request is dequeued:
dequeue_pending_request(dpy, req);
And the first thing that function does is fail the assertion:
if (req != dpy->xcb->pending_requests)
throw_thread_fail_assert("Unknown request in queue while "
"dequeuing",
xcb_xlib_unknown_req_in_deq);
Since req is a local variable and hasn't been changed, this must mean that dpy->xcb->pending_requests has been changed in the meantime. The culprit must have been either some invalid memory access on the same thread, or a race condition from a different thread. My money is on the latter. (It could theoretically also have been some callback that performed a reentrant libx11 call, but I don't see any place where callbacks are invoked here; also, it would imply a lack of locking somewhere, same as a threading issue.)
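The reasoning above can be checked on a toy model. The sketch below is not libX11 code. It uses a hypothetical `queue` type whose `checked_dequeue` has the same precondition as dequeue_pending_request(): the caller's local `req` must still be the head. The two scenarios run on one thread so the outcome is deterministic. The "foreign" dequeue in the second scenario stands in for whatever other thread or unlocked path modified the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the pending-request list; not libX11 code. */
struct pending { struct pending *next; };
struct queue { struct pending *head, *tail; };

static void push(struct queue *q, struct pending *p) {
    p->next = NULL;
    if (q->tail) q->tail->next = p; else q->head = p;
    q->tail = p;
}

/* Same precondition as dequeue_pending_request(): `req` must still be the
 * head. Returns false where libX11 would call throw_thread_fail_assert(). */
static bool checked_dequeue(struct queue *q, struct pending *req) {
    if (req != q->head)
        return false;
    q->head = req->next;
    if (q->head == NULL) q->tail = NULL;
    return true;
}

/* The order poll_for_response() relies on: snapshot the head, then dequeue it. */
bool serialized_order(void) {
    struct queue q = { NULL, NULL };
    struct pending a, b;
    push(&q, &a);
    push(&q, &b);
    struct pending *req = q.head;        /* local snapshot, never modified */
    return checked_dequeue(&q, req);
}

/* The same steps with a foreign dequeue slipped in between. The untouched
 * local `req` no longer matches the head, which is what the assertion reports. */
bool interleaved_order(void) {
    struct queue q = { NULL, NULL };
    struct pending a, b;
    push(&q, &a);
    push(&q, &b);
    struct pending *req = q.head;
    checked_dequeue(&q, q.head);         /* "someone else" dequeues `a` */
    return checked_dequeue(&q, req);     /* req == &a but head == &b: fails */
}
```

Because `req` itself is never written, the only way the second check can fail is a change to the shared head. That is the inference drawn above from the real assertion.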
It should be noted that we are in an XFlush() call, which is a critical section: it calls LockDisplay() at the start and UnlockDisplay() at the end. So if this is a threading issue, we'd want to look for places that modify pending_requests without holding that lock.
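Checking "is the lock held here?" can also be built into the code rather than done only with a debugger. This is a swapped-in debugging technique, not something libX11 does as far as this thread shows. Record ownership when the lock is taken, then have every mutation point verify that the caller holds it, instead of peeking at glibc's private mutex fields. The minimal sketch below uses hypothetical names throughout (`display_lock`, `lock_display`, `append_pending`) and assumes C11 `_Thread_local`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug lock; not libX11's LockDisplay()/UnlockDisplay(). */
struct display_lock { pthread_mutex_t mutex; };

/* Which lock this thread currently holds (one is enough for this sketch).
 * Thread-local, so the check never reads another thread's bookkeeping. */
static _Thread_local const struct display_lock *held_by_this_thread = NULL;

void display_lock_init(struct display_lock *l) { pthread_mutex_init(&l->mutex, NULL); }

void lock_display(struct display_lock *l) {
    pthread_mutex_lock(&l->mutex);
    held_by_this_thread = l;
}

void unlock_display(struct display_lock *l) {
    assert(held_by_this_thread == l);    /* unlocking a lock we don't hold is a bug */
    held_by_this_thread = NULL;
    pthread_mutex_unlock(&l->mutex);
}

bool display_lock_held(const struct display_lock *l) { return held_by_this_thread == l; }

/* A mutation point guarded by an ownership check. It works like a permanent
 * version of the conditional breakpoint: it refuses to mutate, and returns
 * false, when the calling thread does not hold the lock. */
struct pending_count { struct display_lock lock; int pending; };

bool append_pending(struct pending_count *p) {
    if (!display_lock_held(&p->lock))
        return false;
    p->pending++;
    return true;
}
```

In a real debugging build the `return false` would more likely be a log line or an abort with a backtrace, pointing at the first unlocked mutation rather than at the later assertion failure.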
There are only two such places that matter: append_pending_request and dequeue_pending_request. So I set a conditional breakpoint in both, with the condition dpy->lock->mutex->__data->__owner == 0 (relying on some pthread internals to check whether the mutex is locked). After a few minutes, the breakpoint was hit, yielding the following stack trace:
#0 dequeue_pending_request (dpy=dpy@entry=0x55555cd1fde0, req=req@entry=0x55556a1df6f0)
at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:174
#1 0x00007ffff7103343 in _XReply (dpy=0x55555cd1fde0, rep=0x7fffffffdb00, extra=0, discard=0)
at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:736
#2 0x00007ffff70e40f4 in XGetWindowProperty (dpy=0x55555cd1fde0, w=25165826, property=372, offset=0,
length=32, delete=<optimized out>, req_type=4, actual_type=0x7fffffffdbb8, actual_format=0x7fffffffdbb4,
nitems=0x7fffffffdbc0, bytesafter=0x7fffffffdbc8, prop=0x7fffffffdbd0)
at /usr/src/debug/libx11/libX11-1.8.9/src/GetProp.c:69
#3 0x0000555555af1360 in DisplayServerX11::_window_minimize_check (this=this@entry=0x55555ccfc9f0,
p_window=p_window@entry=0) at platform/linuxbsd/x11/display_server_x11.cpp:2375
#4 0x0000555555af167f in DisplayServerX11::window_get_mode (this=0x55555ccfc9f0, p_window=0)
at platform/linuxbsd/x11/display_server_x11.cpp:2705
#5 0x0000555555aeba48 in DisplayServerX11::can_any_window_draw (this=0x55555ccfc9f0)
at platform/linuxbsd/x11/display_server_x11.cpp:2912
#6 0x0000555555b45426 in Main::iteration () at main/main.cpp:3685
#7 0x0000555555ad7311 in OS_LinuxBSD::run (this=this@entry=0x7fffffffddb0)
at platform/linuxbsd/os_linuxbsd.cpp:958
#8 0x0000555555ac5176 in main (argc=<optimized out>, argv=0x7fffffffe398)
at platform/linuxbsd/godot_linuxbsd.cpp:74
When continuing the program after the breakpoint is hit, it immediately crashes apologetically.
The API function XGetWindowProperty called from Godot does lock the mutex, but _XReply transiently unlocks it for a while. And apparently, by the time dequeue_pending_request is called here, the mutex is somehow not locked.
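Separately from the question of where the lock is lost, the transient unlock is itself the kind of window that makes cached state dangerous. While one thread has released the lock, other threads may legitimately take it and change shared data such as the pending-request list, so anything read before the release must be re-read after the lock is reacquired. The sketch below is a generic pthreads illustration of that hazard, not libX11's _XReply. The names are hypothetical. The main thread caches the list head, releases the lock inside pthread_cond_wait, and finds a different head when it wakes up.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *next; };

struct shared {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct node *head;
    bool other_ran;
};

/* The "other thread": it can only get the lock while the waiter has
 * released it inside pthread_cond_wait(). */
static void *other_thread(void *arg) {
    struct shared *s = arg;
    pthread_mutex_lock(&s->lock);
    s->head = s->head->next;                /* dequeue the current head */
    s->other_ran = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Returns true if a pointer cached before the wait no longer matches the
 * shared head afterwards, i.e. it went stale while the lock was released. */
bool snapshot_goes_stale(void) {
    struct node b = { NULL }, a = { &b };
    struct shared s = { .head = &a, .other_ran = false };
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);

    pthread_mutex_lock(&s.lock);
    struct node *cached = s.head;           /* read while holding the lock */
    pthread_t t;
    pthread_create(&t, NULL, other_thread, &s);
    while (!s.other_ran)                    /* lock is released while waiting */
        pthread_cond_wait(&s.cond, &s.lock);
    bool stale = (cached != s.head);        /* re-read after reacquiring */
    pthread_mutex_unlock(&s.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return stale;
}
```

The outcome is deterministic: the other thread cannot modify the head until the waiter releases the lock, and the waiter does not continue until the other thread has run. The `while` loop also covers spurious wakeups.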
This is as far as I got for today. I tried setting more breakpoints in _XReply to find out where the lock is lost, but for some reason the breakpoints all end up at the top of the function, and they also seem to interfere with my ability to trigger the crash. Stupid Heisenbug.
Hey, this seems to still be an issue running on Ubuntu 24.04 running x11 and KDE.
Godot Engine v4.3.stable.official.77dcf97d8 - https://godotengine.org
Vulkan 1.3.274 - Forward+ - Using Device #0: Intel - Intel(R) UHD Graphics (ICL GT1)
[xcb] Unknown request in queue while dequeuing
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
project.x86_64: ../../src/xcb_io.c:175: dequeue_pending_request: Assertion `!xcb_xlib_unknown_req_in_deq' failed.
Aborted (core dumped)
I wonder if this is an Intel integrated graphics thing? Also, should I try updating something to fix this?
It's not specific to Intel -- I have an AMD Radeon and I get it.
Nobody seems to have found a fix yet, so don't bother updating anything.
Godot version
v4.1.dev.custom_build [0291fcd7b]
System information
Linux Manjaro, kernel 6.1.19, X11
Issue description
The editor or the running project sometimes crashes with the following error:
Crashes are more common while a project is running, but the editor also crashed because of this a couple of times over the past week or so.
I am not using any thread-related functions in my project, physics/rendering are not threaded, the project I'm working on as this happens is a simple GUI-based game.
Steps to reproduce
This seems to happen fairly randomly.
Minimal reproduction project
N/A