Ma27 closed this issue 6 months ago.
Strange :o IIRC I didn't experience a single Sway 1.6 crash on nixos-unstable (only a few other bugs and many swaylock crashes), and all of the major dependencies (wayland, libdrm, mesa, glibc, etc.) should still match (although that'll change soon with the recent merge of staging).
Does it maybe have something to do with the way you start swaylock (because that one unfortunately tends to crash lately)?
Or are you, e.g., using a customized/hardened Linux kernel or environment.memoryAllocator.provider, mixing Nixpkgs revisions, etc.?
Sway crashes might also be driver-related (not necessarily in this case, but in general).
Or are you e.g. using a customized/hardened Linux kernel, environment.memoryAllocator.provider
I'm using 5.10.44 (since today) without any customizations (well, it's tainted because of ZFS, but that should be unrelated). Also, no custom memoryAllocator is in use.
mixing Nixpkgs revisions
Also no, everything is based on a recent release-21.05.
Does it maybe have something to do with the way you start swaylock (because that one unfortunately tends to crash lately)?
Could you elaborate what's happening there and how to check that?
I basically have bindsym $mod+l exec lock in my sway config, where lock looks like this:
#! /nix/store/kxj6cblcsd1qcbbxlmbswwrn89zcmgd6-bash-4.4-p23/bin/bash
st=
if [[ "$1" != "--no-spotify-pause" ]]; then
st=$(/nix/store/j45cpfbvz61rf27446m6y19nqr58ncsw-playerctl-2.3.1/bin/playerctl --ignore-player=firefox,chromium status)
/nix/store/j45cpfbvz61rf27446m6y19nqr58ncsw-playerctl-2.3.1/bin/playerctl --ignore-player=firefox,chromium pause || true
fi
/nix/store/8icaz7yyc387rb7g818szdw64djfp7wq-swaylock-effects-1.6-3/bin/swaylock --effect-blur 5x8 --screenshot -F \
--clock --fade-in 0.2 --indicator-radius 150 --indicator-thickness 5
if [ "$st" = "Playing" ]; then
/nix/store/j45cpfbvz61rf27446m6y19nqr58ncsw-playerctl-2.3.1/bin/playerctl --ignore-player=firefox,chromium play
fi
Could you elaborate what's happening there and how to check that?
It would only be relevant if you started swaylock in a way that could crash Sway. Your usage looks fine (and in hindsight it wouldn't have explained the crash anyway).
Unfortunately I'm out of ideas then :o I'm also using Linux 5.10 (i915 + iris). I'm not sure why this apparently only affects you. At least I'm not aware of any other NixOS users experiencing these crashes, which would also suggest that it's GPU/driver-related.
It does look like https://github.com/swaywm/wlroots/issues/2475, so it might be worth trying different kernel versions (though there's no guarantee that it'll help).
Unfortunately I'm out of ideas then :o I'm also using Linux 5.10 (i915 + iris). I'm not sure why this apparently only affects you. At least I'm not aware of any other NixOS users experiencing these crashes, which would also suggest that it's GPU/driver-related.
May I ask if you're on nixos-unstable or 21.05?
I'm on nixos-unstable, but all of the major dependencies should've been the same prior to the very recent staging-next merge (see https://github.com/NixOS/nixpkgs/issues/127413#issuecomment-864394465). Not sure if it'll work with nixos-unstable on your system, but of course it could be worth a try.
@Ma27 if you're using amdgpu, then #126771 might be related. I don't have a recent enough AMD GPU, so I have no idea what happens, but it occurred to me after reading a notification that this could be related.
if you're using amdgpu, then #126771 might be related. I don't have a recent enough AMD GPU, so I have no idea what happens, but it occurred to me after reading a notification that this could be related.
Nope, I'm using an Intel GPU.
I did a system update today. I'm now using the latest wlroots with sway 1.6.1 (and also a newer kernel and libdrm version), so let's see how this goes :)
If I don't have this problem for, let's say, 10 days, I'll consider it fixed.
With WLR_DRM_NO_MODIFIERS=1 / WLR_DRM_NO_ATOMIC=1 set, I got the following backtrace when unlocking after three days:
(gdb)
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007f84d51c7523 in __GI_abort () at abort.c:79
#2 0x00007f84d521d958 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7f84d5327c1a "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007f84d522512a in malloc_printerr (
str=str@entry=0x7f84d5329dc8 "malloc_consolidate(): invalid chunk size") at malloc.c:5389
#4 0x00007f84d5225b10 in malloc_consolidate (av=av@entry=0x7f84d535ba00 <main_arena>) at malloc.c:4514
#5 0x00007f84d5227e3d in _int_malloc (av=av@entry=0x7f84d535ba00 <main_arena>, bytes=bytes@entry=1280)
at malloc.c:3727
#6 0x00007f84d522a811 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3448
#7 0x00007f84d59ce71b in lh_table_new ()
from /nix/store/7lpfd7wx3vjhkjsxi4b9yn2dk2d0k7g4-json-c-0.15/lib/libjson-c.so.5
#8 0x00007f84d59ce972 in lh_table_resize ()
from /nix/store/7lpfd7wx3vjhkjsxi4b9yn2dk2d0k7g4-json-c-0.15/lib/libjson-c.so.5
#9 0x00007f84d59ce873 in lh_table_insert_w_hash ()
from /nix/store/7lpfd7wx3vjhkjsxi4b9yn2dk2d0k7g4-json-c-0.15/lib/libjson-c.so.5
#10 0x0000000000415476 in ipc_json_create_node ()
#11 0x0000000000415d7e in ipc_json_describe_node ()
#12 0x0000000000418155 in ipc_get_workspaces_callback ()
#13 0x000000000045844c in output_for_each_workspace ()
#14 0x0000000000452a99 in root_for_each_workspace ()
#15 0x00000000004196cb in ipc_client_handle_command ()
#16 0x000000000041a361 in ipc_client_handle_readable ()
#17 0x00007f84d54b24a2 in wl_event_loop_dispatch (loop=0x1812550, timeout=timeout@entry=-1)
at ../src/event-loop.c:1027
#18 0x00007f84d54b0135 in wl_display_run (display=0x1812d20) at ../src/wayland-server.c:1351
#19 0x000000000040fabd in main ()
This is a different code path, but it ends up in a similar place: again a crash in malloc_consolidate, albeit with more graceful error handling. Given that, and after reading a bit about this error, this seems to be some weird heap corruption.
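The following is a minimal sketch of how the two switches mentioned above could be set when starting sway. It assumes sway is launched from a shell, for example after a TTY login. The function name start_sway_debug and the SWAY_CMD override are hypothetical. Only WLR_DRM_NO_MODIFIERS and WLR_DRM_NO_ATOMIC come from the comment above; both are wlroots environment variables.

```shell
#!/usr/bin/env bash
# Hypothetical launcher: start sway with wlroots' atomic modesetting and
# DRM format modifiers disabled, as in the experiment described above.
# SWAY_CMD is a hypothetical override for the command to run (default: sway).
start_sway_debug() {
  # Assignments placed in front of exec end up in the environment of the
  # program that replaces the shell.
  WLR_DRM_NO_MODIFIERS=1 WLR_DRM_NO_ATOMIC=1 exec "${SWAY_CMD:-sway}" "$@"
}
```

For example, calling start_sway_debug from a login shell replaces that shell with sway. Using exec here is only a design choice: no extra shell process is left behind.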
Have you considered filing an issue upstream? Are you first trying to ensure it's not NixOS-related? Asking because I've hit another segfault (linked above).
Have you considered filing an issue upstream?
I also thought about that while following this issue so here's my personal opinion (but consider this just an FYI and feel free to ignore it):
master
if the last stable release isn't that fresh anymore, but currently it's likely still fine.) However, it could be a good idea to ask on IRC (#sway on Libera Chat) what to do in this case and whether to submit a bug report.
@asymmetric in your case the crash seems very easy to reproduce, so it looks like a good candidate for an upstream bug report. Unfortunately, the link cannot be shared (though privately sharing it with a Sway maintainer might be fine). Maybe there's an alternative link that causes the same crash?
The bug report is https://github.com/swaywm/sway/issues/6372. As suggested there, I'm currently running with ASan to get more information when the next crash occurs :)
@Ma27 did you get any interesting results?
I'm afraid not, but it's kind of interesting that I haven't had such a crash for a few weeks now (and none on my new laptop).
I currently have Linux 5.13.13, sway 1.6.1 and wlroots 0.14.1. Sway is compiled on my machine with separateDebugInfo = true; and -fsanitize=address (no idea if the latter makes a difference).
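When a build is done with -fsanitize=address, it can help to confirm that the instrumented binary is really the one being run. The helper below does that check; it is hypothetical and not from this thread. With GCC, -fsanitize=address links the ASan runtime dynamically by default. The libasan.so.6 frames from gcc-11.3.0-lib in the ASan report later in this thread reflect that, so libasan should show up in the binary's ldd output. With Clang, the runtime is linked statically by default, so this check would not work there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given binary dynamically links libasan,
# i.e. it was most likely built with GCC's -fsanitize=address.
# If sway is started through a wrapper script (the sway-unwrapped store paths
# in this thread suggest so), pass the unwrapped binary: ldd rejects scripts.
is_asan_build() {
  ldd "$1" 2>/dev/null | grep -q 'libasan\.so'
}
```

For example: is_asan_build /path/to/sway-unwrapped/bin/sway && echo "ASan build". The path here is a placeholder.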
I marked this as stale due to inactivity.
Not stale. I'm now seeing these aborts almost every day.
This ticket is for NixOS 21.05, so it shouldn't be relevant anymore. But I guess we could reuse it if this also happens on 22.05 or nixos-unstable. @gebner, your Nixpkgs revision, Sway version, exact error message, etc. would be relevant. I assume you don't run 21.05 anymore :D I'm still unaffected by this. I don't even remember when my Sway session last crashed; it's been super stable since 1.7, and I rarely reboot.
Oh yes, I'm indeed not on 21.05 anymore. I'm on current nixos-unstable (b62ada430501de88dfbb08cea4eb98ead3a5e3e7), which has sway 1.7. Sway started crashing regularly about a week ago, roughly every other day.
I'm commenting on this issue because I'm getting almost exactly the same backtrace:
#0 0x00007f64f907cc1f in __pthread_kill_implementation ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#1 0x00007f64f9032042 in raise ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#2 0x00007f64f901d49c in abort ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#3 0x00007f64f90713f8 in __libc_message ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#4 0x00007f64f908629a in malloc_printerr ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#5 0x00007f64f9086924 in malloc_consolidate ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#6 0x00007f64f9088a9c in _int_malloc ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#7 0x00007f64f9089f83 in malloc ()
from /nix/store/lyl6nysc3i3aqhj6shizjgj0ibnf1pvg-glibc-2.34-210/lib/libc.so.6
#8 0x00007f64f8863abe in drmModeAtomicAddProperty ()
from /nix/store/gk27rm0abqd7yvs1j687219di1l4hp3h-libdrm-2.4.110/lib/libdrm.so.2
#9 0x00007f64f92b6e2c in atomic_add ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#10 0x00007f64f92b7122 in atomic_crtc_commit ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#11 0x00007f64f92b88ed in drm_crtc_commit ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#12 0x00007f64f92ba04b in drm_connector_test ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#13 0x00007f64f92ba9e5 in drm_connector_commit ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#14 0x00007f64f92d20da in wlr_output_commit ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#15 0x0000000000423225 in output_render ()
#16 0x000000000041fa27 in output_repaint_timer_handler ()
#17 0x000000000041fd11 in damage_handle_frame ()
#18 0x00007f64f930160c in wlr_signal_emit_safe ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#19 0x00007f64f930160c in wlr_signal_emit_safe ()
from /nix/store/vk3yca1f8zyipy045yn3d8k93npqnk8v-wlroots-0.15.1/lib/libwlroots.so.10
#20 0x00007f64f935e54b in wl_event_loop_dispatch_idle ()
from /nix/store/m1r1vlkzs997xnn8yprkb20dlcvmdfdd-wayland-1.20.0/lib/libwayland-server.so.0
#21 0x00007f64f935e6a6 in wl_event_loop_dispatch ()
from /nix/store/m1r1vlkzs997xnn8yprkb20dlcvmdfdd-wayland-1.20.0/lib/libwayland-server.so.0
#22 0x00007f64f935c2b5 in wl_display_run ()
from /nix/store/m1r1vlkzs997xnn8yprkb20dlcvmdfdd-wayland-1.20.0/lib/libwayland-server.so.0
#23 0x0000000000410cbb in main ()
I've now recompiled sway with -fsanitize=address, as suggested in one of the sway issues. Maybe this will give us some more information.
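By default, ASan prints its report to stderr, where it can end up mixed with output from other programs. One option is to have an ASan-instrumented sway write the report to a file instead. The sketch below assumes ASan's log_path runtime option, which writes the report to "<prefix>.<pid>". The function name start_sway_asan and the ASAN_LOG_PREFIX and SWAY_CMD overrides are hypothetical and not from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical launcher for an ASan-instrumented sway build.
# ASan's log_path option makes it write its report to "<prefix>.<pid>"
# instead of stderr. Existing ASAN_OPTIONS are kept (colon-separated).
# ASAN_LOG_PREFIX and SWAY_CMD are hypothetical overrides.
start_sway_asan() {
  local prefix="${ASAN_LOG_PREFIX:-${XDG_RUNTIME_DIR:-/tmp}/sway-asan}"
  ASAN_OPTIONS="${ASAN_OPTIONS:+$ASAN_OPTIONS:}log_path=${prefix}" \
    exec "${SWAY_CMD:-sway}" "$@"
}
```

After a crash, the report would then be in a file such as $XDG_RUNTIME_DIR/sway-asan.<pid>, rather than scattered through the session log.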
I've finally managed to get something from ASAN:
Errors from xkbcomp are not fatal to the X server
=================================================================
==1947==ERROR: AddressSanitizer: heap-use-after-free on address 0x618000351080 at pc 0x000000457830 bp 0x7ffe82c27dc0 sp 0x7ffe82c27db8
READ of size 8 at 0x618000351080 thread T0
#0 0x45782f in handle_request_pointer_set_cursor (/nix/store/d6cq80z6hx15bxbpb15ianiwhwmkzwx8-sway-unwrapped-1.7/bin/sway+0x45782f)
#1 0x7fba5a38d60b in wlr_signal_emit_safe (/nix/store/1g50l5xl5sldng3rsh3k507y4lzrqgii-wlroots-0.15.1/lib/libwlroots.so.10+0x8b60b)
#2 0x7fba5a362dae in pointer_set_cursor (/nix/store/1g50l5xl5sldng3rsh3k507y4lzrqgii-wlroots-0.15.1/lib/libwlroots.so.10+0x60dae)
#3 0x7fba5979a809 in ffi_call_unix64 (/nix/store/a6n90jvgz1sbr6982f6pzqs7y95x32b2-libffi-3.4.2/lib/libffi.so.8+0x7809)
#4 0x7fba59799942 in ffi_call_int (/nix/store/a6n90jvgz1sbr6982f6pzqs7y95x32b2-libffi-3.4.2/lib/libffi.so.8+0x6942)
#5 0x7fba5a3ea730 in wl_closure_invoke (/nix/store/cfc4ib40v17z0ah3rc8370m6p396qfil-wayland-1.20.0/lib/libwayland-server.so.0+0xd730)
#6 0x7fba5a3e5b99 in wl_client_connection_data (/nix/store/cfc4ib40v17z0ah3rc8370m6p396qfil-wayland-1.20.0/lib/libwayland-server.so.0+0x8b99)
#7 0x7fba5a3e8639 in wl_event_loop_dispatch (/nix/store/cfc4ib40v17z0ah3rc8370m6p396qfil-wayland-1.20.0/lib/libwayland-server.so.0+0xb639)
#8 0x7fba5a3e62b4 in wl_display_run (/nix/store/cfc4ib40v17z0ah3rc8370m6p396qfil-wayland-1.20.0/lib/libwayland-server.so.0+0x92b4)
#9 0x412bec in main (/nix/store/d6cq80z6hx15bxbpb15ianiwhwmkzwx8-sway-unwrapped-1.7/bin/sway+0x412bec)
#10 0x7fba5a0aa236 in __libc_start_call_main (/nix/store/k56d9sk88pvrqhvwpa6msdf8gpwnimf6-glibc-2.34-210/lib/libc.so.6+0x29236)
#11 0x7fba5a0aa2f4 in __libc_start_main_impl (/nix/store/k56d9sk88pvrqhvwpa6msdf8gpwnimf6-glibc-2.34-210/lib/libc.so.6+0x292f4)
#12 0x415400 in _start (/nix/store/d6cq80z6hx15bxbpb15ianiwhwmkzwx8-sway-unwrapped-1.7/bin/sway+0x415400)
0x618000351080 is located 0 bytes inside of 824-byte region [0x618000351080,0x6180003513b8)
freed by thread T0 here:
#0 0x7fba5aab14d7 in free (/nix/store/bym6162f9mf4qqsr7k9d73526ar176x4-gcc-11.3.0-lib/lib/libasan.so.6+0xb14d7)
#1 0x7fba5a3e4ea6 in destroy_resource (/nix/store/cfc4ib40v17z0ah3rc8370m6p396qfil-wayland-1.20.0/lib/libwayland-server.so.0+0x7ea6)
previously allocated by thread T0 here:
#0 0x7fba5aab1987 in calloc (/nix/store/bym6162f9mf4qqsr7k9d73526ar176x4-gcc-11.3.0-lib/lib/libasan.so.6+0xb1987)
#1 0x7fba5a382972 in surface_create (/nix/store/1g50l5xl5sldng3rsh3k507y4lzrqgii-wlroots-0.15.1/lib/libwlroots.so.10+0x80972)
SUMMARY: AddressSanitizer: heap-use-after-free (/nix/store/d6cq80z6hx15bxbpb15ianiwhwmkzwx8-sway-unwrapped-1.7/bin/sway+0x45782f) in handle_request_pointer_set_cursor
Shadow bytes around the buggy address:
0x0c30800621c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c30800621d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c30800621e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c30800621f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c3080062200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c3080062210:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c3080062220: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c3080062230: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c3080062240: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c3080062250: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c3080062260: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1947==ABORTING
[2022-09-04 20:50:40.026] [error] Workspaces: Unable to receive IPC header
Gdk-Message: 20:50:40.026: Error reading events from display: Broken pipe
[2022-09-04 20:50:40.026] [error] Window: Unable to receive IPC header
[2022-09-04 20:50:40.026] [error] Mode: Unable to receive IPC header
Gdk-Message: 20:50:40.026: Error reading events from display: Broken pipe
Gdk-Message: 20:50:40.026: Error reading events from display: Broken pipe
[2022-09-04 20:50:40.026] [error] Window: Unable to receive IPC header
[2022-09-04 20:50:40.026] [error] Window: Unable to receive IPC header
[2022-09-04 20:50:40.026] [error] Mode: Unable to receive IPC header
Gdk-Message: 20:50:40.026: Error reading events from display: Broken pipe
Gdk-Message: 20:50:40.028: Error reading events from display: Broken pipe
(EE) failed to read Wayland events: Broken pipe
Gdk-Message: 20:50:40.026: Error reading events from display: Broken pipe
Gdk-Message: 20:50:40.028: Error reading events from display: Broken pipe
X connection to :0 broken (explicit kill or server shutdown).
X connection to :0 broken (explicit kill or server shutdown).
Exiting due to channel error.
Exiting due to channel error.
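The shadow dump in the ASan report above can be checked by hand. On x86_64 Linux, ASan's default mapping is Shadow = (Addr >> 3) + 0x7fff8000, with one shadow byte per 8 application bytes (as the legend says). The sketch below applies that formula to the faulting address and to the freed 824-byte region. The helper names are hypothetical, and the sketch assumes 64-bit shell arithmetic.

```shell
#!/usr/bin/env bash
# Worked check of the ASan shadow dump above; assumes x86_64 Linux, where
# ASan's default mapping is Shadow = (Addr >> 3) + 0x7fff8000.
# Helper names are hypothetical. Requires 64-bit shell arithmetic.

# Print the shadow address for an application address.
asan_shadow_addr() {
  local addr=$(( $1 ))
  printf '0x%012x\n' $(( (addr >> 3) + 0x7fff8000 ))
}

# Print how many shadow bytes cover a region of N bytes (8 bytes each).
shadow_bytes_for() {
  printf '%d\n' $(( ($1 + 7) / 8 ))
}

# Faulting address and size of the freed region from the report.
asan_shadow_addr 0x618000351080
shadow_bytes_for $(( 0x6180003513b8 - 0x618000351080 ))
```

The first line prints 0x0c3080062210, which matches the row marked =>0x0c3080062210:[fd]. The second prints 103, the number of shadow bytes for the 824-byte region. ASan classifies the read as a heap-use-after-free because those shadow bytes hold fd (freed heap region).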
You may want to file a bug report upstream.
This issue was filed against an old version of Sway a while ago. I didn't have any of these issues on my past two laptops (a ThinkPad T14s Gen 1 and a Framework 13" Intel 11th Gen).
If other people have similar issues, a new issue should be opened, IMHO.
Describe the bug
I'm regularly observing sway crashes, especially after unlocking with swaylock. This happens every two to three days. I get the following backtrace when investigating the crash via coredumpctl gdb:
To Reproduce
Unfortunately this happens pretty irregularly, and I haven't identified a pattern yet.
Expected behavior
No crashes when unlocking sway.
Additional context
Could be related to https://github.com/swaywm/wlroots/issues/2475 or https://github.com/swaywm/wlroots/issues/204, but both seem to have been fixed in the past.
Notify maintainers
@primeos @Synthetica9