Open adolfgatonegro opened 1 month ago
This has been happening since kernel 6.11; why, I do not know. When you downgrade to kernel 6.10, all is back to normal.
The only report I have so far is that this happens with Kernel 6.11 and does not happen with Kernel 6.10. This also happens with a bare dwm.
The crash / segmentation violation seems to be in relation to the binary file being overwritten, as manually moving the old file away (from /usr/local/bin) before compiling seemingly mitigates the issue.
I'll let you know once I know more.
This is indeed an issue with 6.11. Rolling back to 6.10 prevents this from happening, as does using the 6.6 LTS kernel, which is what I'm currently doing.
Thanks for looking into it, mate. Let me know if I can provide any additional info or test anything to help.
6.11.1-arch1-1 Plain dwm and with some light patching. Intel iGPU: Intel Corporation HD Graphics 630
Oct 07 03:47:27 zn systemd[1]: Starting /usr/bin/make install...
Oct 07 03:47:27 zn systemd[1]: Started /usr/bin/make install.
Oct 07 03:47:27 zn systemd[1]: run-u44.service: Deactivated successfully.
Oct 07 03:47:27 zn kernel: dwm[728]: segfault at 819e ip 000000000000819e sp 00007ffe5cf6f988 error 14 likely on CPU 2 (core 2, socket 0)
Oct 07 03:47:27 zn kernel: Code: Unable to access opcode bytes at 0x8174.
Similar output when running make install for aslstatus, as I wanted to check it with another application.
Oct 07 04:08:15 zn kernel: temperature[4248]: segfault at 55e7 ip 00000000000055e7 sp 0000765ed57ffd58 error 14 likely on CPU 3 (core 3, socket 0)
Oct 07 04:08:15 zn kernel: Code: Unable to access opcode bytes at 0x55bd.
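As a reading aid for the kernel lines above: on x86 the number after "error" is the page-fault error code, printed in hexadecimal, so "error 14" means 0x14. The sketch below decodes the common low bits. The helper name decode_pf_error is made up for illustration, and the bit meanings come from the x86 page-fault error code layout, not from anything reported in this thread.

```shell
#!/bin/sh
# Sketch: decode the hexadecimal x86 page-fault error code that the kernel
# prints in "segfault at ... error NN" messages.
# decode_pf_error is a hypothetical helper name, not an existing tool.
decode_pf_error() {
    code=$(( 0x$1 ))   # the kernel prints the code in hex without a prefix

    # bit 0: 0 = page not present, 1 = protection violation
    if [ $(( code & 1 )) -ne 0 ]; then out="protection violation"; else out="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $(( code & 2 )) -ne 0 ]; then out="$out, write"; else out="$out, read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $(( code & 4 )) -ne 0 ]; then out="$out, user mode"; else out="$out, kernel mode"; fi
    # bit 4: the access was an instruction fetch
    if [ $(( code & 16 )) -ne 0 ]; then out="$out, instruction fetch"; fi

    printf '%s\n' "$out"
}

printf 'error 14: %s\n' "$(decode_pf_error 14)"
```

Read this way, the logged faults would be user-mode instruction fetches from a page that was not present, which fits the fault address being identical to ip in every line.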
Running lsof showed that the process holds a file descriptor of type "mem" pointing to the binary file.
$ sudo lsof | grep -E "COMMAND|/usr/local/bin/dwm"
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
dwm 2697 sbakkeby txt REG 0,27 448376 30882447 /usr/local/bin/dwm
dwm 2697 sbakkeby mem REG 0,26 30882447 /usr/local/bin/dwm (path dev=0,27)
I am assuming that this is a new thing in Kernel 6.11.
My interpretation of what is happening here is that when we re-compile and install dwm, the binary data behind the file handle (/usr/local/bin/dwm) is overwritten in place, ultimately causing a segmentation fault for the process holding the memory file handle.
A quick workaround for this issue is to delete the original file before we copy the new file.
diff --git a/Makefile b/Makefile
index ffa69b4..c5e7554 100644
--- a/Makefile
+++ b/Makefile
@@ -32,6 +32,7 @@ dist: clean
 
 install: all
 	mkdir -p ${DESTDIR}${PREFIX}/bin
+	rm -f ${DESTDIR}${PREFIX}/bin/dwm
 	cp -f dwm ${DESTDIR}${PREFIX}/bin
 	chmod 755 ${DESTDIR}${PREFIX}/bin/dwm
 	mkdir -p ${DESTDIR}${MANPREFIX}/man1
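The difference between the two install sequences can be observed through inode numbers. The following is a minimal sketch, not taken from the dwm Makefile: it works on throwaway files in a temporary directory (the file names app and app.new, the helper inode_of and all variable names are made up), keeps a file descriptor open as a stand-in for the running process, and compares what that descriptor sees after each kind of install.

```shell
#!/bin/sh
# Sketch: in-place overwrite (cp -f) versus unlink-then-copy (rm -f; cp -f).
# All file and variable names are hypothetical stand-ins for dwm's binary.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Print the inode number of a file.
inode_of() { ls -i "$1" | awk '{ print $1 }'; }

printf 'new build\n' > "$dir/app.new"

# Case 1: cp -f onto an existing file writes into the same inode.
printf 'old build\n' > "$dir/app"
exec 3< "$dir/app"              # stand-in for the process using the file
before_cp=$(inode_of "$dir/app")
cp -f "$dir/app.new" "$dir/app"
after_cp=$(inode_of "$dir/app")
seen_cp=$(cat <&3)              # contents visible through the old descriptor
exec 3<&-

# Case 2: removing the file first makes cp create a fresh inode.
printf 'old build\n' > "$dir/app"
exec 3< "$dir/app"
before_rm=$(inode_of "$dir/app")
rm -f "$dir/app"
cp -f "$dir/app.new" "$dir/app"
after_rm=$(inode_of "$dir/app")
seen_rm=$(cat <&3)
exec 3<&-

printf 'cp -f only:   inode %s -> %s, holder reads "%s"\n' "$before_cp" "$after_cp" "$seen_cp"
printf 'rm -f + cp:   inode %s -> %s, holder reads "%s"\n' "$before_rm" "$after_rm" "$seen_rm"
```

With cp -f alone the inode stays the same and the open descriptor sees the new bytes, whereas after rm -f the copy lands in a fresh inode and the holder keeps the old contents. Whether this fully accounts for the crash on 6.11 remains an assumption in this thread.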
Here is what the lsof output looks like after the file has been deleted (or moved).
$ sudo lsof | grep -E "COMMAND|/usr/local/bin/dwm"
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
dwm 2697 sbakkeby txt REG 0,27 448376 30882447 /usr/local/bin/dwm (deleted)
dwm 2697 sbakkeby DEL REG 0,26 30882447 /usr/local/bin/dwm
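Since moving the old file away also appears to help, another variant is to copy the new binary under a temporary name in the same directory and then rename it over the target. This is only a sketch of that idea on throwaway files, not a tested change to the dwm Makefile; the helper name install_by_rename and all paths are made up for illustration.

```shell
#!/bin/sh
# Sketch: install by copying to a temporary name and renaming over the target,
# so the old inode is never written to. Names and paths are hypothetical.
set -eu

# install_by_rename SRC DEST: copy SRC next to DEST, then rename it into place.
install_by_rename() {
    src=$1
    dest=$2
    tmp="$dest.new.$$"          # temporary name in the same directory
    cp -f "$src" "$tmp"
    chmod 755 "$tmp"
    mv -f "$tmp" "$dest"        # same directory, so this is a rename
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'old build\n' > "$dir/app"
printf 'new build\n' > "$dir/app.build"

exec 3< "$dir/app"              # stand-in for the process using the old file
install_by_rename "$dir/app.build" "$dir/app"
held=$(cat <&3)                 # what the old descriptor still reads
current=$(cat "$dir/app")       # what a fresh open of the path reads
exec 3<&-

printf 'held descriptor: %s\n' "$held"
printf 'installed file:  %s\n' "$current"
```

Compared with rm -f followed by cp, the rename leaves no moment at which the target path is missing, which may matter if something tries to start the program during the install.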
I'm just following this out of curiosity, I do not know much about what I am doing. I thought of checking lsof too but didn't know what to do with the output.
Are any of the reports from distros other than Arch Linux?
In case it might offer more clues, here is some information from my system:
No mem FD for me.
The (deleted) marker shows up, but no extra DEL FD like yours.
txt FDs for all applications.

So, those might be unrelated? Maybe related to the filesystem? I use ext4. In case it might be related to swap, zram, etc., I have none of those on my system. I also tried suspend / wakeup, no difference.
% sudo lsof | grep -iE "COMMAND|bin/dwm"
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
dwm 1834 km txt REG 254,0 67920 3543170 /usr/local/bin/dwm
% sudo rm -f /usr/local/bin/dwm
% sudo lsof | grep -iE "COMMAND|bin/dwm"
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
dwm 1834 km txt REG 254,0 67920 3543170 /usr/local/bin/dwm (deleted)
Trying with other applications:
make install from git repo gets the segfault and crash. (/usr/local/bin/nsxiv)
pacman -S nsxiv does not get the segfault or crash, but gets the (deleted). (/usr/bin/nsxiv)
pacman -S had no issue either, but got the (deleted). Nothing peculiar in journal.
make install. So there seems to be no effect of that.

pacman nsxiv:
% sudo lsof | grep -iE "COMMAND|bin/nsxiv"
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
nsxiv 3837 km txt REG 254,0 88712 3546241 /usr/bin/nsxiv
% sudo pacman -S nsxiv
% sudo lsof | grep -iE "COMMAND|bin/nsxiv"
COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
nsxiv 3837 km txt REG 254,0 88712 3546241 /usr/bin/nsxiv (deleted)
git nsxiv:
Oct 07 19:37:42 zn kernel: nsxiv[3677]: segfault at 5aa6 ip 0000000000005aa6 sp 00007fff7170e3e8 error 14 likely on CPU 3 (core 3, socket 0)
Oct 07 19:37:42 zn kernel: Code: Unable to access opcode bytes at 0x5a7c.
aslstatus:
Oct 07 19:43:56 zn kernel: cpu_percentage[3771]: segfault at 3dac ip 0000000000003dac sp 00007b49c3dffd58 error 14 likely on CPU 2 (core 2, socket 0)
Oct 07 19:43:56 zn kernel: Code: Unable to access opcode bytes at 0x3d82.
[...]
Oct 07 19:45:24 zn kernel: temperature[4276]: segfault at 55e7 ip 00000000000055e7 sp 0000763b359ffd58 error 14
Oct 07 19:45:24 zn kernel: ram_used[4277]: segfault at 5481 ip 0000000000005481 sp 0000763b34fffd58 error 14
Oct 07 19:45:24 zn kernel: likely on CPU 0 (core 0, socket 0)
Oct 07 19:45:24 zn kernel: likely on CPU 2 (core 2, socket 0)
Oct 07 19:45:24 zn kernel:
Oct 07 19:45:24 zn kernel: Code: Unable to access opcode bytes at 0x55bd.
Oct 07 19:45:24 zn kernel: Code: Unable to access opcode bytes at 0x5457.
Hey, I'm running into an issue with dwm, similar to #324.

SYSTEM: Arch Linux
KERNEL: 6.11.2-arch1-1
NVIDIA DRIVER: 560.35.03-11
XORG X SERVER: 21.1.13-1
DWM VERSION: dwm-6.5 (last commit: 36cbcf53a232818e5d523dd0337bb635556e91ef)

My regular build uses flexipatch, though I'm also seeing the issue with the latest unmodified dwm from upstream.

Issue
Installing after compilation, with sudo make install, causes dwm to crash, dropping me to the TTY. Sometimes it happens right after the install finishes, sometimes it takes a couple of seconds; regardless, it crashes every time without further input on my part (not even triggering a restart of dwm myself).

Additional info
So far, this issue happens only on my desktop, which has an NVIDIA GPU. I am using the same build on my laptop, with AMD graphics, and everything seems to work correctly. Never mind, it is now happening on both of my systems.

This was not an issue before the kernel update to 6.11. I had been using dwm-flexipatch based on dwm 6.4 since early last year, and everything worked fine. The issue started happening with my 6.4 build, and remains after a fresh build of 6.5.

I can reliably reproduce the issue with an unmodified build of dwm-flexipatch, without any customisation or patching, so it does not seem like an issue with any particular patch I'm using.
I've managed to dig up the following information. I'm not a developer and have no experience debugging software, so I might be missing something obvious.
dmesg log
gdb
This is as far as I've got. I assume XNextEvent is related to Xorg in some way, but I have not been able to find any references to issues like this.

Do let me know if there's anything else I can look at, and apologies if this is not the right place to submit this issue. Seems to affect upstream as well, but maybe something can come out of posting here.
Cheers