Open MeguminSama opened 1 week ago
I think I have this issue too.
Running Bazzite 41 (from Fedora Kinoite), Linux 6.11.8-305.bazzite.fc41.x86_64, AMD Ryzen 9 5900X, NVIDIA GeForce RTX 3080 Ti, KDE Plasma 6.2.3, Wayland 1.23.0.
$ rpm -qa | /bin/grep nvidia
nvidia-gpu-firmware-20241110-1.fc41.noarch
ublue-os-nvidia-addons-0.10-1.fc41.noarch
libnvidia-ml-565.57.01-4.fc41.x86_64
libnvidia-cfg-565.57.01-4.fc41.x86_64
nvidia-driver-cuda-libs-565.57.01-4.fc41.x86_64
nvidia-persistenced-565.57.01-1.fc41.x86_64
nvidia-driver-libs-565.57.01-4.fc41.x86_64
nvidia-container-toolkit-base-1.17.2-1.x86_64
libnvidia-container1-1.17.2-1.x86_64
libnvidia-container-tools-1.17.2-1.x86_64
nvidia-modprobe-565.57.01-1.fc41.x86_64
kmod-nvidia-565.57.01-1.fc41.x86_64
nvidia-kmod-common-565.57.01-2.fc41.noarch
nvidia-driver-565.57.01-4.fc41.x86_64
nvidia-libXNVCtrl-565.57.01-1.fc41.x86_64
libnvidia-ml-565.57.01-4.fc41.i686
nvidia-settings-565.57.01-1.fc41.x86_64
xorg-x11-nvidia-565.57.01-4.fc41.x86_64
nvidia-driver-cuda-565.57.01-4.fc41.x86_64
nvidia-container-toolkit-1.17.2-1.x86_64
libnvidia-fbc-565.57.01-4.fc41.x86_64
libva-nvidia-driver-0.0.13^20241108git259b7b7-1.fc41.x86_64
nvidia-driver-libs-565.57.01-4.fc41.i686
nvidia-driver-cuda-libs-565.57.01-4.fc41.i686
systemd journal:
Nov 19 21:23:18 bazzite kwin_wayland[2537]: kwin_wayland_drm: atomic commit failed: Invalid argument
Nov 19 21:23:20 bazzite kwin_wayland[2537]: kwin_wayland_drm: atomic commit failed: Invalid argument
Nov 19 21:26:38 bazzite kwin_wayland[2537]: kwin_wayland_drm: atomic commit failed: Invalid argument
Nov 19 21:28:15 bazzite kwin_wayland[2537]: kwin_wayland_drm: atomic commit failed: Invalid argument
Nov 19 21:29:09 bazzite kernel: NVRM: nvAssertFailedNoLog: Assertion failed: CliGetEventInfo(rpc_params->hClient, rpc_params->hEvent, &pEvent) @ kernel_gsp.c:462
Nov 19 21:29:09 bazzite kernel: NVRM: _kgspProcessRpcEvent: Failed to process received event 0x1003 (POST_EVENT) from GPU0: status=0x57
Nov 19 21:29:27 bazzite steam[3128]: ERROR: ld.so: object '/usr/lib/extest/libextest.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
Nov 19 21:29:27 bazzite steam[3128]: ERROR: ld.so: object '/usr/lib/extest/libextest.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
Nov 19 21:29:31 bazzite kwin_wayland[2537]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Nov 19 21:29:36 bazzite kwin_wayland[2537]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Nov 19 21:29:41 bazzite kwin_wayland[2537]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Nov 19 21:29:46 bazzite kwin_wayland[2537]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Nov 19 21:29:51 bazzite kwin_wayland[2537]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Nov 19 21:29:56 bazzite kwin_wayland[2537]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Not 100% sure exactly when in this log things froze, but probably around 21:29, close to the assertion failure.
Edit: Perhaps notably, the game that was running on the monitor that froze still works fine if I move it to my second monitor, but the main monitor is stuck displaying the most recent frame.
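To narrow down when the freeze started, one option is to measure the gap between the NVRM assertion and the first "Pageflip timed out" message that follows it. The sketch below is only an illustration: `freeze_window` is a hypothetical helper, the marker strings are copied from the excerpt above, and the `year` default is an assumption, since the short journal format does not print a year.

```python
import re
from datetime import datetime

# Substrings taken verbatim from the journal excerpt in this thread.
ASSERT_MARKER = "NVRM: nvAssertFailedNoLog: Assertion failed: CliGetEventInfo"
PAGEFLIP_MARKER = "kwin_wayland_drm: Pageflip timed out"

# The short journal format starts each line with e.g. "Nov 19 21:29:09".
STAMP_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def freeze_window(lines, year=2024):
    """Return (assert_time, first_pageflip_time, gap_seconds), or None.

    `year` is an assumption: the short journal format omits it.
    """
    assert_at = None
    for line in lines:
        match = STAMP_RE.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if assert_at is None:
            # Still looking for the first NVRM assertion.
            if ASSERT_MARKER in line:
                assert_at = when
        elif PAGEFLIP_MARKER in line:
            # First pageflip timeout after the assertion.
            return assert_at, when, (when - assert_at).total_seconds()
    return None
```

Fed the lines of the excerpt above, this pairs the assertion at 21:29:09 with the first timeout at 21:29:31, a gap of 22 seconds. That is consistent with the freeze starting at or shortly after the assertion, but does not prove it.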
Hi! Thanks for the report! The assert itself looks like a race condition that shouldn't be fatal by itself, but it's probably indicative of the larger problem. Can you please run nvidia-bug-report.sh as soon as you hit this issue and attach the logs?
(also, please respect the bug template, it helps us with triage and makes it less likely your issues will get lost)
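Since the request is to capture logs as soon as the problem appears, a small watcher can help when the freeze happens unattended. This is a rough sketch, not an official tool: `is_trigger` and `watch_and_capture` are hypothetical names, the trigger strings come from the journal excerpt above, and it assumes that `journalctl` and `nvidia-bug-report.sh` are on the PATH and that the script runs with enough privileges (usually root).

```python
import subprocess

# Substrings copied from the log lines reported in this thread.
TRIGGERS = ("NVRM: nvAssertFailedNoLog", "Pageflip timed out")


def is_trigger(line):
    """True if a journal line matches one of the symptoms above."""
    return any(marker in line for marker in TRIGGERS)


def watch_and_capture(
    follow_cmd=("journalctl", "-f", "-o", "short"),
    report_cmd=("nvidia-bug-report.sh",),
):
    """Follow the journal and run the bug report script on the first match.

    Returns the matching line, or None if the journal stream ends first.
    """
    with subprocess.Popen(follow_cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if is_trigger(line):
                # Capture while the GPU is still in the bad state.
                subprocess.run(report_cmd, check=False)
                proc.terminate()
                return line
    return None
```

Calling `watch_and_capture()` from a root shell before starting a session would then produce the report without anyone having to notice the freeze first.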
I also hit this issue while compiling a kernel.
Only my second display was frozen - the "main" one was working.
[ 2244.525940] NVRM: nvAssertFailedNoLog: Assertion failed: CliGetEventInfo(rpc_params->hClient, rpc_params->hEvent, &pEvent) @ kernel_gsp.c:463
[ 2244.525948] NVRM: _kgspProcessRpcEvent: Failed to process received event 0x1003 (POST_EVENT) from GPU0: status=0x57
[ 2294.820903] r8169 0000:0b:00.0: invalid VPD tag 0xff (size 0) at offset 0; assume missing optional EEPROM
@mtijanic apologies for not following the template - I opened the issue from the line of code where the error occurred, so didn't get shown the template flow.
I just had the issue occur again, so I've attached the log here.
Thanks! And the missing template is entirely on GitHub. I knew this happened if you go through the new issue -> create account -> finish new issue flow, but this one is new to me.
I'll get back to you tomorrow on the log analysis; the one above from ptr1337 unfortunately didn't yield much useful info.
For a workaround, I've found that if I switch to one of the console TTYs (Ctrl-Alt-F1 through F6, usually, depending on distro) and then back again, things unfreeze.
Switching TTY sometimes works for me, but other times it just freezes my whole system 😔
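If the keyboard is unresponsive, the same TTY round trip can in principle be done from an SSH session. The sketch below assumes the `fgconsole` and `chvt` utilities from the kbd package are installed and that it runs as root. `bounce_vt` is a hypothetical helper, and the spare VT number is an arbitrary choice. As noted above, switching TTYs can also freeze the whole system, so treat this as a last resort.

```python
import subprocess
import time


def bounce_vt(spare_vt=3, pause=1.0, run=subprocess.run):
    """Switch to a spare virtual terminal and back to the current one.

    `run` is injectable so the command sequence can be checked without
    touching the real console. Returns the VT that was active at the start.
    """
    # fgconsole prints the number of the active virtual terminal.
    current = run(
        ["fgconsole"], capture_output=True, text=True, check=True
    ).stdout.strip()
    if current == str(spare_vt):
        raise ValueError("spare_vt must differ from the active VT")
    run(["chvt", str(spare_vt)], check=True)  # leave the graphical session
    time.sleep(pause)  # give the compositor time to release the display
    run(["chvt", current], check=True)  # come back to it
    return current
```

The pause length is a guess; nothing in this thread indicates how long the switch needs to last for the display to recover.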
Seemingly at random, one display will completely freeze until I reboot. The other displays continue to work just fine. If I restart the monitor, all of my remaining displays will freeze. All monitors are DisplayPort, connected to an RTX 3070 Ti.
Running on
dmesg output is just:
With the following NVIDIA packages installed:
Relevant code:
https://github.com/NVIDIA/open-gpu-kernel-modules/blob/d5a0858f901d15bda4c3d6db19a271507722a860/src/nvidia/src/kernel/gpu/gsp/kernel_gsp.c#L462-L463