NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Constant assertion failed spam in dmesg when having Counter-Strike 2 running #716

Open C0rn3j opened 1 month ago

C0rn3j commented 1 month ago
NVIDIA Open GPU Kernel Modules Version: 560.35.03
Operating System and Version: Arch Linux
Stable kernel release: Yes
Kernel Release: 
  - 6.11.1-arch1-1
  - 6.6.52-1-lts
  - 6.10.x
GPU: NVIDIA GeForce RTX 4090
I confirm that this does not happen with the proprietary driver package: Yes
Bug Incidence: Always

nvidia-bug-report.log.gz

Describe the bug

These assertion failures have been reported on the forums since 555, by three people so far.

I still get them very often on the 560 open modules; they flood dmesg with the same line repeated over and over.

[74168.771340] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74168.846717] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74170.204043] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74171.287663] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74171.712966] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74172.195753] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74172.317736] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74172.408335] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
[74172.773007] NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289
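To put a number on how much of a captured log this one message occupies, a small filter can count the matching lines. The following is only a sketch: `count_assert_lines` and `ASSERT_MARKER` are hypothetical names that are not part of the driver, and the marker substring is copied from the log lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Substring copied from the dmesg lines quoted in this report. */
#define ASSERT_MARKER "pendingEventNotifyCount == 0 @ event_notification.c:289"

/*
 * Hypothetical helper: counts the lines of a NUL-terminated log capture
 * that contain ASSERT_MARKER. If total_lines is non-NULL, it receives the
 * number of lines seen. A line with several matches is counted once.
 */
size_t count_assert_lines(const char *log, size_t *total_lines)
{
    assert(log != NULL);

    size_t hits = 0;
    size_t total = 0;
    /* First match at or after the current position, or NULL if none. */
    const char *match = strstr(log, ASSERT_MARKER);

    while (*log != '\0') {
        const char *eol = strchr(log, '\n');
        const char *end = eol ? eol : log + strlen(log);

        total++;
        /* The marker has no newline, so a match that starts on this
           line also ends on this line. */
        if (match != NULL && match < end) {
            hits++;
            match = strstr(end, ASSERT_MARKER);
        }
        log = eol ? eol + 1 : end;
    }

    if (total_lines != NULL) {
        *total_lines = total;
    }
    return hits;
}
```

Called on a buffer holding saved dmesg output, the ratio of the return value to the total line count gives a rough idea of how much of the capture is taken up by the assertion.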

To Reproduce

Launch Counter-Strike 2 from native or Flatpak Steam and watch dmesg -w. The messages start appearing immediately, even during the intro logos.

mtijanic commented 1 month ago

Thanks! This is tracked internally as nvbug 4729070. Looks like the fix missed the 565.xx train so it should be in 570.xx.

C0rn3j commented 1 month ago

Thanks for the information, will the patch source be available before the 570.xx (beta) release?

I would like to apply it on my system earlier if at all possible, as this problem is capable of filling the entire default dmesg buffer with this message, and very quickly at that.

EDIT: Arch Linux now ships at least the silencing patch, until 570 releases - https://gitlab.archlinux.org/archlinux/packaging/packages/nvidia-open/-/merge_requests/6

mtijanic commented 1 month ago

You can trivially hide the message by setting RmMsg to !event_notification.c:289, or just compile it out like this:

diff --git a/src/nvidia/src/kernel/rmapi/event_notification.c b/src/nvidia/src/kernel/rmapi/event_notification.c
index cf78eadd..d6937cac 100644
--- a/src/nvidia/src/kernel/rmapi/event_notification.c
+++ b/src/nvidia/src/kernel/rmapi/event_notification.c
@@ -286,11 +286,11 @@ static NV_STATUS _gpuEngineEventNotificationListNotify
     portSyncSpinlockAcquire(pEventNotificationList->pSpinlock);
     {
         // We don't expect this to be called multiple times in parallel
-        NV_ASSERT_OR_ELSE(pEventNotificationList->pendingEventNotifyCount == 0,
+        if (pEventNotificationList->pendingEventNotifyCount != 0)
         {
             portSyncSpinlockRelease(pEventNotificationList->pSpinlock);
             return NV_ERR_INVALID_STATE;
-        });
+        }

         EngineEventNotificationListIter it =
             listIterAll(&pEventNotificationList->eventNotificationList);

However, while not particularly problematic, this is an actual race condition bug, and the fix is a bit more involved. That also makes it a bit of a hassle to extract as a patch, so no promises there.
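The guard that the diff leaves in place can be modelled in isolation. The sketch below is a simplified stand-in, not the driver's code: every `demo_` name is hypothetical, a C11 `atomic_flag` stands in for the driver's spinlock, and the point at which the real code raises and clears the pending count is an assumption. It only illustrates the behaviour after the change, where an overlapping call is turned away with an error and nothing is logged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical status codes standing in for NV_OK / NV_ERR_INVALID_STATE. */
typedef enum { DEMO_OK = 0, DEMO_ERR_INVALID_STATE = 1 } demo_status;

typedef struct {
    atomic_flag lock;   /* stand-in for the list's spinlock */
    unsigned pending;   /* stand-in for pendingEventNotifyCount */
    unsigned rejected;  /* overlapping calls that were turned away */
} demo_list;

void demo_init(demo_list *l)
{
    atomic_flag_clear(&l->lock);
    l->pending = 0;
    l->rejected = 0;
}

void demo_lock(demo_list *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire)) {
    }
}

void demo_unlock(demo_list *l)
{
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Begin a notification pass; refuse quietly if one is already pending. */
demo_status demo_notify_begin(demo_list *l)
{
    assert(l != NULL);
    demo_lock(l);
    if (l->pending != 0) {
        /* Same shape as the patched check: release the lock and return
           an error, with no assertion and no log line. */
        l->rejected++;
        demo_unlock(l);
        return DEMO_ERR_INVALID_STATE;
    }
    l->pending = 1;
    demo_unlock(l);
    return DEMO_OK;
}

/* Finish the pass so that the next one is accepted again. */
void demo_notify_end(demo_list *l)
{
    assert(l != NULL);
    demo_lock(l);
    l->pending = 0;
    demo_unlock(l);
}
```

In this model a second `demo_notify_begin` issued before `demo_notify_end` returns `DEMO_ERR_INVALID_STATE`. That mirrors why the diff only silences the message: the overlapping call still happens and is still rejected, so the underlying race remains.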

gentuser commented 1 month ago

Hi, I have the same issue playing Dota 2 with the NVIDIA open drivers 550.127.05 and 560.35.03; both spam NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289. Furthermore, the screen randomly goes black for 3-4 seconds while playing, and switching from in-game fullscreen to the desktop shows a black screen for 2-3 seconds.

With the 550.127.05 closed drivers, the assertion is not shown and there is no lag when switching between fullscreen Dota 2 and the desktop. There is no black screen while playing either.

I'm using KDE Plasma (X11) 6.1.5, kernel 6.11.0-9-generic (64-bit), NVIDIA GeForce RTX 3090/PCIe/SSE2.

I'll stick to the closed version for now, until the fix arrives in 570.

Thanks a lot!

PS. Thanks to NVIDIA, I learned how to use my abilities while the screen is completely black for a few seconds. Now I believe I have improved my skills in Dota 2!

C0rn3j commented 1 month ago

The closed-source driver does not suffer from this in any version.

But based on the "not particularly problematic" classification above, it ends up just causing a lot of spam and not real issues, so it should be unrelated to your trouble.

I would retry on Wayland and Plasma 6.2.2, and if that does not cut it, you can try the 565 beta driver.

Also ensure you have fbdev enabled (see https://wiki.archlinux.org/title/NVIDIA#Wayland_configuration), as that can be a problem with 6.11+ kernels at the moment.

gentuser commented 1 month ago

Hi, thanks a lot for your comment. Disabling "Allow Flipping" solved the random in-game black screen issue.