NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

[REGRESSION] [535.54.03] The entire screen is frequently flickering #511

Closed birdie-github closed 8 months ago

birdie-github commented 1 year ago

NVIDIA Open GPU Kernel Modules Version

535.43.02

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Fedora 38

Kernel Release

6.3.5

Hardware: GPU

NVIDIA GeForce GTX 1660 Ti

Describe the bug

The screen is constantly flickering, no matter what applications are running.

In Firefox it's happening every few seconds. In other "simple" applications it's less frequent.

To Reproduce

Install.

Bug Incidence

All the time

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

This is a regression.

I've reverted to 530.41.03 and it's all good.

Windows users seem to be affected as well. Could be a code change which affects both drivers.

thesword53 commented 1 year ago

Do you have 2+ monitors?

I think it's a problem with VRR and dual monitor setup. I also have the issue on Windows with the 530 and 535 drivers and Linux with the 535 drivers. The 530 Linux drivers are not affected because of another bug sticking GPU at higher power state on multi-monitor and >60Hz setup. I also noticed flickering is caused by VRR/G-Sync screen frequency stuttering and happens when GPU is switching power state.

birdie-github commented 1 year ago

I have a single monitor.

Littlejth commented 1 year ago

I also have this issue on NixOS with kernel 6.3.5 on plasma 5.27.5 running on my RTX 3070. I tried to log into wayland initially and my screen perpetually flickered without ever making it into the session and I had to reboot my machine. I restarted and then tried to go into my X session which worked fine. I was able to go into the nvidia-settings panel and set both my monitors (144hz 1440p secondary over HDMI and 144hz 4K primary over DVI) down to 60hz. Then when I logged out and switched into wayland, no issue. So at least in my experience here and now, it's not necessarily whether or not it's VRR but more to do with high refresh rate in general. Both my monitors are freesync and work when forcing them to work with G-Sync but I didn't have that enabled at all.

EDIT: After doing some experimenting, I can have my 4K set at 144hz and my 1440p at 120hz. If I bring the 1440p monitor up to 144hz that's when I start to have issues. Also in my X session I do see occasional flickering but it's not bad to the point that I can't use the session.

birdie-github commented 1 year ago

The 530.41.03 drivers are also affected, only it takes quite a lot of work to trigger the bug.

Here's roughly what I did:

  1. I changed resolution to 1920x1080@60Hz
  2. I changed resolution back to 2560x1440@144Hz
  3. I switched to Linux console and back

Now the entire screen is flashing and flickering like crazy. Only a complete reboot fixed the issue. Restarting X.org didn't help.
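For anyone who wants to retry the three steps above from a terminal, here is a minimal sketch. The connector name `DP-0` and the VT numbers are assumptions (check `xrandr --query` and your own setup), `chvt` normally needs root, and by default the script only prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch of the three reproduction steps. OUTPUT is a hypothetical connector
# name and the VT numbers are guesses; adjust both for your machine.
OUTPUT="${OUTPUT:-DP-0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_steps() {
    run xrandr --output "$OUTPUT" --mode 1920x1080 --rate 60    # step 1
    run xrandr --output "$OUTPUT" --mode 2560x1440 --rate 144   # step 2
    run chvt 3                                                  # step 3: to a text console...
    run chvt 1                                                  # ...and back (X VT number assumed)
}

repro_steps
```

Run it with `DRY_RUN=0` only once the printed commands look right for your monitor.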

Something is deeply broken for the last two versions of drivers.

Edit: my GPU is acting up. Via HDMI everything is OK, via DP I get artifacts all over the screen even when I'm in BIOS. I cleaned it up, reseated it and now it's all fine.

birdie-github commented 1 year ago

It's still reproducible in 535.54.03 though to a much lesser extent.

For instance when I open Firefox the screen is briefly flashing. Sometimes opening new websites in Firefox also produces flashing.

I can live with that but not a single driver prior had this bug.

birdie-github commented 1 year ago

This is not limited to Firefox. Sometimes the screen briefly flashes even when I'm on the XFCE desktop with all the applications minimized.

I'm not using compositing.

This is reproducible with or without GSync enabled.

birdie-github commented 1 year ago

I've reverted back to 530.41.03 which are free from this bug.

The screen flashing with the new drivers is driving me insane. I don't want my monitor to die because someone in NVIDIA fucked up and I've heard that it's possible to kill a monitor using software.

TBH I'm mad. This bug report has seen zero replies from NVIDIA developers, and the release notes don't confirm it either. At least there's a line in your Windows drivers: "When using multiple monitors which support adaptive sync, users may see random flicker on certain displays when G-SYNC is enabled after updating to driver 535.98 [4138119]"

Too bad I've got a single 144Hz HRR/HDR10 GSync 2560x1440 compatible monitor.

birdie-github commented 1 year ago

Pinging @aritger @aaronp24 @amrit1711

sentakuhm commented 1 year ago

I can confirm flickering too.

Nvidia driver version: 535.54.03
System: Arch Linux
Kernel: 6.3.8-arch1-1
Display server protocol: Wayland

birdie-github commented 1 year ago

@sentakuhm

Please attach your sudo nvidia-bug-report.

sentakuhm commented 1 year ago

@birdie-github

> Please attach your sudo nvidia-bug-report.

nvidia-bug-report.log.gz

Zenzi0 commented 1 year ago

Yep, happens to me too. As long as this happens this driver version really isn't usable.

Nvidia driver version: 535.54.03
Arcolinux (Arch based, uses same repositories)
Kernel: 6.3.8-zen1-1-zen
Wayland, Sway

amrit1711 commented 1 year ago

Hi All, We see similar issue reported on forum with earlier released driver 525.89.02, will you be able to test if you also see same issue with 525.89.02. If that's the case, then we are already working on it. Otherwise, we will need to consider it as a different bug and shall try to reproduce issue locally.

Zenzi0 commented 1 year ago

525.89.02 is not in my repos, so I hope somebody else can provide that information. This is probably not very helpful but in my case with Sway I don't experience the flickering issue only with one very specific version of sway-git (r7096.f21090f9-1) and wlroots-git (0.17.0.r6176.12e28c34-1) which an additional non-Arch repo from Arcolinux provides. Any newer version of sway-git or the non-git sway (1:1.8.1-1) have the flickering issue.

birdie-github commented 1 year ago

> Hi All, We see similar issue reported on forum with earlier released driver 525.89.02, will you be able to test if you also see same issue with 525.89.02. If that's the case, then we are already working on it. Otherwise, we will need to consider it as a different bug and shall try to reproduce issue locally.

I've had this issue only with 535.43.02 and 535.54.03 drivers.

I've never had it with older drivers.

dbrhks490 commented 1 year ago

Hi,

I don't have this issue with the 525.xx.xx. Problem appeared since the 530 beta. I experience a short black flickering at the top of the screen.

Here's what I tried:

If I set my screen to 144Hz and enable the compositor, my computer is unusable because the flickering is constant. Just scrolling an HTML web page makes the monitor flicker. Watching a video is extremely unpleasant since the screen can flicker more than 10 times in a minute.

The only way to reduce this is to lower the refresh rate (100Hz is a good compromise for my 2560x1440 monitor) and, most importantly, to disable the compositor if possible. With both changes the GPU does not switch between P-states as much, so the screen continues to flicker but much less frequently. The problem does not occur in games, when the GPU is at its maximum performance.

I also suspect that only GPUs based on the Turing architecture are affected. The people talking about this on the NVIDIA developer forum all have an RTX 2000 series card or a GTX 1660.

Monitor: real G-Sync 2560x1440 connected through DisplayPort
GPU: RTX 2080
OS: Debian or Ubuntu

LoipesMas commented 1 year ago

My experience has been similar to @dbrhks490's: Arch updated me from 530 to 535 and the flickering appeared. If I lower my refresh rate to 120Hz (or lower), the flickering is gone. If I disable my second display, the flickering is also gone, even at 165Hz. And launching a demanding game does seem to stop the flickering. Happens both on Wayland (Hyprland) and X11 (dwm).

And for me the flickers look like momentary disconnects: the screen goes full black for ~3 seconds and "Connected over DisplayPort" popup from the monitor shows up. And only the main display does flicker.

Edit: It seems that on lower refresh rates I get "partial" flickers, i.e. the top third of my screen goes black for like a frame. This is annoying, but not terrible. And I rolled back to 530.41.03 and there are none of these issues.

My setup:
Main monitor: 3440x1440@165Hz
Second monitor: 1920x1440@60Hz
GPU: RTX2060
OS: Arch

Let me know if you need any more info or testing

notfood commented 1 year ago

Chiming in to share my experience. I'm experiencing partial flickering on my main monitor in a dual monitor setup using 535.54.03 at 60Hz.

Environment:
Main monitor: 1920x1080@60Hz
Second monitor: 1024x768@60Hz
GPU1: GeForce GTX 1650
GPU2: Tesla M40 (not in use)
OS: ArchLinux

It's occasional, and I suspect the GPU is switching power states. The flickering only happens at the top of the screen. It doesn't matter what I'm running: I can be on X11 or Wayland, it happens on KDE Plasma and i3wm, it happens with no applications running, and it happens at the beginning of heavy GPU usage. Downgrading to 530.41.03 shows none of these issues.
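Several reports in this thread tie the flicker to power-state switches. One way to check that is to log P-state transitions and compare the timestamps with the moments the screen flickers. The sketch below is only an illustration: `pstate_changes` is a hypothetical helper name, and it assumes `nvidia-smi` emits `timestamp, pstate` CSV rows when queried as shown in the trailing comment.

```shell
#!/bin/bash
# Print a line whenever the performance state changes.
# Reads "timestamp, pstate" CSV rows on stdin.
pstate_changes() {
    local prev="" ts state
    while IFS=, read -r ts state; do
        state="${state# }"                 # strip the space after the comma
        if [ -n "$prev" ] && [ "$state" != "$prev" ]; then
            echo "$ts: $prev -> $state"
        fi
        prev="$state"
    done
}

# Live use (needs the NVIDIA driver; samples once per second):
# nvidia-smi --query-gpu=timestamp,pstate --format=csv,noheader -l 1 | pstate_changes
```

If the flicker lines up with the logged transitions, that supports the power-state theory; if it does not, it is a data point against it.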

birdie-github commented 1 year ago

Would be great if all the affected people

  1. attached their sudo nvidia-bug-report and
  2. specified the exact monitor vendor and model

Thanks

CrossroadInTheVoid commented 1 year ago

530.41.03 flickers even with nvidia-drm.modeset=1 in 3d or playing accelerated video. Bug was published at nvidia.com. So 535.54.03 is not the only bugged version.

birdie-github commented 1 year ago

> 530.41.03 flickers even with nvidia-drm.modeset=1 in 3d or playing accelerated video. Bug was published at nvidia.com. So 535.54.03 is not the only bugged version.

I intended this bug report to be relevant only for the use case of 535.43.02 and 535.54.03 drivers regressing and 530.41.03 being bug free. And I've been using KMS since forever: options nvidia-drm modeset=1

You really could file a new bug report if you were affected earlier.

CrossroadInTheVoid commented 1 year ago

> I intended this bug report to be relevant only for the use case of 535.43.02 and 535.54.03 drivers regressing and 530.41.03 being bug free. And I've been using KMS since forever: options nvidia-drm modeset=1
>
> You really could file a new bug report if you were affected earlier.

530.41.03 is a new feature branch version, so I do not think bugfixing is relevant for it. The only thing I want to say: 530.41.03 IS NOT a bug-free version. It has the same black-flickering bug (or a bug that only looks the same, idk); several users have reported this. So I think this regression appeared not in 535 but in 530.

z1atk0 commented 1 year ago

Same here, occasional odd flicker across the top of both monitors, ever since I upgraded to 535.54.03. I always stick to the latest "Production Branch Version", and I never had this problem before.

System: GeForce GTX 1660 Ti displaying on two AOC 24G2SPU connected via HDMI, running Slackware64-15.0 on XOrg-X11 (not Wayland).

[root@disclosure:~]# inxi -G
Graphics:
  Device-1: NVIDIA TU116 [GeForce GTX 1660 Ti] driver: nvidia v: 535.54.03
  Device-2: Sunplus Innovation Full HD webcam driver: snd-usb-audio,uvcvideo
    type: USB
  Display: x11 server: X.Org v: 1.20.14 with: Xwayland v: 21.1.4 driver: X:
    loaded: nvidia unloaded: nouveau gpu: nvidia resolution: 1: 1920x1080~60Hz
    2: 1920x1080~60Hz
  API: OpenGL v: 4.6.0 NVIDIA 535.54.03 renderer: NVIDIA GeForce GTX 1660
    Ti/PCIe/SSE2

My nvidia-bug-report.log.gz is attached as well.

nvidia-bug-report.log.gz

jarrard commented 1 year ago

Same issue on NixOS with a 4090 and the 535.54.03 driver. X11 has this flicker, often when I'm using an app/browser. It doesn't happen on Wayland, but there I must set the primary display to 60Hz or I can't log in. (Plasma)

Haven't tested Gnome X11 yet. Kind of worn out with all these nvidia bugs lately...

birdie-github commented 1 year ago

First things first: the display flickering bug is not the Linux NVIDIA developers' screw-up; it's the part that comes from the Windows 535.98 drivers, which were reported to break multi-monitor systems left and right. NVIDIA has ostensibly fixed this in the Windows 536.23 drivers:

When using multiple monitors which support adaptive sync, users may see random flicker on certain displays when G-SYNC is enabled after updating to driver 535.98 [4138119]

The bigger issue is that multiple users complained that the Windows 536.23 drivers broke single monitor systems as well but it looks like NVIDIA hasn't paid enough if any attention to the issue.

Now considering all the reports above it looks like the Windows multi-monitor bug fix hasn't been ported to Linux and the single monitor flickering issue continues to plague both branches.

Worst of all, it looks like the NVIDIA Linux engineers have been left in the dark or simply don't care about the fact that the last released Linux driver is simply broken and unusable and I reported the bug three weeks ago when drivers were still in beta, so they had ample time to investigate and fix it or at least ping their Windows colleagues and ask for help.

I'm mad, I'm simply mad. Luckily 530.41.03 drivers still work flawlessly here, and they support Linux 6.3, so NVIDIA has got another chance of fixing everything.

sachnr commented 1 year ago

just updated today can confirm new drivers cause flickering in both x11 and wayland.

GrzegorzKozub commented 1 year ago

For me only the primary monitor flickers. I've got two almost identical displays. Also, the flickering only happens on Wayland. When I select GNOME on Xorg as the session to use in GDM, the flicker is gone.

I'm on Arch Linux using GDM and GNOME. Potentially relevant packages are in these versions on my system:

linux 6.3.8.arch1-1
linux-firmware 20230404.2e92a49f-1

nvidia 535.54.03-2
nvidia-lts 1:535.54.03-2
nvidia-utils 535.54.03-1

gdm 44.1-1
gnome-shell 1:44.2-1
gnome-session 44.0-1
mutter 44.2-1

My relevant configuration is:

My relevant hardware is:

nvidia-bug-report.log.gz

The flickering on Wayland is so intense that it happens every few seconds and renders the system unusable. The experience is similar to turning off the monitor and then immediately turning it back on again.

This video should explain what I mean.

Workarounds:

jarrard commented 1 year ago

You might be able to stop the flicker if you open nvidia settings and turn OFF No Flipping in the options there. Worked for me.

However I think there is a performance hit if you do so. Could be related to plasma not having triple buffer set which you can do with the xorg config. No idea about Wayland since I can barely get that working under PLASMA.

CrossroadInTheVoid commented 1 year ago

> You might be able to stop the flicker if you open nvidia settings and turn OFF No Flipping in the options there. Worked for me.

No Flipping? Is it new option? There is Allow Flipping checkbox in OpenGL Settings only in earlier versions.

birdie-github commented 1 year ago

> You might be able to stop the flicker if you open nvidia settings and turn OFF No Flipping in the options there. Worked for me.
>
> However I think there is a performance hit if you do so. Could be related to plasma not having triple buffer set which you can do with the xorg config. No idea about Wayland since I can barely get that working under PLASMA.

Could you show how it's done?

This is what I have here and I have flickering:

123

jarrard commented 1 year ago

Yeah I think it is Allow Flipping. I turned that off and issue went away but perhaps if you try with it on?

It's also possible my issue is different because I managed to install the older 530 drivers and it still did it.

birdie-github commented 1 year ago

> Yeah I think it is Allow Flipping. I turned that off and issue went away but perhaps if you try with it on?
>
> It's also possible my issue is different because I managed to install the older 530 drivers and it still did it.

This option is applicable only to OpenGL applications or when you're running compositing.

I've got neither. Plain X11 XFCE session.

jarrard commented 1 year ago

Ok well it helped under Plasma, that is my test case since I want that to work in X11 and WL.

Also I thought XFCE did have a compositor.

z1atk0 commented 1 year ago

> Also I thought XFCE did have a compositor.

OT: It does, but it can be disabled. Which I also did on my old laptop running Slack64-15.0 & XFCE, because it just cost too much performance there (the laptop has an old Celeron CPU and an Intel Mobile 4 GPU).

birdie-github commented 1 year ago

> Ok well it helped under Plasma, that is my test case since I want that to work in X11 and WL.
>
> Also I thought XFCE did have a compositor.

I'm not interested in bells and whistles and increased GPU power consumption. Besides this bug is not about OpenGL/Vulkan applications.

GrzegorzKozub commented 1 year ago

For my scenario https://github.com/NVIDIA/open-gpu-kernel-modules/issues/511#issuecomment-1596567416 disabling Allow Flipping under OpenGL Settings did not fix the issue. This is probably because my issue is on Wayland only.

Also note that disabling this toggle did not persist a reboot.
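On the toggle not surviving a reboot: one possible approach on an X11 session is to reapply it at login. The sketch below writes an XDG autostart entry for that. The file name `nvidia-no-flip.desktop` and the function name are hypothetical, and it assumes the GUI checkbox corresponds to the `AllowFlipping` attribute of `nvidia-settings`. It is unlikely to help on Wayland, where this report occurs.

```shell
#!/bin/bash
# Sketch: write an XDG autostart entry that turns "Allow Flipping" off at login.
# The file name nvidia-no-flip.desktop is a hypothetical choice.
write_noflip_autostart() {
    local dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/autostart}"
    mkdir -p "$dir" || return 1
    cat > "$dir/nvidia-no-flip.desktop" <<'EOF'
[Desktop Entry]
Type=Application
Name=Disable NVIDIA Allow Flipping
Exec=nvidia-settings -a AllowFlipping=0
EOF
}

# Usage: write_noflip_autostart            # default autostart directory
#        write_noflip_autostart /tmp/test  # any other directory
```

Nothing is written until the function is called, so the target directory can be inspected first.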

amrit1711 commented 1 year ago

I used the setup below to try to reproduce the issue locally, but no luck so far:

Dell Precision T7610 + Genuine Intel(R) CPU @ 2.30GHz + Arch Linux + Kernel 6.3.8-arch1-1 + Driver 535.43.02 + NVIDIA GeForce RTX 2070 + GBT AORUS FI27Q-P with resolution 2560 x 1440 and refresh rate of 165 Hz + DELL G3223D with resolution 2560 x 1440 and refresh rate of 144 Hz + KDE Plasma Version 5.27.5 + X11 Protocol

I ran a YouTube video in Firefox on both the primary and secondary displays but did not observe any flickering. I also checked with a single monitor connected, but again no luck reproducing it. I shall try a few more setups and update.

birdie-github commented 1 year ago

@amrit1711

Yeah, we already know you're the only Linux tester at NVIDIA and you have limited hardware. I guess if NVIDIA drivers continue to be broken and the older ones no longer work, I will finally leave NVIDIA after using its products for over 25 years.

This is not just a minor regression, that's a bloody deal breaker which renders your computer completely unusable. Maybe someone at NVIDIA doesn't quite understand the severity of the situation.

Again the last released drivers are outright broken for a large number of people.

sentakuhm commented 1 year ago

@birdie-github I'm about to build my next PC and it's all AMD. I already decided that, especially when AMD finally moves all their drivers and firmware to open source.

CrouchingTigger9 commented 1 year ago

Hello,

I am affected by the issue too. The flickering didn't start right away after installation/reboot. First time I tested the new driver, noticeable flickering started only after a short Proton gaming session and persisted after a reboot. So, I went back to 530.41.03. On the next try I was unaffected for the better part of the day and it started quite randomly while watching YouTube on Firefox. Unchecking "Allow Flipping" seems to have fixed it for now. System information and bug report below.

System: Gentoo (17.1 profile with some ~amd64 packages), kernel 6.3.8, OpenRC, X11, KDE Plasma 5.27.5.
Monitor: ASUS ROG Swift PG329Q @165Hz connected via DP.
Video Card: Gigabyte RTX 3700.

nvidia-bug-report.log.gz

valvatorres commented 1 year ago

I posted elsewhere about this problem and was referred to this thread to report so here it goes:

This latest nvidia driver on linux causes a flickering black jagged line to appear on the screen whenever I use any GPU-accelerated program. My relevant info:

linux 6.3.8.arch1-1
nvidia 535.54.03-2
nvidia-utils 535.54.03-1
RTX 3080
Monitor: Dell AW3420DW 120hz w/ gsync on Hyprland (a wayland compositor)

I reverted the nvidia drivers to 530.41.03 and the flickering is gone now.

jarrard commented 1 year ago

KDE Plasma 5.27.5.

The flickering is likely also a Plasma update issue that caused the NVIDIA bug to suddenly be noticed. I wonder if rolling back a couple plasma releases would fix the subtle flicker issue.

Anyway I'm still trying to get 535 and Plasma WL working together. There is also a X11 to WL Plasma bug that causes WL logins to kick back to the login manager, THAT isn't the >60hz bug NVIDIA, that bug causes system lock-up more or less needing a hard reset.

aritger commented 1 year ago

Screen flicker is a fairly generic symptom, and there are no doubt different bugs discussed here, particularly if the affected driver versions are different. I don't know for certain if the originally reported issue here is the same as what was reported on Windows. Due to release schedule differences, sometimes it can take a release or two for some of those hotfixes to propagate from Windows to Linux, unfortunately.

@birdie-github: it is curious that the regression point for you is the same as when we enabled support for driving a single display with multiple heads. As an experiment, you could try setting:

Option "ModeValidation" "MaxOneHardwareHead"

though, I'm not optimistic that will impact your case. It would still be a useful data point.

Another experiment would be, in nvidia-settings, to change the PowerMizer "Preferred Mode" to "Prefer Maximum Performance".

It would also be interesting to disable your CompositionPipeline options in the X configuration file. That will of course result in tearing, but it would be a helpful data point to know if that alters the current flickering you see.
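For readers unsure where the ModeValidation line goes, the sketch below prints one possible X config fragment. It assumes the option is placed in a `Device` section and uses a placeholder `Identifier`. If you already have an X configuration file with a Device or Screen section for the NVIDIA GPU, adding the `Option` line to that existing section is probably the safer route. The commented command at the end is an assumed command-line equivalent of the PowerMizer setting, not something confirmed in this thread.

```shell
#!/bin/bash
# Print an X config fragment carrying the suggested ModeValidation option.
# The Identifier value "nvidia-gpu" is a placeholder.
print_one_head_snippet() {
    cat <<'EOF'
Section "Device"
    Identifier "nvidia-gpu"
    Driver     "nvidia"
    Option     "ModeValidation" "MaxOneHardwareHead"
EndSection
EOF
}

print_one_head_snippet

# PowerMizer experiment from a terminal instead of the GUI (X11 session,
# assuming the GPU is gpu:0 and that 1 maps to "Prefer Maximum Performance"):
# nvidia-settings -a "[gpu:0]/GpuPowerMizerMode=1"
```

The output can be redirected into a file under the X server's config directory (for example a drop-in in `xorg.conf.d`, path depending on the distribution) and removed again after the experiment.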

birdie-github commented 1 year ago

@aritger

> it is curious that the regression point for you is the same as when we enabled support for driving a single display with multiple heads. As an experiment, you could try setting:

Not sure if it's relevant in my case considering I've got a single monitor.

> in nvidia-settings, to change the PowerMizer "Preferred Mode" to "Prefer Maximum Performance".

Will try it later.

In the past it was possible to switch NVIDIA drivers on the fly without rebooting, now with KMS it requires reboots which totally sucks. And then it looks like recent NVIDIA drivers have become somewhat persistent? I don't know how to describe it but again in the past restarting X.org resulted in the driver resetting whatever settings it had, nowadays it's not enough. Oh, and there's firmware involved. Darn, Linux has become so much worse in some regards than Windows recently.

There's a way to unload a KMS driver but I'm not sure it will work: https://nouveau.freedesktop.org/KernelModeSetting.html - that's for nouveau but should work for other KMS drivers:

#!/bin/bash

echo 0 > /sys/class/vtconsole/vtcon1/bind
rmmod nouveau
/etc/init.d/consolefont restart
rmmod ttm
rmmod drm_kms_helper
rmmod drm

birdie-github commented 1 year ago

Again this bug report is for the following specific case:

  1. A single monitor
  2. GTX 1660 Ti
  3. 530.41.03 drivers are totally fine
  4. I'm using KMS (options nvidia-drm modeset=1)
  5. I'm not using compositing or anything fancy - I've got a plain XFCE X.org session.

If people who experience flickering had it with drivers 530.41.03 as well, you definitely want to file a separate bug report.
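Since KMS is part of the test case, it can help to confirm that `modeset=1` actually took effect on the running system rather than only being present in a modprobe config. A minimal sketch, assuming the loaded module exposes the parameter under `/sys/module/nvidia_drm/parameters/modeset` with the value `Y` or `N`; the function name is hypothetical.

```shell
#!/bin/bash
# Return success if nvidia-drm reports modeset=Y.
# The path argument exists only so the check can be pointed at a test file.
nvidia_modeset_enabled() {
    local param="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    [ -r "$param" ] && [ "$(cat "$param")" = "Y" ]
}

if nvidia_modeset_enabled; then
    echo "nvidia-drm modeset: enabled"
else
    echo "nvidia-drm modeset: disabled or module not loaded"
fi
```

Including that one line of output in a report removes one variable when comparing setups.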

CrossroadInTheVoid commented 1 year ago

> I'm not using compositing or anything fancy - I've got a plain XFCE X.org session.

That means you are using compositing.

birdie-github commented 1 year ago

> That means you are using compositing.

No, I'm not. Compositing is an optional feature of XFWM4. I have it disabled.

no compositing

AlexGoinsNV commented 1 year ago

Hello @birdie-github ,

> I've had this issue only with 535.43.02 and 535.54.03 drivers.
>
> I've never had it with older drivers.

Could you confirm that you tried with 525.89.02 with and without OpenRM? If not, it would be worth trying both of those cases. I have some suspicion that although 530 is "newer", the regression could be present in 525.89.02 when running without OpenRM.

Thanks!

birdie-github commented 1 year ago

@AlexGoinsNV

What's "OpenRM"? :)

AlexGoinsNV commented 1 year ago

@birdie-github

> What's "OpenRM"? :)

Sorry, I'm referring to the open source version of nvidia.ko, explained here:

https://us.download.nvidia.com/XFree86/Linux-x86_64/525.89.02/README/kernel_open.html

There are some differences in behavior between that and the proprietary version that could affect reproducibility. Your original bug report indicates that you aren't using it, so I mainly just want to confirm that you tried with 525.89.02; the fact that it doesn't reproduce on 530.41.03 unfortunately doesn't rule it out.
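To tell which flavor of nvidia.ko is loaded when testing both cases, something like the sketch below may work. It rests on an assumption that is not confirmed in this thread: that the open kernel module identifies itself with the words "Open Kernel Module" in `/proc/driver/nvidia/version`. The function name is hypothetical.

```shell
#!/bin/bash
# Classify the loaded NVIDIA kernel module from its NVRM version line.
# Assumption: the open flavor includes the words "Open Kernel Module".
nvidia_module_flavor() {
    local file="${1:-/proc/driver/nvidia/version}"
    if [ ! -r "$file" ]; then
        echo "not loaded"
    elif grep -q "Open Kernel Module" "$file"; then
        echo "open"
    else
        echo "proprietary"
    fi
}

nvidia_module_flavor
```

Stating the result next to the driver version in each test report makes it clear which of the two cases was covered.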