NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

[REGRESSION] [535.54.03] The entire screen is frequently flickering #511

Closed: birdie-github closed this issue 8 months ago.

birdie-github commented 1 year ago

NVIDIA Open GPU Kernel Modules Version

535.43.02

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Fedora 38

Kernel Release

6.3.5

Hardware: GPU

NVIDIA GeForce GTX 1660 Ti

Describe the bug

The screen is constantly flickering, no matter what applications are running.

In Firefox it's happening every few seconds. In other "simple" applications it's less frequent.

To Reproduce

Install.

Bug Incidence

All the time

nvidia-bug-report.log.gz

More Info

This is a regression.

I've reverted to 530.41.03 and it's all good.

Windows users seem to be affected as well. Could be a code change which affects both drivers.

birdie-github commented 1 year ago

This bug report is strictly about the closed/proprietary driver. Sorry if I didn't make that clear earlier. The Linux NVIDIA forums are horrible for bug reporting (I filed a report there and no one paid any attention to it), so I brought the issue here.

I don't remember if I ever ran 525.89.02 drivers, should I try them? Again, I have no issues with 530.41.03 drivers which I have installed now.

It would be great if you released whatever display fixes you've made to your Windows driver ASAP, even as a beta. It's going to be more productive than chasing old bugs which might have already been fixed.

LoipesMas commented 1 year ago

@AlexGoinsNV I couldn't downgrade to 525.89.02, but I'm pretty sure I was using them when they were the current release (the proprietary version) and I'm pretty sure I didn't have any issues.

For me, the issues went from 0 to 100 after the first reboot following the upgrade to 535.54.03 (proprietary version). I then switched to the open version, but the issue is the same.

AlexGoinsNV commented 1 year ago

@birdie-github

This bug report is strictly about the closed/proprietary driver.

This GitHub repository tracks the open source driver, but that's fine; I just wanted to confirm, since it could have an effect on this bug.

I don't remember if I ever ran 525.89.02 drivers, should I try them? Again, I have no issues with 530.41.03 drivers which I have installed now.

Yes, please do. Depending on the root cause, I have some reason to believe that 525.89.02 may exhibit the issue despite 530.41.03 being fine.

It would be great if you released whatever display fixes you've made to your Windows driver ASAP, even as a beta. It's going to be more productive than chasing old bugs which might have already been fixed.

I looked into the Windows bug that you mentioned:

When using multiple monitors which support adaptive sync, users may see random flicker on certain displays when G-SYNC is enabled after updating to driver 535.98 [4138119]

Although the symptoms sound similar, the original bug and the fix are Windows-specific and do not impact the Linux driver.

LoipesMas commented 1 year ago

@aritger

As an experiment, you could try setting:

Option "ModeValidation" "MaxOneHardwareHead"

I tried that and it didn't fix it. Maybe it made things slightly better, but maybe not; I can't really tell.

Another experiment would be, in nvidia-settings, to change the PowerMizer "Preferred Mode" to "Prefer Maximum Performance".

It seemed to help for a few minutes, but then it got bad again, so maybe it didn't actually do anything.

The only thing that actually helps is making the GPU do some work. If I run glxgears -swapinterval 0 (even in the background), everything is fine: no flickers, no "blackouts", no artifacts, even with both displays at max refresh rate, on X11 or Wayland.

dbrhks490 commented 1 year ago

I already tried the following setting last Sunday: Option "ModeValidation" "MaxOneHardwareHead". I added it under the display section of my xorg file. Sadly, there was no change and the flickering still occurs. Maybe the flickering is slightly reduced with this option, but I could be wrong and it could be a placebo. Maybe someone can test it for longer.
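For anyone repeating this experiment, here is a minimal sketch of a drop-in X config file that sets the option. It assumes an X server that reads `/etc/X11/xorg.conf.d/` and places the option in a `Device` section; the function name, the `Identifier` value and the file name in the usage note are hypothetical. If an existing xorg.conf already has a Device section for the GPU, adding the `Option` line there is probably safer than creating a second one.

```shell
#!/bin/sh
# Hypothetical helper: write an X config snippet that enables the
# ModeValidation experiment. The Identifier value is arbitrary.
write_modevalidation_conf() {
  target=${1:?missing output file}
  cat > "$target" <<'EOF'
Section "Device"
    Identifier "nvidia-modevalidation-test"
    Driver     "nvidia"
    Option     "ModeValidation" "MaxOneHardwareHead"
EndSection
EOF
}
```

For example, as root: `write_modevalidation_conf /etc/X11/xorg.conf.d/20-nvidia-modevalidation.conf`, then restart X. Deleting the file undoes the experiment.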

Setting "Prefer Maximum Performance" mode greatly reduces the flickering, since the P-states no longer switch between P0 and P4. But now the GPU keeps its memory clock stuck at maximum frequency and consumes 40 W (66 W with dual monitors). In Adaptive mode, it only consumes 12 W (19 W with dual monitors). Moreover, the GPU heats up very fast. So no, it's not an acceptable solution for me.

dbrhks490 commented 1 year ago

Moreover, I've tried the following settings:

Sadly, all of them failed and the flickering still occurs.

I haven't found a viable way to use my computer these days. Even at 60 Hz, flickering occurs. Downgrading to the 530 new feature branch or the 525 production branch would solve it, but some games I play crash with those drivers (AC Valhalla and RE4).

You're saying that it could take 2 or 3 major revisions before a potential fix makes it from the Windows driver to the Linux driver. Since new drivers are released approximately every 3 months, this means a fix might only be included in the 550 release! We would need to wait until the end of the first quarter of 2024.

I sincerely hope a fix will be found before the 550 release. I will continue to try things when possible and report back here.

aritger commented 1 year ago

You're saying that it could take 2 or 3 major revision before a potential fix come from the Windows to the Linux driver.

If you're referring to this from me:

Due to release schedule differences, sometimes it can take a release or two for some of those hotfixes to propagate from Windows to Linux, unfortunately.

I meant individual release builds within a release branch, not entirely separate release branches. I.e., it may take a few weeks or so; not many months.

Anyway, to be clear: none of these suggested experiments were suggested as /solutions/. They are experiments to help identify the cause, so that we can focus our efforts on identifying a root cause and fix.

From the results so far (thanks) it sounds like the problem is that GPU clocks are not being raised high enough to satisfy the display.

So, I think the next question is: (a) Did the clock requirements of display increase to cause the regression? (b) Or, are the clocks not being raised as high as they were before?

It would help to identify:

birdie-github commented 1 year ago
  • For a driver version that does NOT show the problem, what clocks are used?

530.41.03, 1660 Ti, DP 1.4, 2560x1440, 144Hz, FreeSync/Gsync compatible, no flickering with:

$ nvidia-smi dmon
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec   mclk   pclk 
# Idx      W      C      C      %      %      %      %    MHz    MHz 
    0      8     35      -      1     15      0      0    405    300 
    0      8     35      -      5     16      0      0    405    300 
    0      8     35      -      3     15      0      0    405    300 
    0      8     35      -      3     16      0      0    405    300 

nvidia-smi -a | grep -i mhz 
        Graphics                          : 300 MHz
        SM                                : 300 MHz
        Memory                            : 405 MHz
        Video                             : 540 MHz
        Graphics                          : 2400 MHz
        SM                                : 2400 MHz
        Memory                            : 6001 MHz
        Video                             : 1950 MHz
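To compare readings like these between a good and a bad driver version, a captured `nvidia-smi dmon` log can be reduced to its clock range. The awk sketch below is hypothetical (not an NVIDIA tool); it finds the `mclk`/`pclk` columns by name in the `#` header line, so the extra `jpg`/`ofa` columns that some versions print do not shift the result.

```shell
#!/bin/sh
# Hypothetical helper: summarize an `nvidia-smi dmon` log (stdin or files)
# as sample count plus min/max memory clock (mclk) and graphics clock (pclk).
dmon_summary() {
  awk '
    /^#/ {
      if ($0 ~ /mclk/) {
        h = $0
        sub(/^#/, "", h)                 # strip the leading "#" of the header
        n = split(h, names)
        for (i = 1; i <= n; i++) {
          if (names[i] == "mclk") mc = i
          if (names[i] == "pclk") pc = i
        }
      }
      next
    }
    mc && pc && NF >= pc {
      m = $mc + 0; p = $pc + 0
      if (count == 0 || m < mmin) mmin = m
      if (count == 0 || m > mmax) mmax = m
      if (count == 0 || p < pmin) pmin = p
      if (count == 0 || p > pmax) pmax = p
      count++
    }
    END {
      printf "samples=%d mclk=%d..%d MHz pclk=%d..%d MHz\n", count, mmin, mmax, pmin, pmax
    }
  ' "$@"
}
```

Usage: let `nvidia-smi dmon > dmon.log` run for a while, stop it with Ctrl+C, then run `dmon_summary dmon.log`.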

Sorry I don't have the stamina to test anything else right now - switching drivers is quite a tedious process when you're running your custom kernel.

And it can't be done without rebooting: the VT bind trick above apparently doesn't work; as soon as I run echo 0 > /sys/class/vtconsole/vtcon1/bind, all my virtual consoles are dead.

sentakuhm commented 1 year ago

Driver 535.54.03, RTX 2060, 1920x1080, 144Hz, Adaptive Sync, with flickering:

ζ nvidia-smi dmon
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     29     45      -     4      1      0      0      0      0   7000   1365
    0     10     44      -     1      1      0      0      0      0    810    405
    0      9     43      -     2      4      0      0      0      0    405    405
    0      8     43      -     3      7      0      0      0      0    405    315
    0      8     43      -     2      8      0      0      0      0    405    315
    0      8     43      -     2      8      0      0      0      0    405    300
    0      8     43      -     3      8      0      0      0      0    405    300
    0      8     43      -     2      8      0      0      0      0    405    300
    0      8     43      -     2      8      0      0      0      0    405    300
    0      8     43      -     2      8      0      0      0      0    405    300
    0      8     43      -     2      8      0      0      0      0    405    300
    0      7     43      -     2      8      0      0      0      0    405    300
    0      8     43      -     1      8      0      0      0      0    405    300
    0      7     43      -     1      8      0      0      0      0    405    300
    0      7     43      -     1      8      0      0      0      0    405    300
    0      7     43      -     2      8      0      0      0      0    405    300
    0      7     43      -     2      8      0      0      0      0    405    300
    0      8     43      -     2      8      0      0      0      0    405    315
    0      8     43      -     5      9      0      0      0      0    405    300
    0      8     43      -     2      8      0      0      0      0    405    300
    0     13     43      -     3      8      0      0      0      0    405    555
    0     16     43      -    32     16      0      0      0      0    405    555
    0     29     44      -    25     19      0      0      0      0   7000   1530
    0     29     45      -    10      1      0      0      0      0   7000   1530
    0     12     43      -     0      0      0      0      0      0    810    405
    0     11     43      -     2      4      0      0      0      0    810    315
ζ nvidia-smi -a | grep -i mhz
        Graphics                          : 510 MHz
        SM                                : 510 MHz
        Memory                            : 405 MHz
        Video                             : 540 MHz
        Graphics                          : 2130 MHz
        SM                                : 2130 MHz
        Memory                            : 7001 MHz
        Video                             : 1950 MHz
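Since the thread keeps coming back to P-state switches, it may help to list exactly where the memory clock changes in a log like the one above. This is another hypothetical awk sketch built on the same header-parsing assumption: it prints each sample at which `mclk` differs from the previous sample, followed by a count.

```shell
#!/bin/sh
# Hypothetical helper: report every sample in an `nvidia-smi dmon` log
# where the memory clock (mclk) differs from the previous sample.
dmon_mclk_changes() {
  awk '
    /^#/ {
      if ($0 ~ /mclk/) {
        h = $0
        sub(/^#/, "", h)                 # drop the leading comment marker
        n = split(h, names)
        for (i = 1; i <= n; i++)
          if (names[i] == "mclk") mc = i
      }
      next
    }
    mc && NF >= mc {
      sample++
      if (sample > 1 && $mc != prev) {
        changes++
        printf "sample %d: mclk %s -> %s MHz\n", sample, prev, $mc
      }
      prev = $mc
    }
    END { printf "%d change(s) in %d samples\n", changes, sample }
  ' "$@"
}
```

Run it on a log captured while flicker is being observed; the sample numbers can then be lined up with the moments the flicker was seen.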

LoipesMas commented 1 year ago

RTX 2060, 3440x1440@120Hz, VRR disabled

530.41.03 -- no issues

```
❯ nvidia-smi dmon
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     16     41      -      6      8      0      0      0      0    810    435
    0     14     41      -      8     10      0      0      0      0    405    435
    0     36     42      -     24     25      0      0      0      0   7000   1320
    0     34     42      -     13      3      0      0      0      0   7000   1320
    0     19     41      -      2      1      0      0      0      0    810    435
    0     16     41      -      3      7      0      0      0      0    810    435
    0     15     41      -      7     11      0      0      0      0    405    330
    0     27     42      -     28     25      0      0      0      0   7000   1320
    0     35     42      -     22     20      0      0      0      0   7000   1320
    0     34     42      -      3      3      0      0      0      0   5000    465
    0     28     42      -     16      7      0      0      0      0   5000    435
    0     32     42      -     12      5      0      0      0      0   5000    435
    0     34     42      -     10      4      0      0      0      0   7000   1365
    0     22     42      -      5      3      0      0      0      0    810    750
    0     18     42      -      2      5      0      0      0      0    810    435
    0     16     41      -     19     18      0      0      0      0    810    570
    0     20     42      -     13     15      0      0      0      0    810    780
    0     34     42      -     26     26      0      0      0      0   7000   1410
    0     35     42      -     22      6      0      0      0      0   7000   1410
    0     19     42      -      4      2      0      0      0      0    810    435
    0     16     41      -     18     14      0      0      0      0    810    465
    0     14     41      -     29     19      0      0      0      0    405    585
    0     35     42      -     18     18      0      0      0      0   7000   1350
    0     35     42      -     33      6      0      0      0      0   7000   1350
    0     29     42      -     20      5      0      0      0      0   5000    795
    0     16     42      -      5      2      0      0      0      0    810    540
    0     16     41      -     13     10      0      0      0      0    810    435
    0     14     41      -      7      9      0      0      0      0    405    435
    0     14     41      -     10     16      0      0      0      0    405    435
    0     15     41      -     10     18      0      0      0      0    405    435
    0     14     41      -     15     19      0      0      0      0    405    435
    0     14     41      -      9     17      0      0      0      0    405    435
    0     14     41      -     10     18      0      0      0      0    405    435
    0     14     41      -     11     18      0      0      0      0    405    435
    0     39     43      -     34     15      0      0      0      0   7000   1365
    0     40     43      -     11      5      0      0      0      0   7000   1365
    0     33     42      -     10      4      0      0      0      0   5000    600
    0     19     42      -      8      6      0      0      0      0    810    435
    0     16     42      -      5      9      0      0      0      0    810    435
    0     14     42      -      7      9      0      0      0      0    405    435
    0     15     41      -     10     18      0      0      0      0    405    330
    0     14     42      -     16     18      0      0      0      0    405    330
    0     14     41      -     17     18      0      0      0      0    405    330
    0     14     42      -     16     18      0      0      0      0    405    330
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     15     42      -     16     18      0      0      0      0    405    330
    0     15     41      -     16     18      0      0      0      0    405    330
    0     14     41      -     20     22      0      0      0      0    405    435
    0     15     41      -     10     18      0      0      0      0    405    435
    0     16     42      -     14     21      0      0      0      0    405    435
    0     14     41      -     15     21      0      0      0      0    405    435
    0     20     42      -     11     19      0      0      0      0    405    630
    0     17     41      -     28     28      0      0      0      0    810   1080
    0     22     42      -      7     11      0      0      0      0    810    915
    0     17     42      -     22     22      0      0      0      0    810   1005
    0     16     41      -      7     11      0      0      0      0    405    630
    0     17     42      -     24     23      0      0      0      0    810   1065
    0     25     42      -     14     16      0      0      0      0   7000   1575
    0     36     43      -     21     19      0      0      0      0   7000   1575
    0     19     42      -      2      1      0      0      0      0    810    780
    0     25     42      -      3      5      0      0      0      0    810    735
    0     23     42      -     21     23      0      0      0      0   7000   1365
    0     36     43      -      9      9      0      0      0      0   7000   1485
    0     35     42      -      5      2      0      0      0      0   7000   1365
    0     35     43      -      2      1      0      0      0      0   7000   1365
    0     22     42      -      1      1      0      0      0      0    810    435
    0     21     42      -     21     18      0      0      0      0    810    705
    0     17     42      -     23     19      0      0      0      0    810    705
    0     22     42      -     16     16      0      0      0      0    810    750
    0     21     42      -     26     22      0      0      0      0    810    870
    0     17     42      -     15     15      0      0      0      0    810    990
    0     34     42      -     15     17      0      0      0      0   7000   1185
    0     37     43      -     18     13      0      0      0      0   7000   1185
    0     22     42      -     19     16      0      0      0      0    810    630
    0     17     42      -     23     22      0      0      0      0    810    735
    0     23     42      -     33     28      0      0      0      0    810    810
    0     22     42      -     26     24      0      0      0      0    810    960
    0     17     42      -     23     23      0      0      0      0    810   1005
    0     17     42      -      9     12      0      0      0      0    810   1005
    0     14     42      -      8     14      0      0      0      0    405    435
    0     14     42      -     12     20      0      0      0      0    405    435
    0     17     42      -     11     19      0      0      0      0    405    435
    0     19     42      -     15     21      0      0      0      0    405    435
    0     32     42      -     32     27      0      0      0      0   7000   1170
    0     34     42      -     12      4      0      0      0      0   7000   1170
    0     24     42      -      5      3      0      0      0      0    810    810
    0     23     42      -     21     21      0      0      0      0    810   1035
    0     43     43      -     38     32      0      0      0      0   7000   1710
    0     41     43      -     12      5      0      0      0      0   7000   1710
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     33     43      -      3      4      0      0      0      0    810    630
    0     19     42      -     32     25      0      0      0      0    810   1110
    0     41     43      -     25     14      0      0      0      0   7000   1500
    0     43     43      -     13      5      0      0      0      0   7000   1500
    0     29     43      -     10      3      0      0      0      0   5000    825
    0     19     42      -      1      4      0      0      0      0    810    525
    0     15     42      -      4      8      0      0      0      0    405    435
    0     15     42      -      6     14      0      0      0      0    405    435
    0     16     42      -      6     16      0      0      0      0    405    330
    0     16     42      -     31     22      0      0      0      0    810    780
    0     22     42      -     10     12      0      0      0      0    810   1050
    0     37     43      -     35     25      0      0      0      0   7000   1290
    0     34     43      -      6      4      0      0      0      0   5000    450
    0     18     42      -      2      2      0      0      0      0    810    435
    0     17     42      -     22     18      0      0      0      0    810    645
```

535.54.03 -- flickering

I've annotated when the flickering happened:

```
❯ nvidia-smi dmon
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     16     41      -      6      8      0      0      0      0    810    435
    0     14     41      -      8     10      0      0      0      0    405    435
    0     36     42      -     24     25      0      0      0      0   7000   1320
    0     34     42      -     13      3      0      0      0      0   7000   1320
    0     19     41      -      2      1      0      0      0      0    810    435
    0     16     41      -      3      7      0      0      0      0    810    435
    0     15     41      -      7     11      0      0      0      0    405    330
    0     27     42      -     28     25      0      0      0      0   7000   1320
    0     35     42      -     22     20      0      0      0      0   7000   1320
    0     34     42      -      3      3      0      0      0      0   5000    465
    0     28     42      -     16      7      0      0      0      0   5000    435
    0     32     42      -     12      5      0      0      0      0   5000    435
    0     34     42      -     10      4      0      0      0      0   7000   1365
    0     22     42      -      5      3      0      0      0      0    810    750
    0     18     42      -      2      5      0      0      0      0    810    435
    0     16     41      -     19     18      0      0      0      0    810    570
    0     20     42      -     13     15      0      0      0      0    810    780
    0     34     42      -     26     26      0      0      0      0   7000   1410
    0     35     42      -     22      6      0      0      0      0   7000   1410
    0     19     42      -      4      2      0      0      0      0    810    435
    0     16     41      -     18     14      0      0      0      0    810    465
    0     14     41      -     29     19      0      0      0      0    405    585
    0     35     42      -     18     18      0      0      0      0   7000   1350
    0     35     42      -     33      6      0      0      0      0   7000   1350
    0     29     42      -     20      5      0      0      0      0   5000    795
    0     16     42      -      5      2      0      0      0      0    810    540
    0     16     41      -     13     10      0      0      0      0    810    435
    0     14     41      -      7      9      0      0      0      0    405    435
    0     14     41      -     10     16      0      0      0      0    405    435
    0     15     41      -     10     18      0      0      0      0    405    435
    0     14     41      -     15     19      0      0      0      0    405    435
    0     14     41      -      9     17      0      0      0      0    405    435
    0     14     41      -     10     18      0      0      0      0    405    435
    0     14     41      -     11     18      0      0      0      0    405    435
    0     39     43      -     34     15      0      0      0      0   7000   1365
    0     40     43      -     11      5      0      0      0      0   7000   1365
    0     33     42      -     10      4      0      0      0      0   5000    600
    0     19     42      -      8      6      0      0      0      0    810    435
    0     16     42      -      5      9      0      0      0      0    810    435
    0     14     42      -      7      9      0      0      0      0    405    435
    0     15     41      -     10     18      0      0      0      0    405    330
    0     14     42      -     16     18      0      0      0      0    405    330
    0     14     41      -     17     18      0      0      0      0    405    330
    0     14     42      -     16     18      0      0      0      0    405    330
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     15     42      -     16     18      0      0      0      0    405    330
    0     15     41      -     16     18      0      0      0      0    405    330
    0     14     41      -     20     22      0      0      0      0    405    435
    0     15     41      -     10     18      0      0      0      0    405    435
    0     16     42      -     14     21      0      0      0      0    405    435
    0     14     41      -     15     21      0      0      0      0    405    435
    0     20     42      -     11     19      0      0      0      0    405    630
    0     17     41      -     28     28      0      0      0      0    810   1080
    0     22     42      -      7     11      0      0      0      0    810    915
    0     17     42      -     22     22      0      0      0      0    810   1005
    0     16     41      -      7     11      0      0      0      0    405    630  <
    0     17     42      -     24     23      0      0      0      0    810   1065  < flicker about here
    0     25     42      -     14     16      0      0      0      0   7000   1575  <
    0     36     43      -     21     19      0      0      0      0   7000   1575
    0     19     42      -      2      1      0      0      0      0    810    780
    0     25     42      -      3      5      0      0      0      0    810    735
    0     23     42      -     21     23      0      0      0      0   7000   1365
    0     36     43      -      9      9      0      0      0      0   7000   1485
    0     35     42      -      5      2      0      0      0      0   7000   1365
    0     35     43      -      2      1      0      0      0      0   7000   1365
    0     22     42      -      1      1      0      0      0      0    810    435
    0     21     42      -     21     18      0      0      0      0    810    705
    0     17     42      -     23     19      0      0      0      0    810    705
    0     22     42      -     16     16      0      0      0      0    810    750
    0     21     42      -     26     22      0      0      0      0    810    870
    0     17     42      -     15     15      0      0      0      0    810    990
    0     34     42      -     15     17      0      0      0      0   7000   1185
    0     37     43      -     18     13      0      0      0      0   7000   1185
    0     22     42      -     19     16      0      0      0      0    810    630
    0     17     42      -     23     22      0      0      0      0    810    735
    0     23     42      -     33     28      0      0      0      0    810    810
    0     22     42      -     26     24      0      0      0      0    810    960
    0     17     42      -     23     23      0      0      0      0    810   1005
    0     17     42      -      9     12      0      0      0      0    810   1005
    0     14     42      -      8     14      0      0      0      0    405    435
    0     14     42      -     12     20      0      0      0      0    405    435
    0     17     42      -     11     19      0      0      0      0    405    435
    0     19     42      -     15     21      0      0      0      0    405    435
    0     32     42      -     32     27      0      0      0      0   7000   1170
    0     34     42      -     12      4      0      0      0      0   7000   1170
    0     24     42      -      5      3      0      0      0      0    810    810
    0     23     42      -     21     21      0      0      0      0    810   1035
    0     43     43      -     38     32      0      0      0      0   7000   1710
    0     41     43      -     12      5      0      0      0      0   7000   1710
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz
    0     33     43      -      3      4      0      0      0      0    810    630
    0     19     42      -     32     25      0      0      0      0    810   1110
    0     41     43      -     25     14      0      0      0      0   7000   1500
    0     43     43      -     13      5      0      0      0      0   7000   1500
    0     29     43      -     10      3      0      0      0      0   5000    825
    0     19     42      -      1      4      0      0      0      0    810    525
    0     15     42      -      4      8      0      0      0      0    405    435
    0     15     42      -      6     14      0      0      0      0    405    435
    0     16     42      -      6     16      0      0      0      0    405    330
    0     16     42      -     31     22      0      0      0      0    810    780
    0     22     42      -     10     12      0      0      0      0    810   1050
    0     37     43      -     35     25      0      0      0      0   7000   1290
    0     34     43      -      6      4      0      0      0      0   5000    450
    0     18     42      -      2      2      0      0      0      0    810    435
    0     17     42      -     22     18      0      0      0      0    810    645
```

amrit1711 commented 1 year ago

We have filed bug 4164132 internally for tracking purposes. I do have a local repro now, but it is not 100% consistent on my setup. However, this will help us debug the issue further, and I will keep you updated.

LoipesMas commented 1 year ago

Also, I feel like there are three types of "symptoms" that (at least for me) are present in 535 but not in 530.

  1. "Blackouts" - as if the display was disconnected and connected again. Only happens with both displays on and with high refresh rate, but only the main display turns off. And seems to happen when changing P-states (I think mainly between P0<->P5). (And it seems that on 530 my GPU is always at P0, so that might be a clue).
  2. Partial flickers - top ~third of the display goes black for a frame or so. I can't see any obvious correlations with other things, so it's hard to reliably reproduce (using Firefox it may happen every few seconds, but also may not happen at all for a minute or two)
  3. Corrupted redraws with VRR - if I turn on VRR, when parts of the screen get redrawn, most of the time about 15% of the redrawn area is black. This one isn't confined to the top of the screen and is much more visible. But this only happens with VRR on.

(I also get some artifacts - parts of screen not being updated or being updated incorrectly - but this seems even more unrelated and it might be a Firefox issue (but it's on 535 driver only))

Do you think those three "symptoms" show the same underlying issue? Or are they likely unrelated to each other?

dbrhks490 commented 1 year ago

I do have local repro now but it is not 100% consistent on my setup.

You can induce flickering more frequently by:

* Opening `nvidia-settings` on the "powermizer" tab.

* Open multiple instances of `vkcube`

With my GPU (an RTX 2080) and a 144 Hz monitor, once 6 or 7 instances are open, the P-states start switching between P0 and P1 constantly. That is when the flickering occurs most often.
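The `vkcube` part of this reproduction recipe (keep opening instances until the P-states start bouncing) could be scripted roughly as follows. `launch_instances` is a hypothetical helper: it only starts N background copies of whatever command it is given and waits for them, so watching the P-state is still done in `nvidia-settings` or with `nvidia-smi`.

```shell
#!/bin/sh
# Hypothetical helper: start COUNT background copies of a command, print
# their PIDs, then block until all of them have exited.
launch_instances() {
  count=${1:?missing instance count}
  shift
  pids=""
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@" &
    pids="$pids $!"
    i=$((i + 1))
  done
  echo "started $count:$pids"
  wait
}
```

For example, `launch_instances 7 vkcube`; the call returns once all seven windows are closed.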

My clocks at idle are :

PowerDraw

Whether with 525, 530 or 535, the clocks are the same, even when they change as the P-states start switching.

Also I feel like there are three types of "symptoms"

Yes, you are right, from what I have read here and elsewhere. But do you experience all 3 depending on the situation? In my case, whether with 1 or 2 screens, I only get case number 2: a flicker at the top of the screen, not a complete black screen. Whatever the situation and whatever the refresh rate, I am always in case 2.

amrit1711 commented 1 year ago

Thanks @dbrhks490 I will try the recommended steps for consistent repro.

LoipesMas commented 1 year ago

You can induce flickering more frequently by :

* Opening `nvidia-settings` on the "powermizer" tab.

* Open multiple instances of `vkcube`

I tried this and in my case it didn't seem any more "effective" than normal usage

But do you experience all 3 depending on the situation?

Yes, depending on how I set up my displays I can be in either of the 3 cases.

GrzegorzKozub commented 1 year ago

Reading that you're talking about VRR, I realized that one of my monitors, the one that flickers, supports FreeSync. I found that disabling FreeSync helps in my scenario (flickering on Wayland only). I have now updated my original comment https://github.com/NVIDIA/open-gpu-kernel-modules/issues/511#issuecomment-1596567416 with this finding and a video showing the kind of flicker I get.

LoipesMas commented 1 year ago

I can confirm that disabling FreeSync on the display fixes the issue. I saw 2 minor flickers, but after that no issues: neither case 1 "blackouts" nor case 2 "partial flickers". However, it also causes the GPU to stay in P0 all the time, which is in line with previous observations. With FreeSync on (even with VRR disabled in the WM/compositor), it switches between P0, P5 and P8, and that's when the issues occur.
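To line up the P0/P5/P8 switching with the moments the flicker appears, the P-state can be logged with timestamps. The sketch below assumes a log produced by `nvidia-smi --query-gpu=timestamp,pstate,clocks.mem --format=csv,noheader -lms 250`, i.e. one `timestamp, pstate, clock` line per sample; `pstate_transitions` is a hypothetical awk filter that prints only the samples where the P-state changes.

```shell
#!/bin/sh
# Hypothetical filter: read "timestamp, pstate, mem clock" CSV lines
# (assumed nvidia-smi --query-gpu output) and print each P-state change.
pstate_transitions() {
  awk -F', ' '
    NF >= 3 {
      if (seen && $2 != prev)
        printf "%s  %s -> %s (mem clock %s)\n", $1, prev, $2, $3
      prev = $2
      seen = 1
    }
  ' "$@"
}
```

Piping the live query through `tee pstate.log | pstate_transitions` keeps the full log while only showing the transitions.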

Looking at the comments here again, it seems that @thesword53 had it figured out from the start.

errantmind commented 1 year ago

I'm also having this issue after updating from 525 to 535 (on Arch Linux). I have a 2080 Ti and no compositor, with X11. I can confirm it happens when the P-state changes, as setting PowerMizer to 1 with nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1" prevents the flickering.

My primary monitor is 240 Hz over DisplayPort; my secondary monitor is 60 Hz over HDMI. The flicker happens on both monitors. I haven't tested whether it stops when using only one monitor. A fix would be appreciated.
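Since forcing PowerMizer into maximum performance costs a lot of idle power (see the wattage numbers reported earlier in the thread), it may be handy to toggle it only while testing. This is a hypothetical wrapper around the `nvidia-settings -a` assignment quoted above; it assumes mode `0` is Adaptive and `1` is Prefer Maximum Performance, and with `DRY_RUN=1` it only prints the command.

```shell
#!/bin/sh
# Hypothetical wrapper: set the PowerMizer mode of one GPU.
# Assumed mode values: 0 = Adaptive, 1 = Prefer Maximum Performance.
set_powermizer_mode() {
  gpu=${1:?missing GPU index}
  mode=${2:?missing mode}
  case $mode in
    0|1) ;;
    *) echo "unsupported PowerMizer mode: $mode" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "nvidia-settings -a [gpu:$gpu]/GPUPowerMizerMode=$mode"
  else
    nvidia-settings -a "[gpu:$gpu]/GPUPowerMizerMode=$mode"
  fi
}
```

For example, `set_powermizer_mode 0 1` while reproducing, then `set_powermizer_mode 0 0` to go back to Adaptive. As pointed out later in the thread, this is a diagnostic experiment, not a fix.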

birdie-github commented 1 year ago

Yesterday, after connecting and disconnecting a 4K TV via HDMI, the upper third of my monitor connected via DP started flickering with the 530.41.03 drivers :-( After a few reboots, a complete power-off and booting into Windows for an hour, the bug disappeared. It's all quite unnerving. Windows is running the 531.68 drivers.

LoipesMas commented 1 year ago

@birdie-github have you tried 535 with FreeSync/GSync disabled on the display?

taleteller commented 1 year ago

Same issue here. I upgraded to 535 because I have to use Wayland, and I hoped it would reduce the legion of bugs affecting NVIDIA/Wayland/KDE Plasma. My setup:

Monitor 1: 1920x1200@60Hz, 100% scaling
Monitor 2: 3840x2160@60Hz, 150% scaling
GPU: RTX 2070

With the 535 driver I see flickering at the top of either screen every few seconds. It never happens on both at the same time, but it affects both screens. After downgrading to the latest 525, the issue is gone.

WannaBeOCer commented 1 year ago

I'm not having the issue on a certified G-Sync Compatible monitor: the LG 27GN950. I suggest listing your monitor and confirming whether it's a certified G-Sync Compatible monitor.

G-Sync compatible uses VESA's Adaptive-Sync/HDMI VRR protocols. FreeSync is an AMD technology that only works on AMD GPUs which also uses VESA's Adaptive-Sync/HDMI VRR protocols.

errantmind commented 1 year ago

I don't think this is related to G-Sync / FreeSync. I'm having an issue on my 240hz XL2546 (and my secondary monitor XL2411 running at 60hz) which doesn't support either one. I mentioned my other details above.

notfood commented 1 year ago

My main monitor doesn't support G-Sync or FreeSync, yet the top of it flickers.

thesword53 commented 1 year ago

Do you have 2+ monitors?

I think it's a problem with VRR and dual-monitor setups. I also have the issue on Windows with the 530 and 535 drivers, and on Linux with the 535 drivers. The 530 Linux drivers are not affected because of another bug that keeps the GPU stuck at a higher power state on multi-monitor setups above 60 Hz. I also noticed that the flickering is caused by VRR/G-Sync screen frequency stuttering and happens when the GPU switches power state.

Looking at the comments here again, it seems that @thesword53 had it figured out from the start.

The issue I explained before is another bug. More details here: https://forums.developer.nvidia.com/t/monitors-literally-stutter-when-vrr-g-sync-is-enabled/256836/2

binarysafari commented 1 year ago

Updated today to 535.54.03. Secondary screen turns on and off constantly + flickering. Disabling G-Sync for both screens works for me.

Fedora 37 (6.3.8-100.fc37), 535.54.03-2

GPU: NVIDIA GeForce RTX 2070
Monitors: GBT M32Q @144Hz (primary), LG 32 32GN600-B @144Hz

aritger commented 1 year ago

I'm not sure if it is related, but there was a clock management change in the regression window that we're a little suspicious of.

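If anyone who is seeing this problem would like to perform an experiment:

* Install the 535.54.03 NVIDIA driver, enabling open-gpu-kernel-modules (e.g., install from .run file with `-m=kernel-open`).

* Confirm you can reproduce the flickering problem.

* git clone open-gpu-kernel-modules, and check out the 535.54.03 tag.

* Apply the attached patch (`git am 0001-test-revert-NV0073_CTRL_CMD_SYSTEM_CONFIG_VRR_PSTATE.patch.txt`); rebuild and install the open-gpu-kernel-modules.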

Does the problem still reproduce with that patch applied?

0001-test-revert-NV0073_CTRL_CMD_SYSTEM_CONFIG_VRR_PSTATE.patch.txt
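The build part of this experiment (clone the repository, check out the tag, `git am` the patch, rebuild and install the modules) might be scripted as below. This is a sketch, not an official procedure: it assumes the 535.54.03 driver is already installed with `-m=kernel-open`, that the repository tag matches the installed version, and that the patch path is absolute. `run` and `build_patched_modules` are hypothetical names, and `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch of the revert-patch experiment.
# Usage: build_patched_modules TAG ABSOLUTE_PATCH_PATH

run() {
  # Print instead of execute when DRY_RUN=1.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_patched_modules() {
  tag=${1:?missing tag}
  patch=${2:?missing patch path}
  dir=open-gpu-kernel-modules
  njobs=$(nproc 2>/dev/null || echo 1)

  run git clone https://github.com/NVIDIA/open-gpu-kernel-modules.git "$dir" || return 1
  run git -C "$dir" checkout "$tag" || return 1
  # git am records a commit, so user.name/user.email must be configured.
  run git -C "$dir" am "$patch" || return 1
  run make -C "$dir" modules -j"$njobs" || return 1
  run sudo make -C "$dir" modules_install -j"$njobs"
}
```

After installing, a reboot (or unloading and reloading the nvidia modules) is needed before the rebuilt modules are actually used.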

z1atk0 commented 1 year ago

If anyone who is seeing this problem would like to perform an experiment: [...]

  1. I installed the new drivers without giving any specific -m=kernel/kernel-open option. Which one is the default, which version is now active on my system?
  2. My monitors (AOC 24G2SPU) claim to be GSYNC compatible, but although I have set "GSYNC: ON", it says "SYNC Technology: 60Hz", so apparently GSYNC is not enabled. Both monitors are connected via HDMI.
  3. Flicker happens only on the upper few centimeters on both monitors (so they do not completely turn off as they would do on signal loss or input source change).

Given 1.-3. above, would it make sense to perform the suggested experiment?
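On question 1, one way to tell which kernel module flavor is installed is its license string: the open modules are expected to declare `Dual MIT/GPL`, the proprietary ones `NVIDIA`. The sketch below relies on that assumption and reads the string with `modinfo -F license nvidia`, which reports the module file for the running kernel rather than what is currently loaded; `nvidia_module_flavor` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: classify the nvidia kernel module by its license.
# Pass a license string to classify it, or no argument to query modinfo.
nvidia_module_flavor() {
  if [ "$#" -gt 0 ]; then
    license=$1
  else
    license=$(modinfo -F license nvidia 2>/dev/null)
  fi
  case $license in
    "Dual MIT/GPL") echo open ;;
    NVIDIA) echo proprietary ;;
    "") echo unknown ;;
    *) echo "unknown ($license)" ;;
  esac
}
```

Running `nvidia_module_flavor` with no arguments prints `open` or `proprietary` for the installed module.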

tinklern commented 1 year ago

Just chiming in that I am also experiencing this:

I'm on dual monitors, rtx2080, no gsync/adaptivesync - I see flickering across the top quarter of the screen but infrequently (about once a minute). It usually seems to happen when scrolling content.

NVIDIA-SMI 535.54.03 Driver Version: 535.54.03

It started after I upgraded from the 525 driver.

jessicamaybe commented 1 year ago

I'm not sure if it is related, but there was a clock management change in the regression window that we're a little suspicious of.

If anyone who is seeing this problem would like to perform an experiment:

* Install the 535.54.03 NVIDIA driver, enabling open-gpu-kernel-modules (e.g., install from .run file with `-m=kernel-open`).

* Confirm you can reproduce the flickering problem.

* git clone open-gpu-kernel-modules, and check out the 535.54.03 tag.

* Apply the attached patch (`git am 0001-test-revert-NV0073_CTRL_CMD_SYSTEM_CONFIG_VRR_PSTATE.patch.txt`); rebuild and install the open-gpu-kernel-modules.

Does the problem still reproduce with that patch applied?

0001-test-revert-NV0073_CTRL_CMD_SYSTEM_CONFIG_VRR_PSTATE.patch.txt

I'm experiencing the top of my screen flickering after updating to 535.54.03. I tried the Nvidia Open driver and it still happens, and I have also tested with the patch and it is also still happening for me.

EDIT: If I revert to 530 it stops happening, ymmv I suppose

Arch Linux, NVIDIA GeForce RTX 2070, KDE Plasma 5.27.6 on Wayland

Running dual monitors: Dell G2422HS and Dell S2721DGF

aritger commented 1 year ago

Thanks for testing. That is a useful data point. We're investigating.

deflock commented 1 year ago

Could this be mostly a Turing-related issue? I've been experiencing this on an RTX 2060 on Wayland since roughly 515.x, and since then I've had to use Nouveau instead.

dbrhks490 commented 1 year ago

Yes, many users affected by this issue have a Turing GPU. There are threads on the developer forum that started discussing this problem after the release of the 530 drivers. Are you really sure this has been happening to you since 515?

I've tried the patch and sadly, like z1atk0 and jessicamaybe, I still get the flickering. Absolutely no change with it.

Applying the patch:

git apply --stat 0001-test-revert-NV0073_CTRL_CMD_SYSTEM_CONFIG_VRR_PSTATE.patch

It gives this result:

src/nvidia-modeset/src/nvkms-vrr.c | 70 +++++++----------------------------- 1 file changed, 13 insertions(+), 57 deletions(-)

So, I suppose it was correctly applied.

While writing this post, my monitor flickered 12 times (144 Hz + Picom GLX). Since I'm currently playing games that don't work with drivers prior to 535, I have to go back to Windows; downgrading is not an option. Of course, if other tests need to be done, I won't hesitate to come back to Linux.

GPU: RTX 2080
Monitor: Acer xb271hubmiprz

sentakuhm commented 1 year ago

@deflock It's not only Turing; all RTX series cards are affected. Read the comments above and you'll see the 3070, the 4090 and more. Wayland is another problem: you can try all drivers from 515 to 530, and there's no flickering on X11.

jarrard commented 1 year ago

I'm not using the open-driver component atm and did a new install of EndeavourOS with a 4090 and 535 driver and don't get flickering atm (touch wood) in XFCE or Plasma X11. I have a 4k 120hz VRR primary and 165hz VRR Secondary screen.

The primary screen does G-SYNC/VRR/FS-Pro etc.; it's an LG C1.

I doubt I can log in to a Plasma Wayland session yet, since the Hz limit bug is still around, but at least there's no flicker atm. Not sure what triggered it earlier, because I have AllowFlipping enabled this time.

deflock commented 1 year ago

Are you really sure this happen for you since the 515 ?

Is this ticket/issue for X only? I remember NVIDIA added GBM support in 495.x, and I've been waiting for proper Wayland support in their driver ever since :) There are a lot of different flickering issues in the driver. I get flickering in the top part of the screen. To me it looks like something is broken between the iGPU and dGPU buffers.

birdie-github commented 1 year ago

Is this ticket/issue for X only?

Nope, it affects Wayland users as well.

May it be mostly Turing-related issue? I'm experiencing this on RTX2060 on Wayland since ±515.x, and since then I have to use Nouveau instead.

The vast majority of people in this thread are indeed Turing users, but there are isolated reports from Ampere and Lovelace users as well.

jarrard commented 1 year ago

I might have spoken too soon: I got some flicker before, but it's gone now. So perhaps it also affects the closed driver.

UPDATE: There are flickering issues with the Windows 11 drivers too. I wouldn't be surprised if the driver was written by AI and then ported to Linux, as NVIDIA does.

My HDMI 2.1 setup and cables are fine and high quality, and this wasn't an issue with earlier drivers.

Monsterovich commented 1 year ago

I'm now running NVidia driver 535.54.03.

I don't have this problem with or without VRR.

GPU: RTX 3060 Ti. Display: Microstep MSI G24C4 (Via DisplayPort)

I'm lucky somehow.

kodatarule commented 1 year ago

This seems to affect the proprietary driver as well; in addition, I've been running into one monitor going blank.

https://forums.developer.nvidia.com/t/monitor-goes-blank-for-a-few-seconds-535-54-03/258481/2

jarrard commented 1 year ago

I think it could be an issue with VRR.

shashanknimje commented 1 year ago

I am also experiencing a steady rate of intermittent flickering once every 15-30 seconds using proprietary Nvidia drivers.

Nvidia Driver Version: 535.54.03

GPU: GeForce GTX 1650

Screen Resolution: 2560x1440 @ 144Hz refresh rate

OS: Arch Linux Kernel 6.1.37-1-lts

Desktop: Gnome 44.2 & GDM 44.1 with X11 (X Server Windowing System)

This wasn't an issue prior to upgrading to 535.54.03. Hope they provide a fix soon.

kelderek commented 1 year ago

Same flickering issue for me with a very similar config, also using only the proprietary drivers. Also, during the 535 driver install I completely lose video out: the monitor goes black, then to sleep. On reboot I get video, but with flickering every few minutes. The first install (via Discover) also installed the Oracle and lowlatency kernels, but the Oracle one was immediately marked as no longer needed by apt autoremove. The lowlatency kernel wouldn't boot at all, so I removed it and the Oracle kernels. Subsequent installs via the driver manager still lose video out during install.

GeForce RTX 2080 Ti Founders Edition, 1 monitor (1440p 144Hz) connected via DisplayPort, Kubuntu 23.04 using KDE and X11

sean-gilliam commented 1 year ago

Also during 535 driver install I completely lose video out - monitor goes to black then to sleep. On reboot I get video but with the flickering every few minutes.

Same thing happened to me as well.

mrmodolo commented 1 year ago

Same thing here!

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:    22.04
Codename:   jammy

libnvidia-cfg1-535:amd64 535.54.03-0ubuntu0.22.04.1
libnvidia-common-535 535.54.03-0ubuntu0.22.04.1
libnvidia-compute-535:amd64 535.54.03-0ubuntu0.22.04.1
libnvidia-compute-535:i386 535.54.03-0ubuntu0.22.04.1
libnvidia-decode-535:amd64 535.54.03-0ubuntu0.22.04.1
libnvidia-decode-535:i386 535.54.03-0ubuntu0.22.04.1
libnvidia-encode-535:amd64 535.54.03-0ubuntu0.22.04.1
libnvidia-encode-535:i386 535.54.03-0ubuntu0.22.04.1
libnvidia-extra-535:amd64 535.54.03-0ubuntu0.22.04.1
libnvidia-fbc1-535:amd64 535.54.03-0ubuntu0.22.04.1
libnvidia-fbc1-535:i386 535.54.03-0ubuntu0.22.04.1
libnvidia-gl-535:amd64 535.54.03-0ubuntu0.22.04.1
libnvidia-gl-535:i386 535.54.03-0ubuntu0.22.04.1
nvidia-compute-utils-535 535.54.03-0ubuntu0.22.04.1
nvidia-dkms-535 535.54.03-0ubuntu0.22.04.1
nvidia-driver-535 535.54.03-0ubuntu0.22.04.1
nvidia-firmware-535-535.54.03 535.54.03-0ubuntu0.22.04.1
nvidia-kernel-common-535 535.54.03-0ubuntu0.22.04.1
nvidia-kernel-source-535 535.54.03-0ubuntu0.22.04.1
nvidia-prime 0.8.17.1
nvidia-settings 510.47.03-0ubuntu1
nvidia-utils-535 535.54.03-0ubuntu0.22.04.1
screen-resolution-extra 0.18.2
xserver-xorg-video-nvidia-535 535.54.03-0ubuntu0.22.04.1
lspci | grep ' VGA ' | cut -d" " -f 1 | xargs -i lspci -v -s {}
03:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2060] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation TU104 [GeForce RTX 2060]
    Physical Slot: 6
    Flags: bus master, fast devsel, latency 0, IRQ 68, NUMA node 0
    Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
xdpyinfo | awk '/dimensions/{print $2}'
3840x2160
xrandr
Screen 0: minimum 8 x 8, current 3840 x 2160, maximum 32767 x 32767
DVI-D-0 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-0 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 600mm x 340mm
   3840x2160     60.00*+  30.00  
   2560x1440     59.95  
   1920x1080     60.00    59.94  
   1600x900      60.00  
   1280x1024     60.02  
   1280x800      59.81  
   1280x720      60.00    59.94  
   1152x864      59.96  
   1024x768      60.00  
   800x600       60.32  
   720x480       59.94  
   640x480       59.94    59.93  
DP-1 disconnected (normal left inverted right x axis y axis)
birdie-github commented 1 year ago

@mrmodolo

You must have missed the part which asks for nvidia-bug-report and your exact monitor model.

mrmodolo commented 1 year ago

Thanks @birdie-github !

kelderek commented 1 year ago

Here's my nvidia-bug-report.log.gz. My monitor is a Pixio PX277h connected via DisplayPort.

nvidia-bug-report.log.gz

NiallDoherty commented 1 year ago

Primary monitor: MSI MAG274QRX
Second monitor: Dell U2414H

The primary monitor flickers black whenever anything on screen changes.

I apply display settings at startup using the following command:

/usr/bin/nvidia-settings --assign CurrentMetaMode="DP-2: 2560x1440_240 +1920+0 {ForceCompositionPipeline=Off, AllowGSYNCCompatible=On, Primary=true}, HDMI-0: nvidia-auto-select +0+180 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}" --load-config-only

The issue doesn't happen at all if I change it to AllowGSYNCCompatible=Off.

nvidia-bug-report.log.gz

mrmodolo commented 1 year ago

I forgot to put the information about the monitor. I only have one monitor:

LG Electronics LG Ultra HD (3840x2160)
Signal: DisplayPort
Connection link: 4 lanes @ 5.40 Gbps
Refresh Rate: 60.00 Hz