NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

[REGRESSION] [535.54.03] The entire screen is frequently flickering #511

Closed birdie-github closed 9 months ago

birdie-github commented 1 year ago

NVIDIA Open GPU Kernel Modules Version

535.43.02

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Fedora 38

Kernel Release

6.3.5

Hardware: GPU

NVIDIA GeForce GTX 1660 Ti

Describe the bug

The screen is constantly flickering, no matter what applications are running.

In Firefox it's happening every few seconds. In other "simple" applications it's less frequent.

To Reproduce

Install.

Bug Incidence

All the time

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

This is a regression.

I've reverted to 530.41.03 and it's all good.

Windows users seem to be affected as well. Could be a code change which affects both drivers.

ottojimb commented 1 year ago

I have noticed that if you use OBS and start recording the blinking stops happening. Maybe my comment is very silly but it could give some clue...

AlexWayfer commented 1 year ago

I have noticed that if you use OBS and start recording the blinking stops happening. Maybe my comment is very silly but it could give some clue...

No, I've seen the blinking at the top of my screen even with OBS recording (maybe less often), but it's not visible in the resulting recording.

kurld commented 1 year ago

RTX2070S, Arch 6.4.2, nvidia 535.54.03-7. Single 60Hz Dell screen. Changing the power mode from Auto to Performance in the driver settings reduces flickering to a somewhat acceptable level, but that's not really a valid solution.

HDMI-0 connected primary 2560x1440+0+0 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440     59.95*+

nvidia-bug-report.log.gz

thesword53 commented 1 year ago

535.86.05 driver released and all the issues are still present.

AlexGoinsNV commented 1 year ago

@thesword53

Can you try loading nvidia-modeset with the parameter 'disable_vrr_mclk_switch=1' and see if that resolves your issues?

thesword53 commented 1 year ago

Can you try loading nvidia-modeset with the parameter 'disable_vrr_mclk_switch=1' and see if that resolves your issues?

Enabling the kernel parameter solves the following issues.

However, the GPU is then stuck in the P0 state and idle power is 58W instead of 22W. It also partially solves this issue: https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/261
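The P0 state and idle power figures reported here can be read from a script. This is a minimal sketch, not an NVIDIA-provided tool: it assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=pstate,power.draw` query, and the function names `parse_gpu_status` and `query_gpu_status` are made up for illustration.

```python
import subprocess


def parse_gpu_status(csv_text):
    """Parse 'pstate, power.draw' CSV rows (no header, no units) into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        pstate, power = [field.strip() for field in line.split(",")]
        try:
            power_w = float(power)
        except ValueError:
            # Some GPUs report "[N/A]" instead of a number.
            power_w = None
        rows.append({"pstate": pstate, "power_w": power_w})
    return rows


def query_gpu_status():
    """Run nvidia-smi and return one dict per GPU (hypothetical helper)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=pstate,power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_status(result.stdout)
```

Calling `query_gpu_status()` at idle with and without the module parameter would show whether the GPU ever leaves P0, which is the trade-off described in this comment.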

AlexGoinsNV commented 1 year ago

Thanks for checking. Per the changelog:

Fixed a bug that could cause some Variable Refresh Rate (VRR) monitors to flicker by allowing the refresh rate to drop below the monitor's minimum.

we made some changes that we were hoping would help with this, but it sounds like there are still some issues remaining. disable_vrr_mclk_switch=1 disables the entire feature that allows for leveraging VRR to extend vblank on configurations where there wouldn't be a long enough vblank to change memory clocks without glitches. Something seems to still be going wrong with it on some configurations. Disabling it of course works around any issues, but as you noted, without this feature some configurations will remain pegged at P0.

I suspect that the "screen FPS stuttering" issue that you mention is an expected consequence of the feature, since extending vblank briefly can result in a lower effective refresh rate. However, if it is disrupting your experience, it's possible that something might be making it more noticeable than it should be.
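To make the arithmetic behind "extending vblank lowers the effective refresh rate" concrete, here is a small sketch. The amount by which vblank is extended is not stated anywhere in this thread, so `extra_vblank_ms` is a purely hypothetical input, and the function name is made up for illustration.

```python
def effective_refresh_hz(nominal_hz, extra_vblank_ms):
    """Refresh rate implied by one frame whose vblank is stretched.

    nominal_hz:      the display's normal refresh rate.
    extra_vblank_ms: hypothetical extra blanking time added to that one frame.
    """
    frame_ms = 1000.0 / nominal_hz          # normal duration of one frame
    stretched_ms = frame_ms + extra_vblank_ms
    return 1000.0 / stretched_ms


# Example with an assumed 1 ms extension on a 144 Hz panel:
# the stretched frame lasts about 7.94 ms instead of about 6.94 ms.
print(round(effective_refresh_hz(144, 1.0), 1))
```

Under these assumed numbers a refresh rate indicator would briefly dip for the stretched frame, which matches the description of the behavior as expected rather than as a fault in itself.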

thesword53 commented 1 year ago

I suspect that the "screen FPS stuttering" issue that you mention is an expected consequence of the feature, since extending vblank briefly can result in a lower effective refresh rate. However, if it is disrupting your experience, it's possible that something might be making it more noticeable than it should be.

The "screen FPS stuttering" issue also affects Windows and previous driver versions (530, 525). It also only affects my 2560x1440@144 VRR screen (1920x1080@75 VRR screen is not affected).

AlexGoinsNV commented 1 year ago

@thesword53

Are you sure that it happens on 530? I would expect it to happen on 525.116.04 but not 530.41.03.

thesword53 commented 1 year ago

Are you sure that it happens on 530? I would expect it to happen on 525.116.04 but not 530.41.03.

Not on Linux, because the GPU is stuck in the P0 state, but yes on Windows.

fulalas commented 1 year ago

I can confirm 535.86.05 doesn't fix the flicker issue on Linux, as seen in this video:

https://github.com/NVIDIA/open-gpu-kernel-modules/assets/27843666/0e31f2e4-58dc-4d7a-a74f-520228867f8c

I have a GTX 1650 Super + Asus VG279QM display with VRR set to 280 Hz at the moment.

  1. Reverting to 525.85.05 fixes the problem.
  2. Lowering the refresh rate to 240 Hz or less alleviates the issue a bit, but I can still see some flickering here and there.

AlexGoinsNV commented 1 year ago

Thank you @thesword53 and @fulalas, those findings are consistent with my suspicions.

shashanknimje commented 1 year ago

Just wanted to give an update after upgrading to the newly released 535.86.05 drivers. The issue still persists on my GTX 1650 GPU, although the frequency of the flicker has decreased.

Nvidia Driver: 535.86.05
GPU: GeForce GTX 1650
OS: Arch Linux
Kernel: 6.1.38-2-lts
GNOME Shell: 44.3
Windowing System: X11
Screen Resolution: 2560x1440 @ 144Hz

Edit: Spoke too soon. Now I see long flickers with the screen blacking out for nearly 3 seconds.

z1atk0 commented 1 year ago

535.86.05 doesn't fix it for me, either. As for @shashanknimje above, the flicker doesn't happen as often anymore, but it still happens. EDIT: for me it's only the flickering at the top of the screen (across both monitors), no complete screen blanking, blackouts or signal loss.

GPU: NVIDIA TU116 (GeForce GTX 1660 Ti)
Driver: 535.86.05
Monitor: 2x AOC 24G2SPU @ HDMI (=> no VRR, G-SYNC only works on DP inputs)
OS: Slackware64-15.0
Kernel: 6.1.39
DE: GNOME 44 on X11 (no Wayland)
Screen Resolution: 3840x1080 @ 60Hz (= 2x 1920x1080 @ TwinView)

thesword53 commented 1 year ago

I can confirm 535.86.05 doesn't fix the flicker issue on Linux, as seen in this video: nvidia-linux-flicker.mp4

I have a GTX 1650 Super + Asus VG279QM display with VRR set to 280 Hz at the moment.

1. Reverting to `525.85.05` fixes the problem.

2. Lowering the refresh rate to 240 Hz or less alleviates the issue a bit, but I can still see some flickering here and there.

I didn't notice flickering like yours (I am using Plasma on Wayland), but my main screen randomly loses video signal for ~1s when I do basic things that raise the GPU power state (opening a browser, loading a video with NVDEC, etc.). The 535.98 Windows driver was also affected by the same bug and I have experienced it. The 536.23 drivers fixed the screen blackouts.

Why wasn't the same patch applied to 535.86.05 Linux drivers?

akb825 commented 1 year ago

I've also been experiencing the same issue the last two or three months with the proprietary drivers, including with the latest. (535.86.05) Some key highlights based on the discussion so far:

More detailed system information:

mrmodolo commented 11 months ago

Even after upgrading to version 535.86.05-0ubuntu0.22.04.1 the behavior remains the same. I'm running 1 monitor (LG) at 3840x2160 at 60 Hz on DP, with Ubuntu, GNOME and the X server.

Linux hal 5.19.0-50-generic #50-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 10 18:24:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:    22.04
Codename:   jammy
nvidia-compute-utils-535                         535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA compute utilities
nvidia-dkms-535                                  535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA DKMS package
nvidia-driver-535                                535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA driver metapackage
nvidia-firmware-535-535.86.05                    535.86.05-0ubuntu0.22.04.1                     amd64        Firmware files used by the kernel module
nvidia-kernel-common-535                         535.86.05-0ubuntu0.22.04.1                     amd64        Shared files used with the kernel module
nvidia-kernel-source-535                         535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA kernel source package

birdie-github commented 11 months ago

NVIDIA driver 535.98 is out, claims to have fixed this bug again:

• Fixed an issue which caused the following error message to appear spuriously when using SLI with the NVIDIA Open GPU Kernel Modules:
  (EE) NVIDIA: Unable to disable FB size compare
• Fixed a bug which prevented DKMS from registering kernel modules.
• Fixed a bug which could cause the screen to flicker.

Too bad many people here seemingly have standard 60Hz monitors.

https://www.nvidia.com/download/driverResults.aspx/210317/en-us/

AlexWayfer commented 11 months ago

NVIDIA driver 535.98 is out, claims to have fixed this bug again:

Fixed a bug that could cause some Variable Refresh Rate (VRR) monitors to flicker by allowing the refresh rate to drop below the monitor's minimum.

Too bad many people here seemingly have standard 60Hz monitors.

https://www.nvidia.com/download/driverResults.aspx/210317/en-us/

Yeah, I have a standard 59–60 Hz monitor, and don't understand how "VRR" can be related to me.

By the way, I see reports this update helped for someone: https://forums.developer.nvidia.com/t/flickering-at-the-top-of-the-screen/256447/48

I'll check it later for myself.

AlexGoinsNV commented 11 months ago

The release highlights on the webpage for 535.98 are the same as for 535.86.05, so I think something may have gone amiss there.

Nonetheless, 535.98 does contain another fix so please do check if it helps resolve your flickering issues.

FSKiller commented 11 months ago

For me 535.98 still flickers although a lot less frequently.

Flicker still happens with about the same frequency!

It's strange though, it doesn't happen while in games, either with proton or just Lutris.

Tue Aug 8 22:37:06 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+

DisplayPort 240hz ROG PG258Q with the G-Sync Module

AlexWayfer commented 11 months ago

I wouldn't expect much before the v540 release. Let's wait.

cyberconan commented 11 months ago

G-Sync still doesn't work with the new version.

[mié ago  9 01:02:21 2023] ------------[ cut here ]------------
[mié ago  9 01:02:21 2023] WARNING: CPU: 11 PID: 530 at /var/lib/dkms/nvidia/535.98/build/nvidia-drm/nvidia-drm-crtc.h:264 nv_drm_handle_flip_occurred+0x105/0x200 [nvidia_drm]
[mié ago  9 01:02:21 2023] Modules linked in: snd_seq_dummy snd_seq snd_seq_device uinput ccm rfcomm hid_logitech_hidpp hid_logitech_dj cmac algif_hash algif_skcipher af_alg bnep hid_gt683r uvcvideo btusb videobuf2_vmalloc btrtl uvc btbcm videobuf2_memops videobuf2_v4l2 btintel xpad btmtk ff_memless videodev bluetooth videobuf2_common mc usbhid ecdh_generic zram tcp_bbr2 sch_cake vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_cadence snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_generic_allocation soundwire_bus snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp intel_rapl_msr snd_soc_acpi_intel_match intel_rapl_common snd_soc_acpi intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_soc_core coretemp iwlmvm snd_compress kvm_intel ac97_bus mac80211 snd_hda_codec_realtek snd_pcm_dmaengine snd_hda_codec_hdmi joydev libarc4 snd_hda_codec_generic kvm mousedev ledtrig_audio
[mié ago  9 01:02:21 2023]  irqbypass ucsi_ccg snd_hda_intel crct10dif_pclmul crc32_pclmul snd_intel_dspcfg typec_ucsi polyval_clmulni snd_intel_sdw_acpi polyval_generic gf128mul iwlwifi msi_wmi typec ghash_clmulni_intel snd_hda_codec sha512_ssse3 nvidia_drm(POE) hid_multitouch aesni_intel crypto_simd nvidia_uvm(POE) cryptd rapl intel_wmi_thunderbolt wmi_bmof nvidia_modeset(POE) mei_pxp ee1004 mei_hdcp mxm_wmi roles sparse_keymap snd_hda_core gpio_keys intel_cstate snd_hwdep spi_nor video pcspkr snd_pcm cfg80211 intel_uncore psmouse mtd intel_lpss_pci i2c_i801 snd_timer alx mei_me rfkill i2c_smbus intel_lpss mdio snd mei idma64 soundcore intel_pch_thermal i2c_nvidia_gpu i2c_hid_acpi i2c_hid acpi_tad wmi acpi_pad soc_button_array mac_hid nvidia(POE) ec_sys sg vhba crypto_user loop fuse dm_mod vfat fat ip_tables x_tables ext4 crc32c_generic sdhci_pci serio_raw crc16 cqhci atkbd sdhci mbcache libps2 jbd2 vivaldi_fmap mmc_core nvme spi_intel_pci crc32c_intel xhci_pci nvme_core spi_intel xhci_pci_renesas nvme_common i8042 serio
[mié ago  9 01:02:21 2023] CPU: 11 PID: 530 Comm: nvidia-modeset/ Tainted: P        W  OE      6.4.8-zen1-1-zen #1 f0433d8e26ee717ec1be9c8f6cc94430834d1e76
[mié ago  9 01:02:21 2023] Hardware name: Micro-Star International Co., Ltd. GT75 Titan 8SG/MS-17A6, BIOS E17A6IMS.10D 03/17/2020
[mié ago  9 01:02:21 2023] RIP: 0010:nv_drm_handle_flip_occurred+0x105/0x200 [nvidia_drm]
[mié ago  9 01:02:21 2023] Code: 81 c7 98 00 00 00 e8 ca 29 fe d0 48 8b 3c 24 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f e9 63 0c 00 00 48 89 ef e8 9b b8 e5 d1 <0f> 0b eb b1 4c 89 e7 e8 4f 30 5d d1 84 c0 74 10 49 8b 14 24 49 8b
[mié ago  9 01:02:21 2023] RSP: 0018:ffffaf7c02ad3df8 EFLAGS: 00010282
[mié ago  9 01:02:21 2023] RAX: ffff8cfde6f38008 RBX: ffff8cfde6f38000 RCX: ffff8cfdc5af72e0
[mié ago  9 01:02:21 2023] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8cfde6f38018
[mié ago  9 01:02:21 2023] RBP: ffff8cfde6f38018 R08: 0000000000000001 R09: 0000000000000000
[mié ago  9 01:02:21 2023] R10: ffff8cfed885b800 R11: 0000000000000000 R12: ffff8cfde6f38008
[mié ago  9 01:02:21 2023] R13: ffff8cfdc7ffa608 R14: ffff8cfdc40f8000 R15: ffff8cfdc5af7000
[mié ago  9 01:02:21 2023] FS:  0000000000000000(0000) GS:ffff8d055dcc0000(0000) knlGS:0000000000000000
[mié ago  9 01:02:21 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mié ago  9 01:02:21 2023] CR2: 00000000bab66e34 CR3: 0000000017220005 CR4: 00000000003706e0
[mié ago  9 01:02:21 2023] Call Trace:
[mié ago  9 01:02:21 2023]  <TASK>
[mié ago  9 01:02:21 2023]  ? __warn+0x81/0x1b0
[mié ago  9 01:02:21 2023]  ? nv_drm_handle_flip_occurred+0x105/0x200 [nvidia_drm 7c8b9aafcb9f53d3c808092f68419f5336ad4407]
[mié ago  9 01:02:21 2023]  ? report_bug+0x202/0x270
[mié ago  9 01:02:21 2023]  ? handle_bug+0x3c/0x80
[mié ago  9 01:02:21 2023]  ? exc_invalid_op+0x19/0xc0
[mié ago  9 01:02:21 2023]  ? asm_exc_invalid_op+0x1a/0x20
[mié ago  9 01:02:21 2023]  ? nv_drm_handle_flip_occurred+0x105/0x200 [nvidia_drm 7c8b9aafcb9f53d3c808092f68419f5336ad4407]
[mié ago  9 01:02:21 2023]  ? nv_drm_handle_flip_occurred+0x105/0x200 [nvidia_drm 7c8b9aafcb9f53d3c808092f68419f5336ad4407]
[mié ago  9 01:02:21 2023]  nv_drm_event_callback+0x82/0x90 [nvidia_drm 7c8b9aafcb9f53d3c808092f68419f5336ad4407]
[mié ago  9 01:02:21 2023]  nvKmsKapiHandleEventQueueChange+0xa0/0xd0 [nvidia_modeset 52d5de684233d8c68afdc43a30321d6c79f83176]
[mié ago  9 01:02:21 2023]  _main_loop+0x90/0x150 [nvidia_modeset 52d5de684233d8c68afdc43a30321d6c79f83176]
[mié ago  9 01:02:21 2023]  ? __pfx__main_loop+0x10/0x10 [nvidia_modeset 52d5de684233d8c68afdc43a30321d6c79f83176]
[mié ago  9 01:02:21 2023]  kthread+0xe5/0x120
[mié ago  9 01:02:21 2023]  ? __pfx_kthread+0x10/0x10
[mié ago  9 01:02:21 2023]  ret_from_fork+0x29/0x50
[mié ago  9 01:02:21 2023]  </TASK>
[mié ago  9 01:02:21 2023] ---[ end trace 0000000000000000 ]---
[mié ago  9 01:02:21 2023] ------------[ cut here ]------------

Returning to 525.116.04.

amrit1711 commented 11 months ago

Even after upgrading to version 535.86.05-0ubuntu0.22.04.1 the behavior remains the same. I'm running 1 monitor (LG) at 3840x2160 at 60 Hz on DP, with Ubuntu, GNOME and the X server.

Linux hal 5.19.0-50-generic #50-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 10 18:24:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID:   Ubuntu
Description:  Ubuntu 22.04.2 LTS
Release:  22.04
Codename: jammy
nvidia-compute-utils-535                         535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA compute utilities
nvidia-dkms-535                                  535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA DKMS package
nvidia-driver-535                                535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA driver metapackage
nvidia-firmware-535-535.86.05                    535.86.05-0ubuntu0.22.04.1                     amd64        Firmware files used by the kernel module
nvidia-kernel-common-535                         535.86.05-0ubuntu0.22.04.1                     amd64        Shared files used with the kernel module
nvidia-kernel-source-535                         535.86.05-0ubuntu0.22.04.1                     amd64        NVIDIA kernel source package

Could you please share the exact display model, along with a video of the flickering for reference?

amrit1711 commented 11 months ago

~For me 535.98 still flickers although a lot less frequently.~

Flicker still happens with about the same frequency!

It's strange though, it doesn't happen while in games, either with proton or just Lutris.

Tue Aug 8 22:37:06 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+

DisplayPort 240hz ROG PG258Q with the G-Sync Module

Thanks for sharing the test results, I am trying to find ROG PG258Q internally and will check for local repro.

fulalas commented 11 months ago

The same flicker issue for me using the new 535.98. The only solution is reverting to 525.85.05 :(

voodoos commented 11 months ago

Same kind of issue here.

System: Arch
Kernel: 6.4.8
Driver: nvidia-dkms 535.86.05-2
Card: GTX 1070
WM: KDE Plasma X11

I observe frequent screen flickering: frequent black flashes lasting about 1/4 of second.

nvidia-bug-report.log.gz

jarrard commented 11 months ago

If you run a desktop FPS counter you can kind of see why it has this flashing. The fps frequently drops quite low when doing things or playing videos etc which can trigger VRR problems.

AMD also had these problems with FreeSync in the early days. When the fps drops too low, the display just chokes and the screen brightness flashes/flutters. Quite an annoying bug.

taleteller commented 11 months ago

@jarrard If only it were just a VRR problem, but a lot of people like me see this behavior with static 60 Hz displays. There is no G-Sync/FreeSync involved at all.

thesword53 commented 11 months ago

Same kind of issue here.

System: Arch
Kernel: 6.4.8
Driver: nvidia-dkms 535.86.05-2
Card: GTX 1070
WM: KDE Plasma X11

I observe frequent screen flickering: frequent black flashes lasting about 1/4 of second.

* The higher the refresh rate (up to 144hz) the more frequent flashes appear

* They seem to be triggered by user interaction but also happen randomly

* Disabling KWin's compositor completely removes the flashing

* Keeping the compositor on but disabling G-Sync and sticking with 60Hz greatly alleviates the issue; black flashes are almost gone a few minutes after boot (but very present right after booting).

nvidia-bug-report.log.gz

I have the same issue. To work around this you can:

z1atk0 commented 11 months ago

535.98 does not fix the issue for me, still same as before with 535.86.05 (ie. not as bad as 535.54.03, but still the occasional flicker). No VRR/G-SYNC/FreeSync, plain old 60Hz fixed. Only reverting to 525.116.04 removes the flicker completely.

GPU: NVIDIA TU116 (GeForce GTX 1660 Ti)
Driver: 535.98
Monitor: 2x AOC 24G2SPU @ HDMI (=> no VRR, G-SYNC only works on DP inputs)
OS: Slackware64-15.0
Kernel: 6.1.44
DE: GNOME 44 on X11 (no Wayland)
Screen Resolution: 3840x1080 @ 60Hz (= 2x 1920x1080 @ TwinView)

EDIT: oh, and I don't use KMS/modeset, if that matters:

[root@disclosure:~]# cat /sys/module/nvidia_drm/parameters/modeset
N

thesword53 commented 11 months ago

I updated the driver to 535.98 and now it crashes when I log out and log back in to the Plasma Wayland session (SDDM Wayland login manager) or when I disable and re-enable the screen.

[   30.789310] ------------[ cut here ]------------
[   30.789311] WARNING: CPU: 15 PID: 888 at drivers/dma-buf/dma-buf.c:1537 dma_buf_vmap+0xf0/0x100
[   30.789316] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg bnep 8021q garp mrp stp llc uvcvideo uvc gspca_vc032x gspca_main videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) btusb btrtl btbcm btintel btmtk bluetooth vfat mousedev ecdh_generic fat intel_rapl_msr intel_rapl_common edac_mce_amd iwlmvm kvm_amd ccp nvidia(POE) mac80211 kvm ntfs3 snd_hda_codec_realtek snd_hda_codec_generic ucsi_ccg irqbypass ledtrig_audio libarc4 snd_hda_codec_hdmi typec_ucsi joydev typec crct10dif_pclmul roles crc32_pclmul polyval_clmulni snd_hda_intel polyval_generic gf128mul iwlwifi snd_usb_audio ghash_clmulni_intel snd_intel_dspcfg sha512_ssse3 aesni_intel crypto_simd snd_usbmidi_lib snd_intel_sdw_acpi cryptd rapl snd_hda_codec snd_rawmidi cfg80211 igb snd_hda_core snd_seq_device wmi_bmof sp5100_tco mc acpi_cpufreq snd_hwdep video pcspkr i2c_algo_bit zenpower(OE) i2c_piix4 i2c_nvidia_gpu snd_pcm rfkill dca
[   30.789394]  snd_timer mac_hid snd usbhid soundcore dm_multipath crypto_user loop fuse dm_mod bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme xhci_pci crc32c_intel nvme_core xhci_pci_renesas nvme_common wmi
[   30.789417] CPU: 15 PID: 888 Comm: kwin_wayland Tainted: P           OE      6.4.9-arch1-1 #1 3461e85deab2986acc9a45474db12841c06eb98b
[   30.789419] Hardware name: Micro-Star International Co., Ltd. MS-7B93/MPG X570 GAMING PRO CARBON WIFI (MS-7B93), BIOS 1.I0 03/01/2023
[   30.789421] RIP: 0010:dma_buf_vmap+0xf0/0x100
[   30.789424] Code: c0 01 89 43 28 48 85 c9 74 1c 48 8b 43 30 48 8b 53 38 49 89 04 24 49 89 54 24 08 eb c3 0f 0b b8 ea ff ff ff eb bc 0f 0b 0f 0b <0f> 0b eb b4 b8 ea ff ff ff eb ad e8 20 07 40 00 90 90 90 90 90 90
[   30.789425] RSP: 0018:ffffba42854d3b10 EFLAGS: 00010282
[   30.789427] RAX: fffffffffffffff4 RBX: ffff9a4b1ba39000 RCX: 0000000000000027
[   30.789429] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9a51cede16c0
[   30.789430] RBP: ffffba42854d3b38 R08: 0000000000000000 R09: ffffba42854d38f8
[   30.789432] R10: 0000000000000003 R11: ffff9a51ef3284a8 R12: ffff9a4aeff0cc98
[   30.789433] R13: ffff9a4b1efd8798 R14: ffff9a4aeff0cc98 R15: 0000000000000000
[   30.789434] FS:  00007fcc9980e640(0000) GS:ffff9a51cedc0000(0000) knlGS:0000000000000000
[   30.789436] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.789438] CR2: 000055bcc0e336e0 CR3: 000000012ed58000 CR4: 0000000000350ee0
[   30.789439] Call Trace:
[   30.789441]  <TASK>
[   30.789442]  ? dma_buf_vmap+0xf0/0x100
[   30.789445]  ? __warn+0x81/0x130
[   30.789449]  ? dma_buf_vmap+0xf0/0x100
[   30.789452]  ? report_bug+0x171/0x1a0
[   30.789455]  ? handle_bug+0x3c/0x80
[   30.789458]  ? exc_invalid_op+0x17/0x70
[   30.789461]  ? asm_exc_invalid_op+0x1a/0x20
[   30.789468]  ? dma_buf_vmap+0xf0/0x100
[   30.789472]  drm_gem_shmem_vmap_locked+0x27/0x1c0
[   30.789477]  drm_gem_shmem_object_vmap+0x31/0x50
[   30.789479]  ? dma_resv_get_singleton+0x46/0x140
[   30.789482]  drm_gem_vmap+0x22/0x50
[   30.789485]  drm_gem_vmap_unlocked+0x2a/0x50
[   30.789488]  drm_gem_fb_vmap+0x41/0x120
[   30.789492]  drm_atomic_helper_prepare_planes+0x17a/0x210
[   30.789496]  drm_atomic_helper_commit+0x78/0x140
[   30.789500]  drm_atomic_commit+0x9a/0xd0
[   30.789504]  ? __pfx___drm_printfn_info+0x10/0x10
[   30.789508]  drm_mode_atomic_ioctl+0x9b5/0xbc0
[   30.789515]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[   30.789518]  drm_ioctl_kernel+0xcd/0x170
[   30.789522]  drm_ioctl+0x26d/0x4b0
[   30.789525]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[   30.789531]  __x64_sys_ioctl+0x94/0xd0
[   30.789535]  do_syscall_64+0x60/0x90
[   30.789538]  ? __x86_return_thunk+0x9/0x10
[   30.789540]  ? do_syscall_64+0x6c/0x90
[   30.789542]  ? __x86_return_thunk+0x9/0x10
[   30.789544]  ? syscall_exit_to_user_mode+0x1b/0x40
[   30.789547]  ? __x86_return_thunk+0x9/0x10
[   30.789549]  ? do_syscall_64+0x6c/0x90
[   30.789551]  entry_SYSCALL_64_after_hwframe+0x77/0xe1
[   30.789554] RIP: 0033:0x7fcc9e10ce1f
[   30.789573] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[   30.789575] RSP: 002b:00007ffde93c4170 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   30.789578] RAX: ffffffffffffffda RBX: 000055bcc0baf3e0 RCX: 00007fcc9e10ce1f
[   30.789579] RDX: 00007ffde93c4210 RSI: 00000000c03864bc RDI: 0000000000000016
[   30.789580] RBP: 00007ffde93c4210 R08: 0000000000000002 R09: 0000000000000002
[   30.789582] R10: 000055bcbe0a9010 R11: 0000000000000246 R12: 00000000c03864bc
[   30.789583] R13: 0000000000000016 R14: 000055bcbe77c060 R15: 000055bcbe74f870
[   30.789588]  </TASK>
[   30.789589] ---[ end trace 0000000000000000 ]---

Good news. Screen blackout seems to be fixed but I still have VRR stutters and it's not working properly on Wayland sessions: https://forums.developer.nvidia.com/t/monitors-literally-stutter-when-vrr-g-sync-is-enabled/256836

sentakuhm commented 11 months ago

Same issue with NVIDIA GeForce RTX 2060:
Distro: Arch Linux
kernel: 6.4.9-zen1-1-zen
Driver ver: 535.98
CUDA: 12.2
Window System: X11
nvidia-bug-report.log.gz

WannaBeOCer commented 11 months ago

Is this issue only occurring on the desktop for the users experiencing it? VRR shouldn’t be enabled on the desktop.

AlexGoinsNV commented 11 months ago

@taleteller , @z1atk0 ,

If you are seeing flickering with 2x 1920x1080@60Hz displays without VRR, you may be encountering a different issue than what has previously been addressed. The change that triggered flickering on VRR displays should be present in 525.116.04. Just to be 100% sure, can you confirm if you see the issue on 535.98 if you load nvidia-modeset with ~disable_vrr_mclk_switch=1~ disable_vrr_memclk_switch=1 ?

@WannaBeOCer

The feature disabled by the workaround option ~disable_vrr_mclk_switch=1~ disable_vrr_memclk_switch=1 leverages VRR to extend vblank to allow changing memory clocks when it otherwise wouldn't be possible. This isn't the typical VRR use-case where display refresh matches frames rendered by a 3D application. It could occur on the desktop whenever memory clocks change.

@thesword53

Are you seeing visible stutter, or are you relying on the display's refresh rate indicator? As mentioned in my reply to @WannaBeOCer, this feature involves extending vblank to allow memory clock switching, which would be interpreted by the display as a brief dip in refresh rate. That behavior is expected, but if it is visually disruptive when viewing content that's something we would want to be aware of.

notfood commented 11 months ago

I updated to 535.98. I still get flicker on the top row of my main monitor, but it's become rare: 1 or 2 flickers every 10 minutes, compared to the nonstop flickering with the previous driver. It's an improvement, but this is still undesirable. It does not happen in games; it is most likely to happen when the system has been idling or during casual internet browsing.

From last time:

Chiming in to share my experience. I'm experiencing partial flickering on my main monitor in a dual monitor setup using 535.54.03 at 60Hz.

Environment:
Main monitor: 1920x1080@60Hz
Second monitor: 1024x768@60Hz
GPU1: GeForce GTX 1650
GPU2: Tesla M40 (not in use)
OS: ArchLinux

It's occasional, I suspect GPU is switching power states, the flickering only happens on the top of the screen. It doesn't matter what I'm running, I can be on X11 or Wayland, it happens on KDE Plasma and I3w, it happens with no applications running, it happens during the beginning of heavy GPU usage. Downgrading it to 530.41.03 shows none of these issues.

akb825 commented 11 months ago

I've also been experiencing the same issue the last two or three months with the proprietary drivers, including with the latest. (535.86.05) Some key highlights based on the discussion so far:

* I'm running 2 monitors at 3840x2160 at 60 Hz on DP. I am NOT using VRR and G-SYNC is not enabled on either monitor.

* I am using KDE Plasma through X11, and mainly see the issue with flickering at the top of my primary monitor. I also tried with Wayland, but had periods where my secondary screen completely blanked out similar to @thesword53.

* Best I can tell the 535.86.05 update had minimal if any impact.

More detailed system information:

* GPU: GeForce RTX 2080

* Driver: 535.86.05

* Distro: Arch Linux

* Kernel: 6.4.4

* DE: KDE Plasma 5.27.6 X11

* Monitors, both at 3840x2160 60 Hz and connected to DP:

  * Primary: DELL U2720Q
  * Secondary: BenQ PD2700U

Still seeing this with the 535.98 driver.

can you confirm if you see the issue on 535.98 if you load nvidia-modeset with disable_vrr_mclk_switch=1 ?

@AlexGoinsNV I tried adding the following to /etc/modprobe.d/nvidia.conf:

options nvidia-modeset disable_vrr_mclk_switch=1

but I see the following error in dmesg:

nvidia_modeset: unknown parameter 'disable_vrr_mclk_switch' ignored

Is this the correct way to set the option? Is this option exclusive to the open source driver? (I'm currently using the proprietary driver)
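A quick way to catch this kind of typo is to scan the kernel log for the "unknown parameter" message quoted above. The sketch below is only an illustration: the regular expression is modelled on that single log line, and `find_ignored_parameters` is a made-up helper name.

```python
import re

# Modelled on: nvidia_modeset: unknown parameter 'disable_vrr_mclk_switch' ignored
UNKNOWN_PARAM = re.compile(r"(\w+): unknown parameter '([^']+)' ignored")


def find_ignored_parameters(log_text):
    """Return (module, parameter) pairs the kernel reported as ignored."""
    return [(m.group(1), m.group(2)) for m in UNKNOWN_PARAM.finditer(log_text)]


sample = "nvidia_modeset: unknown parameter 'disable_vrr_mclk_switch' ignored"
for module, parameter in find_ignored_parameters(sample):
    print(f"{module} ignored option {parameter}")
```

The text passed in could be the output of `dmesg`; an empty result means no module option was rejected in that log.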

AlexGoinsNV commented 11 months ago

@akb825 Sorry, it's disable_vrr_memclk_switch=1, not disable_vrr_mclk_switch=1, I mistyped it in my earlier post (edited it to avoid further confusion). It should work equally on either the proprietary driver or the open source driver.
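One way to confirm that the corrected option actually took effect is to read it back from sysfs after the module is loaded. This is a minimal sketch under stated assumptions: it relies on the parameters being exposed under `/sys/module/nvidia_modeset/parameters/` with `Y` for an enabled boolean, as in the `grep` output posted elsewhere in this thread, and the helper names are hypothetical.

```python
from pathlib import Path


def read_module_parameters(module, sysfs_root="/sys/module"):
    """Return {parameter name: value} for a loaded kernel module."""
    params_dir = Path(sysfs_root) / module / "parameters"
    values = {}
    if not params_dir.is_dir():
        return values  # module not loaded, or it exposes no parameters
    for entry in sorted(params_dir.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            values[entry.name] = None  # not readable by this user
    return values


def memclk_switch_disabled(params):
    """True if disable_vrr_memclk_switch reads back as 'Y'."""
    return params.get("disable_vrr_memclk_switch") == "Y"
```

For example, `memclk_switch_disabled(read_module_parameters("nvidia_modeset"))` returning `False` would suggest the option was not applied, for instance because of a misspelled name in the modprobe configuration.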

akb825 commented 11 months ago

@AlexGoinsNV thanks, I applied the corrected option and verified I'm no longer getting an unknown parameter error. The flickering remains, which seems to corroborate your idea that it's separate from the VRR issues addressed earlier, especially since my monitors are at a static 60 Hz.

jarrard commented 11 months ago

There may be 3 different issues with the driver(s):
1) the partial screen black flicker issue (even when VRR is disabled)
2) whole screen blinking
3) the issue I get under Linux and Windows, which is gamma ramping up and down, causing brightness/gamma fluttering; it didn't happen with VRR disabled the last time I tested.

z1atk0 commented 11 months ago

@taleteller , @z1atk0 ,

If you are seeing flickering with 2x 1920x1080@60Hz displays without VRR, you may be encountering a different issue than what has previously been addressed. The change that triggered flickering on VRR displays should be present in 525.116.04. Just to be 100% sure, can you confirm if you see the issue on 535.98 if you load nvidia-modeset with ~disable_vrr_mclk_switch=1~ disable_vrr_memclk_switch=1 ?

@AlexGoinsNV done that, but the flicker is still there, just as you suspected:

[root@disclosure:~]# grep . /sys/module/nvidia_modeset/parameters/*
/sys/module/nvidia_modeset/parameters/config_file:(null)
/sys/module/nvidia_modeset/parameters/disable_vrr_memclk_switch:Y
/sys/module/nvidia_modeset/parameters/fail_malloc:-1
/sys/module/nvidia_modeset/parameters/malloc_verbose:N
/sys/module/nvidia_modeset/parameters/output_rounding_fix:Y

robvdl commented 11 months ago

I'm also still running into this with a single monitor and RTX 2080, I'm going to try some of the suggestions here.

spboehm commented 11 months ago

I also have this issue:

Graphics:
  Device-1: NVIDIA TU106 [GeForce RTX 2060 SUPER] driver: nvidia v: 535.86.05
  Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.1.2 driver: X:
    loaded: nvidia unloaded: fbdev,modesetting,vesa gpu: nvidia,nvidia-nvswitch
    resolution: 3440x1440
  API: OpenGL v: 4.6.0 NVIDIA 535.86.05 renderer: NVIDIA GeForce RTX 2060
    SUPER/PCIe/SSE2

I'm running openSUSE Tumbleweed (20230815) with the software mentioned above. One single screen attached via HDMI, 100 Hz.

Surprisingly, the issue disappeared entirely after putting the system to sleep and waking it again. Maybe it is related to the power state? Please reach out if you need further information.

Can others confirm this behavior?

akb825 commented 11 months ago

Can others confirm this behavior?

I tried suspending and resuming and so far it seems to avoid the flicker. It does appear to be adjusting the power state based on activity both before and after the suspend, but for whatever reason it seems to avoid flickering when it transitions power state after it's resumed. I'll report back if I notice any flickering later after a suspend/resume.
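
To compare the power-state transitions before and after a resume, the performance state can be logged while watching for flicker. This is a rough sketch, assuming nvidia-smi is installed and reports the `pstate` field on this GPU. The helper name is hypothetical.

```shell
# Read one performance-state sample per line on stdin and print only the
# samples that differ from the previous one, prefixed with the sample index.
pstate_transitions() {
    awk 'NR == 1 || $0 != prev { print NR ": " $0 } { prev = $0 }'
}

# Example (one sample per second until interrupted):
#   nvidia-smi --query-gpu=pstate --format=csv,noheader -l 1 | pstate_transitions
```

Running it once before suspending and once after resuming gives two transition logs that can be lined up against the moments the screen flickers.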

r-ca commented 11 months ago

Screen flicker is completely fixed when using "Prefer Maximum Performance" mode instead of "Adaptive" mode. This worked with the latest 535.98-2 driver (if I remember correctly, it also works with the nvidia-open driver). Note: idle power consumption increased by about 20 W (50 W -> 71 W) in my environment.
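
To put a number on the idle power cost on another system, a few readings can be averaged in each mode. This is a sketch, assuming nvidia-smi is installed and that the GPU exposes `power.draw` (some boards report `[N/A]`, which the helper skips). The helper name is hypothetical.

```shell
# Average numeric power readings (watts), one per line on stdin.
# Non-numeric lines such as "[N/A]" are ignored; prints nothing if no
# numeric sample was seen.
mean_watts() {
    awk '/^[0-9]+(\.[0-9]+)?$/ { sum += $1; n++ }
         END { if (n) printf "%.1f\n", sum / n }'
}

# Example (30 samples, one per second, single-GPU system):
#   for i in $(seq 30); do
#       nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits
#       sleep 1
#   done | mean_watts
```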

Environment:

yay -Qs nvidia
local/egl-wayland 2:1.1.12-1
local/lib32-libvdpau 1.5-1
local/lib32-nvidia-utils 535.98-1
local/lib32-opencl-nvidia 535.98-1
local/libvdpau 1.5-1
local/libxnvctrl 535.98-1
local/nvidia 535.98-2
local/nvidia-settings 535.98-1
local/nvidia-utils 535.98-1
local/opencl-nvidia 535.98-1

Display Device Information was exactly the same on all monitors. (I use a translator and do not have much knowledge of Linux.)

xaqbr commented 11 months ago

I also have this issue:

Graphics:
  Device-1: NVIDIA TU106 [GeForce RTX 2060 SUPER] driver: nvidia v: 535.86.05
  Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.1.2 driver: X:
    loaded: nvidia unloaded: fbdev,modesetting,vesa gpu: nvidia,nvidia-nvswitch
    resolution: 3440x1440
  API: OpenGL v: 4.6.0 NVIDIA 535.86.05 renderer: NVIDIA GeForce RTX 2060
    SUPER/PCIe/SSE2

I'm running openSUSE Tumbleweed (20230815) with the software mentioned above. One single screen attached via HDMI, 100 Hz.

Surprisingly, the issue disappeared entirely after putting the system to sleep and waking it again. Maybe it is related to the power state? Please reach out if you need further information.

Can others confirm this behavior?

The sleep fix has worked well for me so far, even on Wayland! Two monitors, one at 144 Hz and the other at 60 Hz.

Operating System: Nobara Linux 38
KDE Plasma Version: 5.27.6
KDE Frameworks Version: 5.108.0
Qt Version: 5.15.10
Kernel Version: 6.4.8-202.fsync.fc38.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 12 × AMD Ryzen 5 3600X 6-Core Processor
Memory: 15.5 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 2060/PCIe/SSE2

nbarrientos commented 11 months ago

Screen flicker completely fixed when using "Prefer Maximum Performance" mode instead of "Adaptive" mode

This seems to help indeed, at least in my case.

nvidia-settings -a "[gpu:0]/GpuPowerMizerMode=1"
nvidia 535.98-2
linux 6.4.10.arch1-1
exwm 0.27
xorg-server 21.1.8-2
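
The numeric value in that command maps to the mode names shown in nvidia-settings. This is a small sketch, assuming the mapping 0 = Adaptive, 1 = Prefer Maximum Performance, 2 = Auto, which is worth verifying on your driver with `nvidia-settings -q "[gpu:0]/GpuPowerMizerMode"`. Both helper names are hypothetical, and the setting may need re-applying at each session start.

```shell
# Translate a mode name to the GpuPowerMizerMode value (assumed mapping).
powermizer_mode_value() {
    case "$1" in
        adaptive)    echo 0 ;;
        max|maximum) echo 1 ;;
        auto)        echo 2 ;;
        *)           return 1 ;;
    esac
}

# Apply a mode by name; $2 is the GPU index (defaults to 0).
set_powermizer() {
    value=$(powermizer_mode_value "$1") || {
        echo "unknown mode: $1" >&2
        return 1
    }
    nvidia-settings -a "[gpu:${2:-0}]/GpuPowerMizerMode=$value"
}

# Example:
#   set_powermizer max       # same as the command quoted in this comment
#   set_powermizer adaptive  # revert
```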

z1atk0 commented 11 months ago

Just for reference, setting the preferred mode to "Maximum Performance Mode" instead of "Adaptive Mode" did not eliminate the flicker on my system.

kurld commented 11 months ago

Absolutely no change after updating to 535.98. Maximum performance power mode makes the flicker less frequent, but it's still there. Single 60 Hz screen.

spboehm commented 11 months ago

Just for reference, setting the preferred mode to "Maximum Performance Mode" instead of "Adaptive Mode" did not eliminate the flicker on my system.

Absolutely no change after updating to 535.98. Maximum performance power mode makes the flicker less frequent, but it's still there. Single 60 Hz screen.

Can completely confirm this behavior. Flickering is less frequent, but it is still there.