NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

525+ driver external display render problems #419

Closed VPaulV closed 1 year ago

VPaulV commented 1 year ago

NVIDIA Open GPU Kernel Modules Version

525.60.11

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Gentoo Linux

Kernel Release

Linux home 5.15.80-gentoo-x86_64

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU

Describe the bug

Hi Guys,

After updating to 525.60.11 (both open-source and proprietary), I have a problem with the external screen. When I connect it over HDMI, everything on that screen is extremely laggy, as if rendering happened on the Intel GPU instead of the NVIDIA card. By laggy I mean every change on the screen takes a few seconds to redraw. This happens only on Intel+NVIDIA GPU laptops.

Please let me know if I can provide any additional logs that will help.

To Reproduce

  1. Update to 525.60.11
  2. Connect external screen with HDMI
  3. Enjoy the lags

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz (515.86.01, the version that works as it should)
nvidia-bug-report.log.gz (525.60.11, with the extremely laggy external display)

More Info

No response

ViBE-HU commented 1 year ago

Same here. It's really confusing how things go... I keep testing, which is wearing me out.

https://askubuntu.com/questions/1438162/ubuntu-22-04-does-not-detects-external-display-after-nvidia-driver-installed

VPaulV commented 1 year ago

UPD: 525.60.13 same problem

sriharshachilakapati commented 1 year ago

Is this only happening on Xorg or is it happening on Wayland too? I have a laptop GPU NVIDIA 3060 and I'm wondering whether I should update or not.

VPaulV commented 1 year ago

In my case it is xorg-server 21.1.4. It would be nice if you could update and check whether you have the same problem.

ShuraEx commented 1 year ago

I'm having similar issues with driver 525 as well.

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Arch latest

Kernel Release

Linux 6.0.12-arch1-1

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 2060 Laptop GPU

Describe the bug

The problem happens when using an external display over HDMI. Similar to the others who posted here, everything becomes super laggy and takes 3-4 sec to render. For me this only happens when using only the external display with the laptop screen disabled; if I use both displays it seems to work fine. But I dock my laptop at home and during presentations, so I would prefer to keep the screen closed and use only the external display or projector.

To Reproduce

Update to Nvidia Proprietary or Open 525
Connect external screen with HDMI

Bug Incidence

Always

VPaulV commented 1 year ago

Oh, that is a good point. I totally forgot to write that for me the issue occurs only on laptops with Intel+NVIDIA GPUs.

sriharshachilakapati commented 1 year ago

I just updated, and I only faced a minor issue so far, which is my external monitor flickered a lot for almost a minute after boot. After that, I was even able to play my games (verified with Spider-Man Remastered and Witcher 3).

Operating System and Version

Manjaro (latest testing branch)

Kernel Release

Linux 6.1.0-1-MANJARO

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 3060 Mobile

Driver version 525.60.11 (Installed through Manjaro repo, not sure if it is proprietary or open source one).

Using Wayland and connected monitor using HDMI.

SAMSUNG LU28R55 Resolution 3840x2160 Scale 115%

pynkpanther commented 1 year ago

I have horrible flickering and ghosting / image retention (the image burns in for 10+ minutes). After switching back to 470.161.03 (through mhwd), the problem no longer occurs.

Operating System and Version

Manjaro XFCE edition

Kernel Release

Linux 6.1 but also tested rollback to Linux 5.15

Hardware: GPU

GPU 0: NVIDIA GeForce GTX 1080 TI

Driver version

Using Xorg and DisplayPort on a QHD KOORUI display (no clue what panel is inside this brand)

VPaulV commented 1 year ago

UPD: New driver - 525.78.01, same issue

VPaulV commented 1 year ago

I had time to check: it is broken starting with version 525, while 520 and below work just fine. Unfortunately, my expertise in GPU drivers is very limited, but if someone could suggest which part of the driver might be responsible, I could try to partially revert patches and see if it helps.

psi4j commented 1 year ago

> Is this only happening on Xorg or is it happening on Wayland too? I have a laptop GPU NVIDIA 3060 and I'm wondering whether I should update or not.

It's happening on Wayland for me. Somebody linked me this post, https://www.linuxquestions.org/questions/showthread.php?p=6404174#post6404174, where some people indicate they've been able to get 525.78.01 working with kernel 6.1.5. Not sure if they're using Wayland, however.

Alkaid-Benetnash commented 1 year ago

I have similar issues. Arch Linux, laptop, Intel+NVIDIA GPU, using nvidia modeset.

kernel 6.1.1 + nvidia 525.60.11: works
kernel 6.1.6 + nvidia 525.78.01: external display freeze/black screen

The only reproducible misbehavior I managed to find is that the log complains

nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c57e:0:0:1128

in dmesg.

effeffe commented 1 year ago

Same here, with errors relating to display idling: [ 26.740169] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128

I'm on Xorg, and it occurs with both the open and closed drivers.

AndreMaz commented 1 year ago

Same here

[ 22.898786] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128

Info:

MichaelXt commented 1 year ago

After updating to version 525, my external monitor drops to 1 frame per 10 seconds when only the external monitor is enabled.

Driver: 525.85.12
Kernel: 5.15.0-58-generic
xserver-xorg-core: 2:21.1.3-2ubuntu2.5
Ubuntu 22.04.1 LTS
Feb 01 22:45:41 msi-u /usr/libexec/gdm-x-session[2336]: (II) NVIDIA(G0): Setting mode "HDMI-1-0: nvidia-auto-select @3840x2160 +0+0 {AllowGSYNC=Off, ViewPortIn=3840x2160, ViewPortOut=3840x2160+0+0}"
Feb 01 22:46:17 msi-u kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128
Feb 01 22:46:19 msi-u kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128
Feb 01 22:46:47 msi-u /usr/libexec/gdm-x-session[2336]: (II) NVIDIA(G0): Setting mode "HDMI-1-0: nvidia-auto-select @3840x2160 +0+0 {AllowGSYNC=Off, ViewPortIn=3840x2160, ViewPortOut=3840x2160+0+0}"

Is there any workaround besides downgrading the NVIDIA drivers and the corresponding CUDA version?
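
For anyone checking whether their own logs show the same error, here is a minimal Python sketch that scans kernel log text for the "Idling display engine timed out" line quoted in this thread. The function name `find_display_timeouts` is made up for illustration, and the trailing value (for example `0x0000c67e:0:0:1128`) is kept as an opaque string because nothing in the thread explains its fields.

```python
import re

# Matches the nvidia-modeset error line quoted in this thread.
# The trailing value is captured as-is; its meaning is not documented here.
TIMEOUT_RE = re.compile(
    r"nvidia-modeset: ERROR: GPU:(?P<gpu>\d+): "
    r"Idling display engine timed out: (?P<detail>\S+)"
)


def find_display_timeouts(log_text):
    """Return a list of (gpu_index, detail) for each timeout line found."""
    return [
        (int(m.group("gpu")), m.group("detail"))
        for m in TIMEOUT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Sample lines copied from the comment above.
    sample = (
        "Feb 01 22:46:17 msi-u kernel: nvidia-modeset: ERROR: GPU:0: "
        "Idling display engine timed out: 0x0000c67e:0:0:1128\n"
        "Feb 01 22:46:19 msi-u kernel: nvidia-modeset: ERROR: GPU:0: "
        "Idling display engine timed out: 0x0000c67e:0:0:1128\n"
    )
    for gpu, detail in find_display_timeouts(sample):
        print(f"GPU {gpu}: display engine idle timeout ({detail})")
```

The text to scan can come from saved `dmesg` or `journalctl -k` output; an empty result only means this particular message was not logged, not that the driver is healthy.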

Alkaid-Benetnash commented 1 year ago

Following up. Updated to Arch Linux kernel 6.1.9 and nvidia 525.85.05. Still using modeset. The external monitor works fine for now. No more "timed out" errors in dmesg either.

AndreMaz commented 1 year ago

Got a new update today.

One external screen stays black due to

[ 22.898786] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128

The other one worked for 15 minutes and then died with the following output:

[  700.457227] pcieport 0000:00:01.1: pciehp: Slot(0): Link Down
[  700.457231] pcieport 0000:00:01.1: pciehp: Slot(0): Card not present
[  700.457240] snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
[  700.586286] NVRM: GPU at PCI:0000:01:00: GPU-d285da56-46b0-17be-5844-3892e0d2d716
[  700.586294] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[  700.586298] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[  700.586310] NVRM: A GPU crash dump has been created. If possible, please run
               NVRM: nvidia-bug-report.sh as root to collect this data before
               NVRM: the NVIDIA kernel module is unloaded.
[  700.854261] snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
[  701.250270] snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register 0x6f0100. -5
[  701.250281] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[  701.250286] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[  701.250290] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[  701.250293] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD buf size -1
[  701.450548] pci 0000:01:00.1: Removing from iommu group 8
[  701.450566] NVRM: Attempting to remove device 0000:01:00.0 with non-zero usage count!

After the crash running nvidia-smi produces the following message

Unable to determine the device handle for GPU0000:01:00.0: Unknown Error

Here's the bug report nvidia-bug-report.log.gz
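
To pull the Xid events out of a log like the one above, a minimal Python sketch follows. It only assumes the `NVRM: Xid (PCI:...): <number>, ...` line shape seen in this comment; the function name `find_xid_events` is hypothetical, and the sketch does not try to interpret Xid numbers beyond returning the message text the driver printed.

```python
import re

# Matches lines shaped like:
#   NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', ..., GPU has fallen off the bus.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+),\s*(?P<rest>.*)"
)


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_number, message) tuples."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("pci"), int(match.group("xid")), match.group("rest"))
            )
    return events


if __name__ == "__main__":
    # Sample line copied from the dmesg output above.
    sample = (
        "[  700.586294] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', "
        "name=<unknown>, GPU has fallen off the bus."
    )
    for pci, xid, message in find_xid_events(sample):
        print(f"{pci}: Xid {xid}: {message}")
```

Grouping the results by Xid number makes it easier to tell a one-off event from a recurring one when comparing driver versions.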

VPaulV commented 1 year ago

I have observed that some people have been successful with the 6.* Linux kernel. As a result, I built 6.1.11 and tried every NVIDIA driver above version 515.86.01. Unfortunately, all of these drivers have the same issue with external displays, including the latest version 525.89.02.

anasbouzid commented 1 year ago

> I have observed that some people have been successful with the 6.* Linux kernel. As a result, I built 6.1.11 and tried every NVIDIA driver above version 515.86.01. Unfortunately, all of these drivers have the same issue with external displays, including the latest version 525.89.02.

I have the same problem as you (but with an AMD+NVIDIA GPU laptop) and have tested every single combination. I'm currently using 520.56.06; if I upgrade, the bug reappears. I'm not sure whether it's an issue with NVIDIA or Xorg. A fix for this issue was already provided in Xorg (https://gitlab.freedesktop.org/xorg/xserver/-/issues/1028), but it no longer works with the latest versions of the NVIDIA drivers.

barluk87 commented 1 year ago

I have the same issue and can reproduce it in a different way. My setup is an Ubuntu server with a GTX 1660 Super (proprietary NVIDIA 525 driver) and a KVM switch. While the KVM switch connected to the HDMI port is powered on, everything works correctly. The issue occurs when I disconnect power from the KVM switch, which the GPU recognizes as a monitor disconnection. The following then starts to appear in syslog:

Feb 19 17:01:29 kernel: [27595.505195] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device HDMI-0

The frame rate drops to a few fps, and this is visible in the nvenc stats for the x264 stream.

ShuraEx commented 1 year ago

Just an update on my issue. Until now I was using the Nvidia 520.xx dkms package with the latest stock kernel so I could use presentation mode, but with the release of kernel 6.2 the Nvidia 520.xx dkms modules stopped building. With 520 no longer building on 6.2 and 525 still lagging in presentation mode, I decided to try nvidia-beta from the AUR. nvidia-beta is currently at version 530.xx, and so far it seems to have resolved my lag issue in multi-monitor presentation mode.

I'm using the regular nvidia-beta, but I believe there is also an nvidia-open-beta in the AUR. Everyone's issue in this thread was similar but not exactly the same, so I can't promise it will fix everyone's problems, but if you're on Arch it's worth a shot. For other distros, you may be able to get the Nvidia beta source from their site. Anyway, I just wanted to update everyone here, since until now there was no real solution to our problem. Hope it helps.

gilvbp commented 1 year ago

Same here. I have a 3090 with 4 monitors on Arch Linux. It has been lagging a lot since the 525 driver update.

IvanGreen commented 1 year ago

Ubuntu 22.10
Nvidia RTX 3060 Laptop
AMD Ryzen 5 5600H with AMD graphics
530.30.02 driver (beta)

Still shit. Now I hate NVIDIA. On my second laptop, which is exactly the same but with graphics from AMD, everything works.

anasbouzid commented 1 year ago

@IvanGreen It's really unfortunate, they just don't care. It's been at least 2 years since this bug was first mentioned (on other forums). It was somehow fixed in version 520.56.06, but later versions reintroduced the bug. The two options for now are to keep using the old version or to disable the AMD GPU and rely on Nvidia only.

VPaulV commented 1 year ago

> Ubuntu 22.10 Nvidia RTX 3060 Laptop AMD Ryzen 5 5600H with AMD graphics 530.30.02 driver (beta)
>
> Still shit. Now I hate NVIDIA. On my second exactly the same laptop, but with graphics from AMD - everything works.

Did you try to update kernel as well? (I didn't test myself, just curious about your results). Also, nowadays NVIDIA has way better Linux support than they had 5 years ago, so no reason to hate them. For now you can just use the 515/520 drivers, they are fine

IvanGreen commented 1 year ago

> Did you try to update kernel as well? (I didn't test myself, just curious about your results). Also, nowadays NVIDIA has way better Linux support than they had 5 years ago, so no reason to hate them. For now you can just use the 515/520 drivers, they are fine

Yes, I tried to update the kernel; unfortunately it did not help. I have two additional monitors (2.5k, 21:9 each, one HDMI, one Type-C) that work only with the version 420 drivers. And if I turn off the laptop's main display programmatically, the freezes begin. I haven't tried the 520.56.06 drivers yet, but I've tried the proprietary 515/520 drivers (I don't know the sub-version), and it still doesn't work.

VPaulV commented 1 year ago

Hey guys, great news. I have just tried the latest Linux LTS kernel with the 530.41.03 driver, and everything works as it should. Please try it and confirm. My setup:

  1. Kernel - 6.1.19-gentoo-x86_64
  2. NVIDIA Driver Version: 530.41.03
  3. Xorg-server 21.1.7
  4. Razer laptop with GeForce RTX 3080 Mobile

Please note that if you are using IBT you will want a 6.2+ kernel.
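
To check a machine against the combination reported working in this thread (driver 530+ with a 6.1+ kernel), a minimal Python sketch is shown below. The helper names `parse_version` and `matches_reported_fix` are made up for illustration, and the thresholds come only from the reports in this thread, not from an official NVIDIA statement.

```python
import re


def parse_version(text):
    """Turn '530.41.03' or '6.1.19-gentoo-x86_64' into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def matches_reported_fix(driver, kernel, min_driver=(530,), min_kernel=(6, 1)):
    """True if both versions are at least the ones reported working here.

    The default thresholds are assumptions taken from this thread.
    """
    return (
        parse_version(driver) >= min_driver
        and parse_version(kernel) >= min_kernel
    )


if __name__ == "__main__":
    # Version strings taken from comments in this thread.
    print(matches_reported_fix("530.41.03", "6.1.19-gentoo-x86_64"))
    print(matches_reported_fix("525.60.11", "5.15.80-gentoo-x86_64"))
```

The kernel string can be taken from `uname -r` (or `platform.release()` in Python) and the driver string from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. A `True` result only means the versions match what people here reported, so it is worth testing on your own hardware as well.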

khoek commented 1 year ago

This problem is brutal because I was running 515 to avoid it on Ubuntu 22.10, but, at least in the current beta, 515 totally hangs the systemd-udevd task on the 6.2 kernel that ships with the Ubuntu 23.04 beta.

I can confirm that upgrade to v530 fixed the problem for me, though.

VPaulV commented 1 year ago

Can we close the issue, since it seems to be resolved with v530+ and a 6.1+ kernel?