NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Animations after idling are noticeably choppy until GPU ramps up with GSP firmware enabled #693

Open Gert-dev opened 3 months ago

Gert-dev commented 3 months ago

NVIDIA Open GPU Kernel Modules Version

555.58.02

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

Operating System and Version

Arch Linux

Kernel Release

6.10.5

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

Hardware: GPU

NVIDIA GeForce RTX 4090 Laptop GPU

Describe the bug

In mutter 46.4 on Wayland on a single-GPU NVIDIA system, there is always a 'jank' when idling for a few seconds and then switching virtual desktops. If you then start frantically switching desktops (i.e. triggering animations), after a second or two it becomes buttery smooth (feels like 144 FPS on a 144 Hz monitor), which is likely due to the GPU ramping up.

This doesn't happen on a high-refresh-rate display (240 Hz) being driven by an AMD iGPU in my case. On an Intel GPU it also happens but is fixed by applying triple buffering.

On the single-GPU NVIDIA system using triple buffering doesn't seem to resolve the issue despite the problems being similar to what it attempts to fix for Intel.

This was originally reported to mutter, but it turns out this is related to the GSP firmware used by the open kernel modules since the closed driver without GSP enabled doesn't experience this issue.

To Reproduce

  1. Start GNOME on Wayland on a high-refresh-rate monitor (e.g. 120 Hz or higher).
  2. Wait about 5 seconds.
  3. Switch to the virtual desktop on the right.
  4. Notice the animation being choppy.
  5. Switch left and right about 10 times in a couple of seconds.
  6. Notice how the animation becomes smooth.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

ptr1337 commented 3 months ago

You may want to retest with the 560 driver, since there have been a bunch of fixes for this issue.

Also, check whether any background program is calling "nvidia-smi". Some examples here:

Gert-dev commented 3 months ago

Thanks for the response. I tested the 560 beta driver and it's noticeably better, so already nice to hear that these improvements are coming soon!

I also noticed now that the problem seems to be exacerbated when GNOME's power saving mode is active - I have it active often because I have a Lenovo Legion and power saving mode equates to 'quiet mode' where the fans don't make as much noise - the GPU power limit is still set to 80 Watt in both cases, though.

I've been observing the output of nvidia-smi when it's active or inactive ('normal' mode) on the 560 driver:

I understand power saving mode may imply making sacrifices to save power, but since it's a 4090 I kind of still expected to have enough 'juice' to cover at least a smooth desktop. It's also a bit strange that the GPU is not allowed to go below 7 Watt in normal mode, since it's apparently completely possible in power saving mode. It feels as if normal mode papers over the problem a bit, in the sense that it just keeps the GPU at higher power levels permanently to circumvent the ramp-up issue.

(FWIW, I also have no other applications running in the background during these tests. I use nvidia-smi to see stats sometimes, but the jank resulting from calling it is much more noticeable than and different from the 'fast stutter' of the ramp-up.)

mtijanic commented 3 months ago

Hey there, thanks for the report!

This seems to be a mix of problems, but a big part of it is likely the known issue of power transitions being a bit slower, per the driver readme:

Known Issues

The following are some known limitations of the open kernel modules versus the proprietary kernel modules with GSP firmware mode disabled:

  • GPU initialization is slower. One possible mitigation is to use nvidia-persistenced to initialize the GPU(s) in advance, before running applications that use the GPU.
  • Enter and exit latencies for power-saving modes like S3, S4 and Run Time D3 (RTD3) can be longer due to additional GSP state being restored.
  • GPU power consumption can be marginally impacted in some scenarios.
  • Run Time D3 (RTD3) is only supported on Ampere and above GPUs.

The bad news is that there is no silver bullet here, you either need to keep things awake (consume more power) or live with the latency. But the good news (I guess?) is that this is still being actively worked on and you will probably see incremental improvements from every major release.

That said, going by the description alone, it does sound like you hit a particularly egregious case here, so we'll try to repro it and see if there's anything that makes it worse than it needs to be. My guess is that the 144Hz monitor is throwing a wrench into it.

bn45hkurr0y4 commented 1 month ago

I have the same problem on 565.57.01 on CachyOS. At first I thought it was a problem with my Arch Linux configuration and that the new 565 drivers would fix it, but I was wrong.

When using the open drivers with GSP, I constantly get glitchy animations when moving windows, especially when moving windows across the edge between 2 monitors. With the proprietary drivers and GSP disabled there are fewer bugs, but there are still problems: if you wait 5 seconds and then start scrolling a page in Firefox, it stutters for a few seconds and then behaves normally. I noticed in btop that the P-state is P8 at idle but P5 when scrolling. With the kernel parameter nvidia.NVreg_RegistryDwords=“RMForcePstate=5”, scrolling became smooth and the problems disappeared. I then decided to install the open-source drivers and keep this parameter: it still hangs when moving windows and scrolling pages.

Briefly

Nvidia-proprietary

  • only nvidia.NVreg_EnableGpuFirmware=0 - there is a slowdown due to incorrect p-state operation
  • only nvidia.NVreg_RegistryDwords=“RMForcePstate=5” - everything works fine with GSP and pstate=5

nvidia-open

  • only nvidia.NVreg_RegistryDwords=“RMForcePstate=5” - still worse than GSP without a p-state setting on the proprietary drivers (when using nvidia-persistenced I didn't see any difference).

I think the problem is precisely in the p-state handling and its implementation in the open drivers. So far things only work reliably with the proprietary drivers or on Windows. If you have any recommendations on how to improve things, or something to test, let me know. I hope this helps someone.

UPD: But it's still a bad solution, because it forces the video card to run at high power, especially when it's idle.

Info

  • KDE 6.2.2
  • CachyOS (565) + Arch Linux (560) (I haven't tested how it behaves with p-state there, but I think the same as on CachyOS)
  • Nvidia RTX 3080 Ti
  • 2x 2560x1440 165 Hz
  • I have no programs that request data through nvidia-smi

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   53C    P5             55W /  350W |     981MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       985      G   /usr/bin/kwin_wayland                         173MiB |
|    0   N/A  N/A      1075      G   /usr/bin/Xwayland                               4MiB |
|    0   N/A  N/A      1109      G   /usr/bin/ksmserver                              3MiB |
|    0   N/A  N/A      1111      G   /usr/bin/kded6                                  3MiB |
|    0   N/A  N/A      1136      G   /usr/bin/plasmashell                          296MiB |
|    0   N/A  N/A      1165      G   /usr/lib/kactivitymanagerd                      3MiB |
|    0   N/A  N/A      1167      G   /usr/bin/gmenudbusmenuproxy                     3MiB |
|    0   N/A  N/A      1168      G   /usr/bin/kaccess                                3MiB |
|    0   N/A  N/A      1171      G   ...b/polkit-kde-authentication-agent-1          3MiB |
|    0   N/A  N/A      1172      G   /usr/lib/org_kde_powerdevil                     3MiB |
|    0   N/A  N/A      1173      G   /usr/lib/xdg-desktop-portal-kde                 3MiB |
|    0   N/A  N/A      1174      G   /usr/bin/xembedsniproxy                         3MiB |
|    0   N/A  N/A      1281      G   /usr/bin/kdeconnectd                            3MiB |
|    0   N/A  N/A      1343      G   /usr/bin/xwaylandvideobridge                    4MiB |
|    0   N/A  N/A      1381      G   /usr/bin/dolphin                                3MiB |
|    0   N/A  N/A      1445      G   /usr/bin/kate                                   3MiB |
|    0   N/A  N/A      1469      G   /usr/lib/firefox/firefox                      268MiB |
|    0   N/A  N/A      2024      G   /usr/bin/konsole                                3MiB |
|    0   N/A  N/A      2335      G   /usr/bin/konsole                                3MiB |
+-----------------------------------------------------------------------------------------+

computer@x670e-a
----------------
OS: CachyOS Linux x86_64
Kernel: Linux 6.11.5-3-cachyos
Uptime: 7 mins
Packages: 1038 (pacman)
Shell: fish 3.7.1
Display (27A6MR): 2560x1440 @ 165 Hz in 27" [External]
Display (27A6MR): 2560x1440 @ 165 Hz in 27" [External] *
DE: KDE Plasma 6.2.2
WM: KWin (Wayland)
WM Theme: Breeze
Theme: Breeze (Dark) [Qt], Breeze-Dark [GTK2], Breeze [GTK3]
Icons: breeze-dark [Qt], breeze-dark [GTK2/3/4]
Font: Noto Sans (10pt) [Qt], Noto Sans (10pt) [GTK2/3/4]
Cursor: breeze (24px)
Terminal: konsole 24.8.2
CPU: AMD Ryzen 9 7950X3D (32) @ 5.71 GHz
GPU: NVIDIA GeForce RTX 3080 Ti
Memory: 4.08 GiB / 125.42 GiB (3%)
Swap: 0 B / 125.42 GiB (0%)
Disk (/): 10.55 GiB / 913.85 GiB (1%) - ext4
Local IP (eno1): 192.168.0.4/24
Locale: ru_RU.UTF-8
```

mtijanic commented 1 month ago

Hi @erars123123 , thanks for that info. Could I ask you to please double-check this claim:

only nvidia.NVreg_RegistryDwords=“RMForcePstate=5” - everything works fine with GSP and pstate=5

There's effectively three modes of the driver:

  1. Proprietary with GSP disabled (requires NVreg_EnableGpuFirmware=0)
  2. Proprietary with GSP enabled (default, or NVreg_EnableGpuFirmware=1)
  3. Open, GSP always enabled.

Above you've claimed that setting RMForcePstate=5 makes (2) and (3) behave differently? Can you please verify that? That would be very surprising as (2) and (3) should behave identically in virtually every way.

You can check whether GSP is active or not with nvidia-smi -q | grep GSP, and you can check whether proprietary or open drivers are in use with modinfo nvidia | grep license.
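
The two checks above can be combined to tell which of the three modes is active. Below is a minimal Python sketch, assuming the output shapes that appear in this thread (`GSP Firmware Version : N/A` when GSP is off, `license: NVIDIA` for the proprietary module, `license: Dual MIT/GPL` for the open one). The function name `classify_driver_mode` is hypothetical and not part of any NVIDIA tool.

```python
def classify_driver_mode(gsp_line: str, license_line: str) -> str:
    """Map the two grep results to one of the three driver modes.

    gsp_line:     a line from `nvidia-smi -q | grep GSP`
    license_line: a line from `modinfo nvidia | grep license`
    """
    # Everything after the first colon is the value, e.g. "565.57.01" or "N/A".
    gsp_value = gsp_line.split(":", 1)[1].strip()
    license_value = license_line.split(":", 1)[1].strip()

    gsp_enabled = gsp_value.upper() != "N/A"
    # Assumption: the open module reports a GPL-compatible license string.
    is_open = "GPL" in license_value

    if is_open:
        # The open module always runs with GSP firmware.
        return "open, GSP enabled"
    if gsp_enabled:
        return "proprietary, GSP enabled"
    return "proprietary, GSP disabled"


if __name__ == "__main__":
    print(classify_driver_mode("GSP Firmware Version : 565.57.01", "license: NVIDIA"))
    print(classify_driver_mode("GSP Firmware Version : N/A", "license: NVIDIA"))
    print(classify_driver_mode("GSP Firmware Version : 565.57.01", "license: Dual MIT/GPL"))
```

Pasting the two grep lines into the function makes it harder to mix up modes (2) and (3) when reporting results.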

--

I tried reproducing this on a RTX 3080ti like yours, and I don't see it. I see the pstate go from P8 to P5 occasionally during extremely heavy scrolling in firefox, but I'm not seeing any stutter. Could be just my bad eyes, I'll see if we can devise an objective test somehow.

bn45hkurr0y4 commented 1 month ago

@mtijanic, sorry, I may have used the wrong settings; I'll do the testing again. I'm using a Palit GameRock 3080 Ti (not OC).

With GSP enabled

  1. Without p-state parameter: at first Firefox lags when scrolling; after interacting with it for a few seconds everything is fine, until a few seconds pass and the driver switches back to standby mode. Windows in KDE also sometimes lag (this is noticeable on a 165 Hz monitor, just like with Firefox and standby mode), especially if you move them across the edge between screens (but sometimes it goes away).

commands

```
❯ nvidia-smi -q | grep GSP
GSP Firmware Version : 565.57.01
❯ modinfo nvidia | grep license
license: NVIDIA
❯ cat /boot/loader/entries/linux-cachyos.conf
options root=PARTUUID=58acb6e6-12c7-4035-9fa2-e0b2a33a9113 rw zswap.enabled=0 nowatchdog splash nvidia.NVreg_EnableGpuFirmware=1
```

  2. With p-state parameter: at first I thought the problem was caused by the quotation marks, but no. I had the same problems with p-state=5, but I solved them by switching to p-state=0. With p-state=3 the Firefox problem went away, but I still had the problem when moving a window between 2 screens.

commands

```
❯ nvidia-smi -q | grep GSP
GSP Firmware Version : 565.57.01
❯ modinfo nvidia | grep license
license: NVIDIA
❯ cat /boot/loader/entries/linux-cachyos.conf
options root=PARTUUID=58acb6e6-12c7-4035-9fa2-e0b2a33a9113 rw zswap.enabled=0 nowatchdog splash nvidia.NVreg_EnableGpuFirmware=1 nvidia.NVreg_RegistryDwords="RMForcePstate=3"
```

Now testing with GSP disabled

  1. Without p-state parameter: all the same problems with Firefox as with gsp=1, BUT now when moving a window between screens everything is normal: the p-state switches to 0 and moving windows is much more pleasant.

commands

```
❯ nvidia-smi -q | grep GSP
GSP Firmware Version : N/A
❯ modinfo nvidia | grep license
license: NVIDIA
❯ cat /boot/loader/entries/linux-cachyos.conf
options root=PARTUUID=58acb6e6-12c7-4035-9fa2-e0b2a33a9113 rw zswap.enabled=0 nowatchdog splash nvidia.NVreg_EnableGpuFirmware=0
```

  2. With p-state parameter: with p-state=5 everything works fine.

commands

```
❯ nvidia-smi -q | grep GSP
GSP Firmware Version : N/A
❯ modinfo nvidia | grep license
license: NVIDIA
❯ cat /boot/loader/entries/linux-cachyos.conf
options root=PARTUUID=58acb6e6-12c7-4035-9fa2-e0b2a33a9113 rw zswap.enabled=0 nowatchdog splash nvidia.NVreg_EnableGpuFirmware=0 nvidia.NVreg_RegistryDwords="RMForcePstate=5"
```

Now testing with open drivers

  1. Without p-state parameter: same problems as proprietary + gsp=0.

commands

```
❯ nvidia-smi -q | grep GSP
GSP Firmware Version : 565.57.01
❯ modinfo nvidia | grep license
license: Dual MIT/GPL
❯ cat /boot/loader/entries/linux-cachyos.conf
options root=PARTUUID=58acb6e6-12c7-4035-9fa2-e0b2a33a9113 rw zswap.enabled=0 nowatchdog splash
```

  2. With p-state parameter: all the same as proprietary + gsp=1 with p-state=5/3/0.

commands

```
❯ nvidia-smi -q | grep GSP
GSP Firmware Version : 565.57.01
❯ modinfo nvidia | grep license
license: Dual MIT/GPL
❯ cat /boot/loader/entries/linux-cachyos.conf
options root=PARTUUID=58acb6e6-12c7-4035-9fa2-e0b2a33a9113 rw zswap.enabled=0 nowatchdog splash nvidia.NVreg_RegistryDwords="RMForcePstate=0"
```
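
With six configurations in play, it is easy to mislabel a test run. The following is a minimal Python sketch that reads the `options` line of a boot entry and reports the two settings that differ between runs. The function name `parse_nvidia_options` and the returned dictionary keys are hypothetical; only the parameter names `nvidia.NVreg_EnableGpuFirmware` and `RMForcePstate` come from this thread.

```python
import re
from typing import Dict, Optional


def parse_nvidia_options(options_line: str) -> Dict[str, Optional[int]]:
    """Extract the GSP switch and the forced P-state from a kernel options line.

    A value of None means the parameter is absent, so the driver default applies.
    """
    gsp = re.search(r"nvidia\.NVreg_EnableGpuFirmware=(\d+)", options_line)
    pstate = re.search(r"RMForcePstate=(\d+)", options_line)
    return {
        "enable_gpu_firmware": int(gsp.group(1)) if gsp else None,
        "force_pstate": int(pstate.group(1)) if pstate else None,
    }


if __name__ == "__main__":
    line = (
        "options root=PARTUUID=58acb6e6-12c7-4035-9fa2-e0b2a33a9113 rw "
        "zswap.enabled=0 nowatchdog splash nvidia.NVreg_EnableGpuFirmware=0 "
        'nvidia.NVreg_RegistryDwords="RMForcePstate=5"'
    )
    print(parse_nvidia_options(line))
```

The sketch only inspects the configured command line; whether the driver actually honoured the settings still has to be confirmed with `nvidia-smi -q | grep GSP` and by watching the P-state.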

Also would like to clarify - if I have p-state=5, will the performance of games and CUDA be degraded?

mtijanic commented 1 month ago

Thanks for this, it is very useful info. When we come up with a more objective test (something that gives out an actual number) we might ask you to try and see what the actual values you see are.

Also would like to clarify - if I have p-state=5, will the performance of games and CUDA be degraded?

Yes, forcing pstate to 5 will prevent it from going to any other, so while you're getting more power than idle, you won't be able to run at full power. Note that the RMForcePstate parameter is entirely undocumented and never meant for production use, it's for our internal testing.

You could use something like this program to boost the pstate: https://gist.github.com/mtijanic/9c129900bfba774b39914ad11b0041f6 In its current form that is also a hack and not meant for actual production use, but it might be enough to get you started. That one forces pstate to 0 while it is running, but you could modify it to use NV2080_CTRL_PERF_BOOST_FLAGS_CMD_BOOST_1LEVEL to get finer-grained control. Documentation here: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080perf.h#L53-L102

HBRJZ commented 4 weeks ago

I can confirm all the findings from https://github.com/NVIDIA/open-gpu-kernel-modules/issues/693#issuecomment-2442505734 on a 4080 Super using 560.35.03 on EndeavourOS (Arch Linux) with KDE Plasma Wayland.

Currently using the proprietary driver package with nvidia.NVreg_EnableGpuFirmware=0 nvidia.NVreg_RegistryDwords="RMForcePstate=5" set as a workaround to have a usable desktop experience.

I will retest with 565.57.01 once it hits the stable repo.

System info:

> Operating System: EndeavourOS (Arch Linux)
> KDE Plasma Version: 6.2.2
> KDE Frameworks Version: 6.7.0
> Qt Version: 6.8.0
> Kernel Version: 6.11.5-arch1-1 (64-bit)
> Graphics Platform: Wayland
> Processors: AMD Ryzen 7 7800X3D 8-Core Processor
> Memory: 30,9 GiB of RAM
> Graphics Processor: NVIDIA GeForce RTX 4080 SUPER (MSI GeForce RTX 4080 SUPER 16G Ventus 3X OC)
> Resolution: 2560x1440 @ 75Hz, DisplayPort
> Mainboard: ASRock X670E Steel Legend

ptr1337 commented 4 weeks ago

@HBRJZ 565 is in the extra-testing repository, you can manually fetch these packages together with the nvidia-dkms.

Which kind of desktop resolution you are using and also refresh rate?

HBRJZ commented 4 weeks ago

@HBRJZ 565 is in the extra-testing repository, you can manually fetch these packages together with the nvidia-dkms.

Yeah I know, but I will wait until it hits stable. I have done enough testing, switching packages and changing settings this week because of this for now.

Which kind of desktop resolution you are using and also refresh rate?

Resolution is 2560x1440 @ 75Hz connected via DisplayPort. I also have a TV connected via HDMI (4k @ 120Hz) for gaming, but that is usually turned off (physically and in the system settings) until I use it.

I have not noticed the same issues while playing games (with nvidia.NVreg_RegistryDwords="RMForcePstate=5" removed), only during desktop use.

HBRJZ commented 4 weeks ago

I tested again since 565.57.01 was pushed to stable yesterday. Things improved quite a bit.

Using the open source kernel module (GSP enabled, no p-state forced):

Using the closed source kernel module (GSP disabled, no p-state forced):

I also tested with nvidia-persistenced enabled but couldn't notice any difference.

Forcing a lower-numbered (i.e. higher-performance) p-state (e.g. nvidia.NVreg_RegistryDwords="RMForcePstate=5") with either module still gets rid of pretty much all these problems 99% of the time.

Also tried with dynamic triple buffering enabled for KDE Plasma. But that made everything even worse (I guess that is why it's disabled by default for Nvidia).

I just recently switched from an AMD GPU to Nvidia, and with the AMD GPU none of the problems reported in this issue were present (although it had other problems, which is why I switched). Electron-based applications also seem to generally perform worse on Nvidia than they did on AMD.

Please don't take the last part as a rant (it is not), just an observation. I knew to expect some problems when switching. All the things I specifically wanted to improve by switching were improved, so I don't regret the switch (yet).

HBRJZ commented 3 weeks ago

Running https://www.vsynctester.com/ (with nothing else running), the graph is all over the place with massive spikes, a lot of vsync failures and fluctuating FPS. Eventually the FPS drops from ~74 to 60, 50 or 40, leading to the lag / stutter / choppiness until it recovers.

Setting nvidia.NVreg_RegistryDwords="RMForcePstate=5" vastly improves the graph. It's still showing spikes and vsync failures, but less frequently.

The results also differ with the browser used (e.g. Firefox or a Chromium based browser), with Chromium based browsers generally having better results (but the core issues remain).

What seemed to help in my case is changing the refresh rate of my monitor from 75Hz to 60Hz (without also forcing a p-state).

With this the graph is still showing some spikes and vsync failures, but a lot less frequently, and I have not noticed any big drops in FPS after several hours of desktop usage at 60Hz so far.

So changing the refresh rate of my monitor to 60Hz is a better workaround in my case until this gets fixed properly. Forcing a p-state leads to other issues. For example, if I force any p-state I can't set the refresh rate of my TV to anything higher than 60Hz (normally it goes up to 120Hz). Removing that parameter makes that work again.
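
For intuition on why a lower refresh rate is more forgiving: the time available per frame is 1000 / refresh rate in milliseconds, so 60Hz leaves roughly 16.7 ms per frame versus about 13.3 ms at 75Hz. The Python sketch below computes that budget and counts late frames in a list of presentation timestamps. It is only an illustration, not how vsynctester.com measures; the function names and the 1.5x tolerance are assumptions.

```python
from typing import Sequence


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to produce one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


def count_missed_vsyncs(
    timestamps_ms: Sequence[float], refresh_hz: float, tolerance: float = 1.5
) -> int:
    """Count gaps between consecutive frames that exceed the frame budget.

    A gap longer than `tolerance` budgets is treated as at least one missed
    refresh. The 1.5 default is an arbitrary assumption for this sketch.
    """
    budget = frame_budget_ms(refresh_hz)
    missed = 0
    for previous, current in zip(timestamps_ms, timestamps_ms[1:]):
        if current - previous > tolerance * budget:
            missed += 1
    return missed


if __name__ == "__main__":
    for hz in (60, 75, 144):
        print(f"{hz} Hz -> {frame_budget_ms(hz):.2f} ms per frame")
    # Hypothetical timestamps: one frame arrives a full refresh late at 75 Hz.
    samples = [0.0, 13.3, 26.7, 53.3, 66.7]
    print("missed:", count_missed_vsyncs(samples, 75))
```

A counter like this, fed with real frame timestamps from a compositor or test application, would give the kind of number that can be compared across driver modes.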

For comparison:

So this is definitely a Wayland issue which is made even worse when combined with the open source kernel module and the GSP.

There are also already several reports about this over on the Nvidia forum:

gxcreator commented 3 weeks ago

Switching to an X11 session with the Nvidia GPU makes these issues go away too

Not true for me.

As far as I remember, some versions of the Nvidia Windows drivers keep the P-state bumped when multiple high-refresh panels are connected. Eventually users complain about the high power draw and that gets fixed, but then it reappears again.

dekomote commented 3 weeks ago

3080 Ti on 565.57: I observe the same thing with Firefox scrolling. It's choppy/sluggish with the open kernel module.

While scrolling, the P-state sits at 8 at all times. If I run a game or something else that makes the P-state go up, Firefox scrolling becomes much better.

Switching to the closed driver with GSP off makes the scrolling smoother, probably because the P-state gets bumped to 5 when you scroll, something that wasn't happening with the open driver.
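
Observations like this can be turned into a log of P-state changes. The Python sketch below assumes samples were captured with `nvidia-smi --query-gpu=pstate,power.draw --format=csv,noheader,nounits`, giving lines such as `P8, 21.50`, and that the power field is numeric. The function names are hypothetical. As noted earlier in this thread, calling nvidia-smi can itself cause jank, so sample sparingly.

```python
from typing import List, Tuple


def parse_sample(line: str) -> Tuple[int, float]:
    """Parse one CSV line such as 'P8, 21.50' into (pstate number, watts)."""
    pstate_text, power_text = (field.strip() for field in line.split(","))
    # Assumption: the power reading is numeric (not "[N/A]").
    return int(pstate_text.lstrip("P")), float(power_text)


def pstate_transitions(lines: List[str]) -> List[Tuple[int, int]]:
    """Return (from, to) pairs for every change of P-state between samples."""
    states = [parse_sample(line)[0] for line in lines if line.strip()]
    return [(a, b) for a, b in zip(states, states[1:]) if a != b]


if __name__ == "__main__":
    # Hypothetical capture: idle, a burst of scrolling, then idle again.
    captured = ["P8, 21.50", "P8, 22.10", "P5, 55.00", "P5, 54.20", "P8, 20.00"]
    print(pstate_transitions(captured))
```

An empty result while scrolling would match the behaviour described for the open module, where the P-state stays at 8.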

huynhhoanglong commented 1 week ago

Still happens on Nvidia 4060 mobile with Ubuntu 24.10. Hope it will be fixed soon 🙏

urbenlegend commented 1 week ago

I believe I am experiencing the same bug with an up-to-date Arch KDE and the 565.57.01 open drivers. If you simply start dragging a Dolphin window around for 10 seconds or so with nothing else running, you'll see the window movement stutter and jolt as if it's skipping frames. Even bringing up the application launcher feels stuttery, almost like it is running at 30fps. Unlike @Gert-dev, I wasn't able to trigger a ramp-up on my GPU simply by swapping desktops. I had to keep a game running in the background, and then moving Dolphin around was smooth. This does not happen with the proprietary module with GSP off.

Operating System: Arch Linux
KDE Plasma Version: 6.2.3
KDE Frameworks Version: 6.8.0
Qt Version: 6.8.0
Kernel Version: 6.11.9-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD Ryzen 9 3900X 12-Core Processor
Memory: 31.3 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3090/PCIe/SSE2
Product Name: X570 Taichi

DragonSWDev commented 1 week ago

565 seems to improve things a bit on an RTX 3060, but there are still framerate drops, while the proprietary module is smooth most of the time.