bayasdev / envycontrol

Easy GPU switching for Nvidia Optimus laptops under Linux
https://bayas.dev/envycontrol
MIT License
1.18k stars, 60 forks

[BUG] dGPU is not used anymore after nvidia driver update #98

Closed oxfighterjet closed 1 year ago

oxfighterjet commented 1 year ago

Thank you very much for envycontrol; it worked like a charm until today. I updated the Nvidia driver from 525 to 530 today, and since then, in hybrid mode, none of my games run on the dGPU, only on the eGPU. All the launch options are still set for the dGPU, and the dGPU is reported correctly by lspci, nvtop and nvidia-smi.

To Reproduce

  1. Run sudo envycontrol -s integrated and reboot.
  2. lspci only reports eGPU. dGPU not detected, not usable. Games perform horribly slowly. As expected.
  3. Run sudo envycontrol -s hybrid and reboot.
  4. lspci lists both eGPU and dGPU, and nvtop and nvidia-smi both display dGPU as expected.
  5. Run several Steam games that were previously set up with launch options for Nvidia. They run at about a tenth of the usual speed and only on the eGPU (MangoHUD doesn't even report GPU stats for the eGPU, whereas stats are visible when the dGPU is used). This is unexpected.

Expected behavior: With the right launch options, games should run at a playable framerate of around 60 FPS. I have been playing these games on the dGPU lately, with quality settings chosen to hit that framerate.

System Information:

Additional context (after booting in hybrid mode):

(base) [root@fedxps envycontrol]# dmesg | grep nvidia
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt7)/root/boot/vmlinuz-6.2.8-200.fc37.x86_64 root=UUID=bccfed33-2dc8-4618-b339-d6a97a212c33 ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init resume=UUID=e4b11ca7-7024-407c-a2e1-4aa676e3eba1 rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init
[    0.051329] Kernel command line: BOOT_IMAGE=(hd0,gpt7)/root/boot/vmlinuz-6.2.8-200.fc37.x86_64 root=UUID=bccfed33-2dc8-4618-b339-d6a97a212c33 ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init resume=UUID=e4b11ca7-7024-407c-a2e1-4aa676e3eba1 rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init
[    7.958385] nvidia: loading out-of-tree module taints kernel.
[    7.958393] nvidia: module license 'NVIDIA' taints kernel.
[    7.967723] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    8.055841] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[    8.056505] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[    8.196095] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[    8.294023] nvidia-uvm: Loaded the UVM driver, major device number 507.
[    8.341617] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  530.41.03  Thu Mar 16 19:23:04 UTC 2023
[    8.346936] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    9.026013] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[root@fedxps envycontrol]# akmods
Checking kmods exist for 6.2.8-200.fc37.x86_64             [  OK  ]
[root@fedxps envycontrol]# python envycontrol.py -s hybrid --verbose
Switching to hybrid mode
Enable PCI-Express Runtime D3 (RTD3) Power Management: False
INFO: Removed file /lib/udev/rules.d/80-nvidia-pm.rules
INFO: Removed file /etc/modprobe.d/nvidia.conf
INFO: Created file /etc/modprobe.d/nvidia.conf
# Automatically generated by EnvyControl

options nvidia-drm modeset=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1

Rebuilding the initramfs...
Successfully rebuilt the initramfs!
Operation completed successfully
Please reboot your computer for changes to take effect!

Not sure if this is relevant in this case, but I've seen you request this before so I'm adding it for completeness:

(base) [root@fedxps envycontrol]$ xrandr --listproviders
Providers: number : 0

I suppose I could downgrade the proprietary Nvidia drivers to see if that fixes the issue, but I would prefer to exhaust other troubleshooting steps first, if you can suggest any.

Edit: just specified that the lspci output was in hybrid mode. Edit2: Added verbose output for envycontrol hybrid setup.

bayasdev commented 1 year ago

I don't understand what your problem is. Integrated mode should disable the Nvidia dGPU; however, envycontrol does not support eGPU setups.

Also, Nvidia on Wayland is known to have performance problems with hybrid graphics.

oxfighterjet commented 1 year ago

Indeed, there is no issue in integrated mode. The issue is in hybrid mode, where the dedicated GPU is visibly active but the games do not run on it, despite the appropriate launch options.

I'm surprised by your comment. I have used this setup without issues in the past, on this exact machine; things only stopped working after the driver update. Are there no troubleshooting steps I can attempt to identify the cause?

bayasdev commented 1 year ago

> Indeed, there is no issue in integrated mode. The issue is in hybrid mode, where the dedicated GPU is visibly active but the games do not run on it, despite the appropriate launch options.

Then it's a regression on the Nvidia side of things.

> I'm surprised by your comment. I have used this setup without issues in the past, on this exact machine; things only stopped working after the driver update. Are there no troubleshooting steps I can attempt to identify the cause?

Well, hybrid mode does nothing except set a few kernel module options (enabling modeset, an option to prevent VRAM corruption after waking from suspend, and RTD3 if the user requests it), since PRIME render offload is the default behavior of the Nvidia drivers [1].

However, you can perform basic troubleshooting steps such as downgrading or reinstalling the drivers.

[1] http://us.download.nvidia.com/XFree86/Linux-x86_64/530.41.03/README/primerenderoffload.html
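
As a rough way to check whether render offload is taking effect, the sketch below compares the OpenGL renderer reported with and without the offload environment variables described in the NVIDIA README [1]. The helper names (offload_env, opengl_renderer) are hypothetical, and the script assumes glxinfo is installed; it returns None if glxinfo is missing or fails.

```python
import os
import subprocess

# Environment variables described in NVIDIA's PRIME render offload README [1].
OFFLOAD_ENV = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
    "__VK_LAYER_NV_optimus": "NVIDIA_only",
}


def offload_env(base=None):
    """Return a copy of the environment with the offload variables set."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLOAD_ENV)
    return env


def opengl_renderer(env):
    """Return glxinfo's 'OpenGL renderer string' value, or None if unavailable."""
    try:
        result = subprocess.run(
            ["glxinfo"], env=env, capture_output=True,
            text=True, check=False, timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Assumption: glxinfo may not be installed, or no display is available.
        return None
    for line in result.stdout.splitlines():
        if line.startswith("OpenGL renderer string"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    print("default:  ", opengl_renderer(dict(os.environ)))
    print("offloaded:", opengl_renderer(offload_env()))
```

If both lines name the integrated GPU, offload is not being honored, which would point at the driver rather than at envycontrol. The same variables can also be placed in a Steam launch option in front of %command%, assuming that is how the launch options in question were set up.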

oxfighterjet commented 1 year ago

I understand. Indeed, it seems other people are facing issues with those drivers. From what I can tell on Nvidia's website, those drivers aren't available anymore. Hopefully they will release new drivers that fix these issues soon. For the time being, no drivers other than 530 are available in the rpmfusion-nonfree repo, so I'll need to be patient. Thank you for your time and effort.

EDIT: While other drivers are not currently available on rpmfusion-nonfree, I managed to find another repo, negativo17, which contained the 525.89.02 drivers that were previously working. Installing them solved everything. In case it interests anyone else:

dnf config-manager --add-repo=https://negativo17.org/repos/fedora-nvidia.repo

dnf install akmod-nvidia-3:525.89.02-1.fc37 dkms-nvidia-3:525.89.02-1.fc37 nvidia-kmod-common-525.89.02-2.fc37
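
After rebooting, it may be worth confirming which driver version is actually loaded. The sketch below is only an illustration: it assumes the loaded module reports its version in /proc/driver/nvidia/version, and the helper names (parse_driver_version, driver_major) are hypothetical. The regular expression simply looks for a version-like token such as 525.89.02.

```python
import re

# Matches version tokens such as 530.41.03 or 525.89.02.
VERSION_RE = re.compile(r"\b(\d{3}\.\d{1,3}(?:\.\d{1,3})?)\b")


def parse_driver_version(text):
    """Return the first driver-version-like token in text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def driver_major(version):
    """Return the major branch number, e.g. 525 for '525.89.02'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    try:
        # Assumption: this file exists only while the nvidia module is loaded.
        with open("/proc/driver/nvidia/version") as handle:
            version = parse_driver_version(handle.read())
    except OSError:
        version = None
    if version is None:
        print("No loaded Nvidia driver detected")
    else:
        print(f"Loaded driver: {version} (branch {driver_major(version)})")
```

The same parser also works on the nvidia-modeset line from the dmesg output earlier in this thread.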