Bumblebee-Project / bbswitch

Disable discrete graphics (currently nvidia only)
GNU General Public License v2.0

nvidia GPU not responding after turning off with bbswitch #226

Open nikonikolov opened 4 months ago

nikonikolov commented 4 months ago

I am using bbswitch and bumblebee on my laptop to turn my nvidia GPU on and off to save battery.

The problem is that after turning the GPU off with bbswitch, it stops responding and can't be turned on again. I turn it off with:

sudo rmmod nvidia_drm
sudo rmmod nvidia_uvm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
sudo tee /proc/acpi/bbswitch <<<OFF

Afterwards, I turn it on with:

sudo tee /proc/acpi/bbswitch <<<ON
sudo modprobe nvidia
sudo modprobe nvidia_modeset
sudo modprobe nvidia_drm
sudo modprobe nvidia_uvm
sudo /usr/bin/nvidia-modprobe -c 0 -u
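The two sequences above could be wrapped in a small script so the module order stays consistent between runs. This is only a sketch: the names `run`, `gpu_off`, `gpu_on`, `DRY_RUN` and `BBSWITCH` are hypothetical, it assumes it is run as root (no `sudo` inside), and it does not address the failure itself. With `DRY_RUN=1` (the default here) the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical wrapper; all function and variable names are illustrative.
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands
BBSWITCH="${BBSWITCH:-/proc/acpi/bbswitch}"   # bbswitch control file

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

gpu_off() {
    # Unload dependent modules first, the core nvidia module last.
    for mod in nvidia_drm nvidia_uvm nvidia_modeset nvidia; do
        run rmmod "$mod" || return 1
    done
    run sh -c "echo OFF > $BBSWITCH"
}

gpu_on() {
    run sh -c "echo ON > $BBSWITCH" || return 1
    # Load the core module first, then the modules that depend on it.
    for mod in nvidia nvidia_modeset nvidia_drm nvidia_uvm; do
        run modprobe "$mod" || return 1
    done
    run /usr/bin/nvidia-modprobe -c 0 -u
}
```

Calling `gpu_off` or `gpu_on` with `DRY_RUN=0` would execute the same commands as listed in the report.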

The commands complete without errors and /proc/acpi/bbswitch reports ON, but nvidia-smi prints "No devices were found", and I can't run any process on the GPU.
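The mismatch between what bbswitch reports and what the driver sees can be checked in one step. The sketch below assumes the bbswitch status line has the form `0000:01:00.0 ON` and that `nvidia-smi -L` prints one line starting with `GPU ` per detected device; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of bbswitch.

bbswitch_state() {
    # Print the last word of the status line (ON or OFF).
    # $1: status file, defaults to the real bbswitch file.
    awk '{ print $NF; exit }' "${1:-/proc/acpi/bbswitch}"
}

gpu_visible() {
    # Succeeds only if nvidia-smi lists at least one GPU.
    nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

check_gpu() {
    state="$(bbswitch_state "$@")"
    if [ "$state" = "ON" ] && ! gpu_visible; then
        echo "mismatch: bbswitch reports ON but nvidia-smi lists no GPU"
        return 1
    fi
    echo "bbswitch: $state"
}
```

Running `check_gpu` after the power-on sequence would flag the state described here, where bbswitch says ON but no device is usable.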

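Checking dmesg shows the problem: the PCI device cannot be brought from D3cold back to D0, and the driver then fails to initialize the adapter: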

[939155.445439] bbswitch: enabling discrete graphics
[939156.023420] pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[939156.182302] nvidia-nvlink: Nvlink Core is being initialized, major device number 505
[939156.182309] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.78  Sun Apr 14 06:35:45 UTC 2024
[939156.261562] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.78  Sun Apr 14 06:23:31 UTC 2024
[939156.287232] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[939156.287236] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[939156.312090] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[939156.340007] nvidia-uvm: Loaded the UVM driver, major device number 503.
[939161.399609] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1556)
[939161.399679] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
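To pull only the relevant lines out of a long kernel log, a filter along these lines may help. It is a sketch that matches the three error messages quoted above verbatim; the function name `scan_gpu_errors` is hypothetical, and other kernel or driver versions may word these messages differently.

```shell
#!/bin/sh
# Hypothetical filter: reads a kernel log on stdin and prints only the
# power-state and adapter-init failures seen in the log above.
scan_gpu_errors() {
    grep -E 'Unable to change power state from D3cold to D0|RmInitAdapter failed|rm_init_adapter failed'
}
```

Typical use would be `dmesg | scan_gpu_errors` right after the power-on sequence.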

dmesg also reports some errors related to turning off the GPU. The log is quite long, so I pasted it here.

After a reboot, the GPU works fine. How do I fix this so I can turn the GPU on and off without rebooting?

Extra info:
Laptop: Dell XPS 15 9530
GPU: Nvidia RTX4060
OS version: Slackware-current
NVIDIA driver version: 550.78
Kernel version: 6.9.8 (didn't work with earlier kernels either)
bbswitch version: 0.8