Bumblebee-Project / bbswitch

Disable discrete graphics (currently nvidia only)
GNU General Public License v2.0

GPU won't turn off after waking up from sleep #107

Open tohyf opened 9 years ago

tohyf commented 9 years ago

I am running Kubuntu 14.04 with kernel 3.16 on an ASUS laptop with a GeForce 840M. After booting, the GPU is OFF, which is what I want; but after I suspend the laptop to RAM and wake it up, bbswitch is unable to turn the GPU off again, and battery life is cut in half. The bug is reproducible every time. I installed only bbswitch-dkms, not bumblebee.

FYI, the relevant line of lspci output after booting:

04:00.0 3D controller: NVIDIA Corporation Device 1341 (rev ff)

After waking up from sleep, it becomes:

04:00.0 3D controller: NVIDIA Corporation Device 1341 (rev a2)

The command that I used to turn the card off: sudo tee /proc/acpi/bbswitch <<<OFF

I have added the following line into the /etc/modules: bbswitch load_state=0 unload_state=0

I want the card to stay OFF at all times unless I explicitly need it. Can you look into this problem? Thanks.
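
The two lspci lines in this report differ only in the revision field: `rev ff` while the card is off and `rev a2` once it is back on. A minimal sketch of a check built on that observation follows; the function name is made up for illustration, and reading `rev ff` as "powered down" is an assumption drawn from the outputs in this thread, not a documented guarantee.

```shell
#!/usr/bin/env bash
# Classify one line of `lspci` output by its revision field.
# Assumption: "(rev ff)" means the device is powered down, as seen
# in the lspci outputs quoted in this report.
gpu_state_from_lspci() {
    case "$1" in
        *"(rev ff)"*) echo off ;;
        *"(rev "*)    echo on ;;
        *)            echo unknown ;;
    esac
}

# Usage on a live system (04:00.0 is the address from this report):
#   gpu_state_from_lspci "$(lspci -s 04:00.0)"
```

Running this before and after a suspend cycle gives a quick second opinion next to the contents of /proc/acpi/bbswitch.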

Lekensteyn commented 9 years ago

Ensure that the nvidia module is not loaded at all, by blacklisting or not installing it. Any dmesg messages?
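
One way to check this quickly is to filter the lsmod output for module names starting with nvidia or nouveau. This is only a sketch; the helper name is hypothetical, and matching on the name prefix is an assumption about how the driver modules are named.

```shell
#!/usr/bin/env bash
# Print any nvidia/nouveau modules found in `lsmod` output read from stdin.
# Exit status 0 means at least one is loaded, 1 means none were found.
gpu_modules_loaded() {
    awk '$1 ~ /^(nvidia|nouveau)/ { print $1; found = 1 } END { exit !found }'
}

# Usage:
#   lsmod | gpu_modules_loaded && echo "unload or blacklist these first"
```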

tohyf commented 9 years ago

Neither nvidia nor nouveau is loaded. Please see the following lsmod output:

$ lsmod
Module                  Size  Used by
ctr                    13193  2 
ccm                    17856  2 
hid_generic            12559  0 
usbhid                 53155  0 
pci_stub               12622  1 
vboxpci                23236  0 
vboxnetadp             25670  0 
vboxnetflt             27612  0 
vboxdrv               409636  3 vboxpci,vboxnetadp,vboxnetflt
iptable_filter         12810  0 
ip_tables              27718  1 iptable_filter
x_tables               34103  2 iptable_filter,ip_tables
bnep                   23980  2 
rfcomm                 75100  0 
parport_pc             32909  0 
ppdev                  17711  0 
binfmt_misc            18163  1 
nls_iso8859_1          12713  1 
intel_rapl             19714  0 
iosf_mbi               13865  1 intel_rapl
x86_pkg_temp_thermal    14312  0 
intel_powerclamp       19099  0 
coretemp               13638  0 
arc4                   12573  2 
ath9k                 153350  0 
kvm_intel             154139  0 
mac80211              751853  1 ath9k
kvm                   480978  1 kvm_intel
snd_soc_rt5640         93325  0 
ath9k_common           31923  1 ath9k
snd_soc_core          196850  1 snd_soc_rt5640
ath9k_hw              469139  2 ath9k,ath9k_common
crct10dif_pclmul       14268  0 
snd_hda_codec_realtek    80490  1 
joydev                 17587  0 
crc32_pclmul           13180  0 
ath                    29397  3 ath9k,ath9k_common,ath9k_hw
snd_hda_codec_generic    69995  1 snd_hda_codec_realtek
snd_pcm_dmaengine      15229  1 snd_soc_core
snd_hda_codec_hdmi     52670  1 
snd_hda_intel          30824  5 
ghash_clmulni_intel    13230  0 
snd_compress           19395  1 snd_soc_core
snd_hda_controller     36330  1 snd_hda_intel
snd_hda_codec         144641  5 snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_controller
snd_soc_rl6231         13037  1 snd_soc_rt5640
snd_hwdep              17709  1 snd_hda_codec
aesni_intel           169686  4 
uvcvideo               92591  0 
ablk_helper            13597  1 aesni_intel
videobuf2_core         51547  1 uvcvideo
v4l2_common            14871  1 videobuf2_core
cryptd                 20531  3 ghash_clmulni_intel,aesni_intel,ablk_helper
lrw                    13323  1 aesni_intel
snd_pcm               106365  7 snd_soc_rt5640,snd_soc_core,snd_pcm_dmaengine,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_controller,snd_hda_codec
videodev              163821  3 uvcvideo,videobuf2_core,v4l2_common
snd_seq_midi           13564  0 
rtsx_usb_ms            19050  0 
i2c_hid                19065  0 
media                  22129  2 uvcvideo,videodev
hid                   110883  3 hid_generic,usbhid,i2c_hid
snd_rawmidi            31197  1 snd_seq_midi
snd_seq_midi_event     14899  1 snd_seq_midi
gf128mul               14951  1 lrw
snd_seq                63540  2 snd_seq_midi,snd_seq_midi_event
videobuf2_vmalloc      13841  1 uvcvideo
snd_timer              30118  2 snd_pcm,snd_seq
videobuf2_memops       13362  1 videobuf2_vmalloc
asus_nb_wmi            21128  0 
i2c_designware_platform    13025  0 
ath3k                  13381  0 
btusb                  32691  0 
cfg80211              551291  4 ath9k,mac80211,ath9k_common,ath
snd_seq_device         14875  3 snd_seq_midi,snd_rawmidi,snd_seq
memstick               16968  1 rtsx_usb_ms
glue_helper            14095  1 aesni_intel
asus_wmi               24697  1 asus_nb_wmi
aes_x86_64             17131  1 aesni_intel
bluetooth             510653  12 bnep,rfcomm,ath3k,btusb
psmouse               118431  0 
dw_dmac                12835  0 
dw_dmac_core           28558  1 dw_dmac
spi_pxa2xx_platform    23453  0 
mac_hid                13275  0 
serio_raw              13483  0 
snd_soc_sst_acpi       13007  0 
sparse_keymap          13890  1 asus_wmi
i2c_designware_core    14990  1 i2c_designware_platform
snd                    84025  23 snd_soc_core,snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_compress,snd_hda_codec,snd_hwdep,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
8250_dw                13474  0 
mei_me                 19610  0 
mei                    88864  1 mei_me
shpchp                 37216  0 
soundcore              15091  2 snd_hda_codec,snd
int3400_thermal        13345  0 
int3402_thermal        13060  0 
acpi_thermal_rel       13807  1 int3400_thermal
processor_thermal_device    14192  0 
lpc_ich                21176  0 
bbswitch               13931  0 
lp                     17799  0 
parport                42481  3 parport_pc,ppdev,lp
rtsx_usb_sdmmc         28381  0 
rtsx_usb               21330  2 rtsx_usb_ms,rtsx_usb_sdmmc
mxm_wmi                13021  0 
r8169                  87016  0 
mii                    13981  1 r8169
sdhci_acpi             13502  0 
sdhci                  44021  1 sdhci_acpi
i915                 1087482  6 
drm_kms_helper        119701  1 i915
drm                   341532  5 i915,drm_kms_helper
video                  24803  2 asus_wmi,i915
i2c_algo_bit           13564  1 i915
wmi                    19379  2 asus_wmi,mxm_wmi
ahci                   34220  6 
libahci                32353  1 ahci

ArchangeGabriel commented 9 years ago

Please post dmesg output after resume.
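
Since a full dmesg capture is long, a filter along these lines might help pick out the entries relevant to this issue. It is a rough sketch: the helper name is made up, the default address is the GPU from this report, and the keyword list is a guess at what is interesting rather than an exhaustive set.

```shell
#!/usr/bin/env bash
# Keep only kernel log lines that mention bbswitch, the GPU's PCI address,
# or power-state transitions. Reads log text from stdin.
# $1 (optional) = PCI address; defaults to the one from this report.
filter_gpu_pm_log() {
    local addr="${1:-0000:04:00.0}"
    grep -iE "bbswitch|${addr}|D3cold|D3hot|power state"
}

# Usage:
#   dmesg | filter_gpu_pm_log > after_resume_filtered.txt
```

The unfiltered log is still worth attaching, since the filter may drop lines that turn out to matter.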

Lekensteyn commented 9 years ago

bbswitch works for me after suspend/resume. Please re-open if the issue still occurs.

tohyf commented 8 years ago

Sorry for the long delay; I was busy for a long time before I could settle down with my laptop :P Happy New Year! Requesting to reopen the issue...

I have captured the dmesg and lspci output both before and after sleep, immediately after a fresh reboot. Note that I have modified the dmesg logs by removing some information such as my MAC addresses. You can use diff to compare them: the "after-sleep" log also contains the entries from before the sleep, as I captured it as-is.

Before sleep, the lspci output shows that the GPU is disabled properly. After resume, however, the GPU is enabled, and it stays on even after issuing OFF to bbswitch manually.

before_sleep_pci.txt before_sleep_logs.txt

after_sleep_pci.txt after_sleep_logs.txt
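
Because the after-sleep log also contains the earlier entries, the diff mentioned above can be reduced to just the new lines. A small sketch, assuming both files were captured during the same boot; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print lines that appear only in the second (after-sleep) log.
# $1 = log captured before sleep, $2 = log captured after sleep.
new_log_lines() {
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Usage:
#   new_log_lines before_sleep_logs.txt after_sleep_logs.txt
```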

ArchangeGabriel commented 8 years ago

I’m not really surprised that suspend is broken for some people; there have always been issues with suspend under Linux…

The only things I can see are those lines:

[  366.694543] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.694669] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  366.695551] pci_bus 0000:01: Allocating resources
[  366.695570] pcieport 0000:00:1c.0: bridge window [io  0x1000-0x0fff] to [bus 01] add_size 1000
[  366.695577] pcieport 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 01] add_size 200000
[  366.695581] pcieport 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff] to [bus 01] add_size 200000
[  366.695590] pci_bus 0000:02: Allocating resources
[  366.695608] pcieport 0000:00:1c.2: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 02] add_size 200000
[  366.695617] pci_bus 0000:03: Allocating resources
[  366.695633] pcieport 0000:00:1c.3: bridge window [io  0x1000-0x0fff] to [bus 03] add_size 1000
[  366.695638] pcieport 0000:00:1c.3: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03] add_size 200000
[  366.695646] pci_bus 0000:04: Allocating resources
[  366.695664] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.695674] pcieport 0000:00:1c.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000
[  366.695678] pcieport 0000:00:1c.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[  366.695682] pcieport 0000:00:1c.2: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[  366.695685] pcieport 0000:00:1c.3: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[  366.695689] pcieport 0000:00:1c.0: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
[  366.695693] pcieport 0000:00:1c.3: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
[  366.695706] pcieport 0000:00:1c.0: BAR 14: assigned [mem 0xcfe00000-0xcfffffff]
[  366.695724] pcieport 0000:00:1c.0: BAR 15: assigned [mem 0xf2000000-0xf21fffff 64bit pref]
[  366.695739] pcieport 0000:00:1c.2: BAR 15: assigned [mem 0xf2200000-0xf23fffff 64bit pref]
[  366.695752] pcieport 0000:00:1c.3: BAR 15: assigned [mem 0xf2400000-0xf25fffff 64bit pref]
[  366.695760] pcieport 0000:00:1c.0: BAR 13: assigned [io  0x2000-0x2fff]
[  366.695765] pcieport 0000:00:1c.3: BAR 13: assigned [io  0x3000-0x3fff]
[  366.695856] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  366.697567] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.697697] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  366.698063] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.698189] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  366.698938] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.699062] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  366.699172] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.699287] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  366.699505] acpi device:63: Cannot transition to power state D3cold for parent in (unknown)
[  366.699743] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.699827] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[  366.700228] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  366.700309] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug

Apart from an issue on i915 side, we have this line reported several times:

pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug

Could you try what is suggested? Also, have you tried with nouveau (disabling bumblebeed/bbswitch, letting nouveau load on boot and handle PM)?

tohyf commented 8 years ago

Update: now using Kubuntu 14.04 and kernel 3.16.0. Try what is suggested? What is your suggestion again? I didn't catch it. For nouveau, I might try again (I thought that would be the default before I installed bbswitch?).

ArchangeGabriel commented 8 years ago

Edit /etc/default/grub and add this to GRUB_CMDLINE_LINUX: pci=pcie_bus_safe

For nouveau, this should indeed be the default unless you install bumblebee/bbswitch et al.

ArchangeGabriel commented 8 years ago

Oh, and after editing the GRUB file, you need to run a grub-* command to update your configuration. Then reboot, retry and provide new logs.
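
The edit itself can be scripted. The sketch below appends a parameter to the GRUB_CMDLINE_LINUX line of a GRUB defaults file; the function name is made up, and it assumes the variable sits on a single double-quoted line and that GNU sed is available.

```shell
#!/usr/bin/env bash
# Append one parameter to GRUB_CMDLINE_LINUX in a GRUB defaults file.
# $1 = file (normally /etc/default/grub), $2 = parameter to add.
# Assumption: the variable is on one line and uses double quotes.
add_kernel_param() {
    local file="$1" param="$2"
    # Do nothing if the parameter already appears on that line.
    grep -Eq "^GRUB_CMDLINE_LINUX=\".*${param}" "$file" && return 0
    sed -i -E "s|^(GRUB_CMDLINE_LINUX=\"[^\"]*)\"|\1 ${param}\"|" "$file"
    # Tidy the leading space left behind when the value was empty.
    sed -i -E 's|^(GRUB_CMDLINE_LINUX=") |\1|' "$file"
}

# Usage (as root), followed by regenerating the config and rebooting:
#   add_kernel_param /etc/default/grub pci=pcie_bus_safe
#   update-grub
```

On Ubuntu the regeneration step is update-grub; other distributions may use a different grub-* command.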

tohyf commented 8 years ago

OK, now I will remove the bbswitch-dkms and bumblebee packages and do what you mentioned above.

ArchangeGabriel commented 8 years ago

Sorry, I should have been more precise. The modification to GRUB is to be tested with bbswitch.

nouveau should be tried without it.

tohyf commented 8 years ago

Do you want me to add another GRUB_CMD_LINE line, or add to the existing one, like this: GRUB_CMDLINE_LINUX="pcie=pcie_bus_safe"?

tohyf commented 8 years ago

Oh I see, so I should undo what I did to the modprobe.d blacklist (nouveau), reinstall bbswitch and bumblebee, then apply the GRUB command line modification, right? Alternatively, just use nouveau without bbswitch, bumblebee or any modification to the GRUB defaults...

tohyf commented 8 years ago

Using the following boot param:

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.16.0-57-generic.efi.signed root=UUID=c0c588a9-xxxx-xxxx-8254-fcfbf0af8c2d ro pcie=pcie_bus_safe quiet splash vt.handoff=7
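
The parameter suggested earlier in the thread is spelled pci=pcie_bus_safe, while the command line above shows pcie=pcie_bus_safe. An exact-word check like the sketch below makes that kind of difference visible; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if the exact parameter $2 is one of the space-separated
# words in the kernel command line string $1.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   has_kernel_param "$(cat /proc/cmdline)" pci=pcie_bus_safe && echo present
```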

Here are the dmesg and lspci outputs... It seems the GPU is still enabled after wake-up.

after_sleep_pci.txt after_sleep_logs.txt

before_sleep_pci.txt before_sleep_logs.txt

ArchangeGabriel commented 8 years ago

Hum, forget about nouveau: you have a Maxwell GPU, so it can’t handle it. Note that you probably forgot to re-blacklist nouveau or rebuild the initramfs after doing so, because it tried to load at boot.

Anyway, this doesn’t seem to help (so you can remove the pcie= thing). I’m clueless here, and people at freedesktop will probably not want to help us on this one. You might eventually open a bug report there with the lines I’ve quoted before, saying that you get those messages every time you suspend.

tohyf commented 8 years ago

If I were to file this bug on freedesktop.org, under which category/application should I file it?

ArchangeGabriel commented 8 years ago

Sorry for the delay. Now that Maxwell GPUs are supported by nouveau (in fact it seems they were before, except for GL), you could try it.

Depending on the result, you might file a bug at https://bugzilla.kernel.org/ against Drivers/PCI, I think, or just against PM/Suspend. Anyway, kernel devs will probably be able to set the right category if your first choice is not too far off.

Lekensteyn commented 8 years ago

@tohyf Can you run sudo acpidump > acpidump.txt and attach that to this issue?

tohyf commented 8 years ago

acpidump.txt

Lekensteyn commented 8 years ago

Thanks, confirmed that this is a problem that is being worked on with the acpi-pr3 branch.

tohyf commented 8 years ago

Ok great, will wait for good news.

tohyf commented 7 years ago

I found a workaround that works. It involves manually setting some PCIe configuration registers to put the device and the root port into the D3 state. (Disclaimer to other people suffering from a similar issue: do not try this without proper knowledge of PCIe; it may cause a system crash, reset, etc.!)

Here's what I did. lspci tells me that my graphics device is located at Bus:Device.Function 04:00.0:

04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 840M] (rev a2)

Then, find out which PCIe root port the card is connected to:

~$ lspci -vv | grep -B5 'secondary=04'
00:1c.4 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 5 (rev e4) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
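
As an alternative to grepping lspci, the parent bridge can usually be read from sysfs, where the resolved path of a PCI device nests it under its upstream bridge. A minimal sketch under that assumption; the function name and the example path are illustrative and built from the addresses in this thread.

```shell
#!/usr/bin/env bash
# Print the PCI address of the parent bridge, given a resolved sysfs
# device path such as:
#   /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0
pci_parent_from_path() {
    basename "$(dirname "$1")"
}

# Usage on a live system:
#   pci_parent_from_path "$(readlink -f /sys/bus/pci/devices/0000:04:00.0)"
```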

So I know the corresponding PCIe root port is 00:1c.4. Then, according to the PCIe spec, I have to put the downstream device into the D3 state first. Let's check the current state of the graphics card (apparently D0):

~$ sudo lspci -s 04:00.0 -vv | grep 'Status: D'
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

To put it into the D3 state, we have to modify the PMCSR register, located at offset 4 in the Power Management Capability structure (CAP_PM):

# The device power state is the two lowest bits; let's see what the other bits look like
~$ sudo setpci -s 04:00.0 CAP_PM+4.b
08
# Now set it to the D3 state by setting the two lowest bits to 11
~$ sudo setpci -s 04:00.0 CAP_PM+4.b=0b
# Confirm that it has been set:
~$ sudo setpci -s 04:00.0 CAP_PM+4.b
0b
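
The value 0b above is the original byte 08 with its two lowest bits set to 11. The same arithmetic can be written as a small helper so the other bits are preserved whatever the starting value is. This is only a sketch with a made-up function name; it computes the byte and does not touch the hardware.

```shell
#!/usr/bin/env bash
# Compute a new PMCSR low byte: keep the upper six bits of the current
# value and replace bits 1:0 (the power state) with the requested D-state.
# $1 = current byte as two hex digits (e.g. 08), $2 = D-state number 0..3.
pmcsr_with_state() {
    printf '%02x\n' $(( (0x$1 & ~0x3) | ($2 & 0x3) ))
}

# Usage with the same setpci invocations as in this comment:
#   cur=$(sudo setpci -s 04:00.0 CAP_PM+4.b)
#   sudo setpci -s 04:00.0 CAP_PM+4.b="$(pmcsr_with_state "$cur" 3)"
```

With the values from this comment, an input of 08 and state 3 yields 0b, and an input of 00 and state 3 yields 03, matching the writes to the GPU and the root port.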

At this point, the device is still accessible:

~$ cat /proc/acpi/bbswitch
0000:04:00.0 ON

So the next step is to set the root port to the D3 state:

~$ sudo setpci -s 00:1c.4 CAP_PM+4.b
00
~$ sudo setpci -s 00:1c.4 CAP_PM+4.b=03
~$ sudo setpci -s 00:1c.4 CAP_PM+4.b
03

FINALLY, the card is OFF !!!

~$ cat /proc/acpi/bbswitch
0000:04:00.0 OFF
~$ sudo lspci -s 04:00.0 -vv
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 840M] (rev ff) (prog-if ff)
        !!! Unknown header type 7f

OK, after some time, I realized that the laptop is still very hot with no CPU load; apparently the GPU is still on... Any opinions?

Lekensteyn commented 7 years ago

This is likely missing some help from ACPI, which is why the GPU appears off but still generates heat. Try kernel 4.9 or newer. You don't even need to load any driver (just enable runtime PM for the nvidia device and the PCIe root port). If you want to use the Nvidia GPU (e.g. because you have an HDMI/DP port attached to it), use nouveau (which enables runtime PM by default).
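
Enabling runtime PM is normally done by writing auto to the power/control attribute of each device in sysfs. The sketch below does that for a list of addresses; the function name is made up, the sysfs root is a parameter only so the helper can be exercised against a scratch directory, and the addresses in the usage line are the GPU and root port identified earlier in this thread.

```shell
#!/usr/bin/env bash
# Set the runtime PM policy to "auto" for each PCI address given.
# $1 = sysfs root (normally /sys); remaining arguments = full PCI addresses.
enable_runtime_pm() {
    local root="$1" dev
    shift
    for dev in "$@"; do
        echo auto > "${root}/bus/pci/devices/${dev}/power/control"
    done
}

# Usage (as root):
#   enable_runtime_pm /sys 0000:04:00.0 0000:00:1c.4
```

Afterwards, the power/runtime_status attribute in the same directory should report whether the device actually suspended.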

tohyf commented 5 years ago

Fast-forward to Ubuntu 18.04 running kernel 4.15: the issue still hasn't gone away. I tried both the nouveau and nvidia-390 drivers but still have the excessive heat issue. Are there any other updates lately?

mihkel-t commented 4 years ago

@tohyf, have you checked whether this still happens if you stop the Bumblebee service when suspending the system? See https://github.com/Bumblebee-Project/bbswitch/issues/90#issuecomment-560163337

tohyf commented 4 years ago

Hmm, that looks like a possible solution. Let me try it and report back. Thanks!

tohyf commented 4 years ago

Nope, I tried the solution but it doesn't work. The card is ON and cannot be turned off after waking from sleep using the bbswitch command: sudo tee /proc/acpi/bbswitch <<<OFF

Note that on my Ubuntu 18.04 there was no /usr/lib/systemd/system-sleep directory, but there was /lib/systemd/system-sleep/, so I created the file 00-bumblebee.sh there. There was no /usr/lib/systemd/system-sleep/nvidia or /lib/systemd/system-sleep/nvidia on my system (I have both nvidia-driver-435 and bumblebee installed from the Ubuntu repo).

However, I have the nvidia and nouveau kernel modules blacklisted, since I just want to save power. I assume this won't have any effect, right?
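
For reference, systemd runs executables in the system-sleep directory with "pre" as the first argument before suspending and "post" after resuming. The sketch below shows the general shape of such a hook for the Bumblebee daemon. It is not the script from the linked comment, the function name is made up, and restarting the service on resume is an assumption.

```shell
#!/usr/bin/env bash
# Map the first argument systemd passes to a system-sleep hook
# ("pre" before suspend, "post" after resume) to a systemctl verb.
sleep_hook_verb() {
    case "$1" in
        pre)  echo stop ;;
        post) echo start ;;
    esac
}

# In an executable hook file, the body would be roughly:
#   verb=$(sleep_hook_verb "$1")
#   if [ -n "$verb" ]; then systemctl "$verb" bumblebeed.service; fi
```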