Open tohyf opened 9 years ago
Ensure that the nvidia module is not loaded at all, by blacklisting or not installing it. Any dmesg messages?
Neither nvidia nor nouveau is loaded. Please see the following output from lsmod:
$ lsmod
Module Size Used by
ctr 13193 2
ccm 17856 2
hid_generic 12559 0
usbhid 53155 0
pci_stub 12622 1
vboxpci 23236 0
vboxnetadp 25670 0
vboxnetflt 27612 0
vboxdrv 409636 3 vboxpci,vboxnetadp,vboxnetflt
iptable_filter 12810 0
ip_tables 27718 1 iptable_filter
x_tables 34103 2 iptable_filter,ip_tables
bnep 23980 2
rfcomm 75100 0
parport_pc 32909 0
ppdev 17711 0
binfmt_misc 18163 1
nls_iso8859_1 12713 1
intel_rapl 19714 0
iosf_mbi 13865 1 intel_rapl
x86_pkg_temp_thermal 14312 0
intel_powerclamp 19099 0
coretemp 13638 0
arc4 12573 2
ath9k 153350 0
kvm_intel 154139 0
mac80211 751853 1 ath9k
kvm 480978 1 kvm_intel
snd_soc_rt5640 93325 0
ath9k_common 31923 1 ath9k
snd_soc_core 196850 1 snd_soc_rt5640
ath9k_hw 469139 2 ath9k,ath9k_common
crct10dif_pclmul 14268 0
snd_hda_codec_realtek 80490 1
joydev 17587 0
crc32_pclmul 13180 0
ath 29397 3 ath9k,ath9k_common,ath9k_hw
snd_hda_codec_generic 69995 1 snd_hda_codec_realtek
snd_pcm_dmaengine 15229 1 snd_soc_core
snd_hda_codec_hdmi 52670 1
snd_hda_intel 30824 5
ghash_clmulni_intel 13230 0
snd_compress 19395 1 snd_soc_core
snd_hda_controller 36330 1 snd_hda_intel
snd_hda_codec 144641 5 snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_controller
snd_soc_rl6231 13037 1 snd_soc_rt5640
snd_hwdep 17709 1 snd_hda_codec
aesni_intel 169686 4
uvcvideo 92591 0
ablk_helper 13597 1 aesni_intel
videobuf2_core 51547 1 uvcvideo
v4l2_common 14871 1 videobuf2_core
cryptd 20531 3 ghash_clmulni_intel,aesni_intel,ablk_helper
lrw 13323 1 aesni_intel
snd_pcm 106365 7 snd_soc_rt5640,snd_soc_core,snd_pcm_dmaengine,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_controller,snd_hda_codec
videodev 163821 3 uvcvideo,videobuf2_core,v4l2_common
snd_seq_midi 13564 0
rtsx_usb_ms 19050 0
i2c_hid 19065 0
media 22129 2 uvcvideo,videodev
hid 110883 3 hid_generic,usbhid,i2c_hid
snd_rawmidi 31197 1 snd_seq_midi
snd_seq_midi_event 14899 1 snd_seq_midi
gf128mul 14951 1 lrw
snd_seq 63540 2 snd_seq_midi,snd_seq_midi_event
videobuf2_vmalloc 13841 1 uvcvideo
snd_timer 30118 2 snd_pcm,snd_seq
videobuf2_memops 13362 1 videobuf2_vmalloc
asus_nb_wmi 21128 0
i2c_designware_platform 13025 0
ath3k 13381 0
btusb 32691 0
cfg80211 551291 4 ath9k,mac80211,ath9k_common,ath
snd_seq_device 14875 3 snd_seq_midi,snd_rawmidi,snd_seq
memstick 16968 1 rtsx_usb_ms
glue_helper 14095 1 aesni_intel
asus_wmi 24697 1 asus_nb_wmi
aes_x86_64 17131 1 aesni_intel
bluetooth 510653 12 bnep,rfcomm,ath3k,btusb
psmouse 118431 0
dw_dmac 12835 0
dw_dmac_core 28558 1 dw_dmac
spi_pxa2xx_platform 23453 0
mac_hid 13275 0
serio_raw 13483 0
snd_soc_sst_acpi 13007 0
sparse_keymap 13890 1 asus_wmi
i2c_designware_core 14990 1 i2c_designware_platform
snd 84025 23 snd_soc_core,snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_compress,snd_hda_codec,snd_hwdep,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
8250_dw 13474 0
mei_me 19610 0
mei 88864 1 mei_me
shpchp 37216 0
soundcore 15091 2 snd_hda_codec,snd
int3400_thermal 13345 0
int3402_thermal 13060 0
acpi_thermal_rel 13807 1 int3400_thermal
processor_thermal_device 14192 0
lpc_ich 21176 0
bbswitch 13931 0
lp 17799 0
parport 42481 3 parport_pc,ppdev,lp
rtsx_usb_sdmmc 28381 0
rtsx_usb 21330 2 rtsx_usb_ms,rtsx_usb_sdmmc
mxm_wmi 13021 0
r8169 87016 0
mii 13981 1 r8169
sdhci_acpi 13502 0
sdhci 44021 1 sdhci_acpi
i915 1087482 6
drm_kms_helper 119701 1 i915
drm 341532 5 i915,drm_kms_helper
video 24803 2 asus_wmi,i915
i2c_algo_bit 13564 1 i915
wmi 19379 2 asus_wmi,mxm_wmi
ahci 34220 6
libahci 32353 1 ahci
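A long lsmod listing like the one above is easy to misread, so the check can be scripted. This is only a sketch: the function name `gpu_drivers_absent` is made up for illustration, and it assumes that any module whose name starts with `nvidia` or `nouveau` counts as a GPU driver.

```shell
# Hypothetical helper: read lsmod-style output on stdin, print every module
# whose name starts with "nvidia" or "nouveau", and return 0 only if none
# were found.
gpu_drivers_absent() {
    awk '$1 ~ /^(nvidia|nouveau)/ { print $1; hit = 1 }
         END { exit (hit ? 1 : 0) }'
}
```

Typical use would be `lsmod | gpu_drivers_absent && echo "no GPU driver loaded"`.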
Please post dmesg output after resume.
bbswitch works for me after suspend/resume. Please re-open if the issue still occurs.
Sorry for the long delay; I was busy for a long time before I could settle down with my laptop :P Happy New Year! Requesting to reopen the issue.
I have captured the dmesg and lspci output, both before and after sleep, immediately after a fresh reboot. Note that I have edited the dmesg log to remove some information such as my MAC addresses. You can use diff to compare the two, as the "after-sleep" log also contains the entries from before the sleep; I captured it as-is.
Before sleep, the lspci output shows that the GPU is properly disabled. After resume, however, the GPU is enabled, and it stays on even after manually issuing OFF to bbswitch.
I’m not really surprised suspend is broken for some people; there have always been issues with suspend under Linux…
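For anyone comparing the two captures, a small sketch of the diff step follows. The helper name `new_log_lines` is hypothetical; it assumes the second file is the first file plus additional lines, which is how the logs are described here.

```shell
# Hypothetical helper: print only the lines that appear in the "after" log
# ($2) but not in the "before" log ($1), using diff's "> " markers.
new_log_lines() {
    diff "$1" "$2" | sed -n 's/^> //p'
}
```

With the attachment names used in this thread, that would be `new_log_lines before_sleep_logs.txt after_sleep_logs.txt`.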
The only things I can see are those lines:
[ 366.694543] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.694669] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[ 366.695551] pci_bus 0000:01: Allocating resources
[ 366.695570] pcieport 0000:00:1c.0: bridge window [io 0x1000-0x0fff] to [bus 01] add_size 1000
[ 366.695577] pcieport 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 01] add_size 200000
[ 366.695581] pcieport 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff] to [bus 01] add_size 200000
[ 366.695590] pci_bus 0000:02: Allocating resources
[ 366.695608] pcieport 0000:00:1c.2: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 02] add_size 200000
[ 366.695617] pci_bus 0000:03: Allocating resources
[ 366.695633] pcieport 0000:00:1c.3: bridge window [io 0x1000-0x0fff] to [bus 03] add_size 1000
[ 366.695638] pcieport 0000:00:1c.3: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03] add_size 200000
[ 366.695646] pci_bus 0000:04: Allocating resources
[ 366.695664] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.695674] pcieport 0000:00:1c.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000
[ 366.695678] pcieport 0000:00:1c.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[ 366.695682] pcieport 0000:00:1c.2: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[ 366.695685] pcieport 0000:00:1c.3: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[ 366.695689] pcieport 0000:00:1c.0: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000
[ 366.695693] pcieport 0000:00:1c.3: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000
[ 366.695706] pcieport 0000:00:1c.0: BAR 14: assigned [mem 0xcfe00000-0xcfffffff]
[ 366.695724] pcieport 0000:00:1c.0: BAR 15: assigned [mem 0xf2000000-0xf21fffff 64bit pref]
[ 366.695739] pcieport 0000:00:1c.2: BAR 15: assigned [mem 0xf2200000-0xf23fffff 64bit pref]
[ 366.695752] pcieport 0000:00:1c.3: BAR 15: assigned [mem 0xf2400000-0xf25fffff 64bit pref]
[ 366.695760] pcieport 0000:00:1c.0: BAR 13: assigned [io 0x2000-0x2fff]
[ 366.695765] pcieport 0000:00:1c.3: BAR 13: assigned [io 0x3000-0x3fff]
[ 366.695856] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[ 366.697567] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.697697] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[ 366.698063] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.698189] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[ 366.698938] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.699062] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[ 366.699172] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.699287] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[ 366.699505] acpi device:63: Cannot transition to power state D3cold for parent in (unknown)
[ 366.699743] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.699827] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
[ 366.700228] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 366.700309] pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
Apart from an issue on the i915 side, we have this line reported several times:
pci 0000:04:00.0: Max Payload Size 16384, but upstream 0000:00:1c.4 set to 128; if necessary, use "pci=pcie_bus_safe" and report a bug
Could you try what is suggested? Also, have you tried with nouveau (disabling bumblebeed/bbswitch, letting nouveau load on boot and handle PM)?
Update: I am now using Kubuntu 14.04 with kernel 3.16.0. What exactly was the suggestion I should try? I didn't catch it. As for nouveau, I might try it again (I thought it would be the default before I installed bbswitch?).
Edit /etc/default/grub and add this to GRUB_CMDLINE_LINUX: pci=pcie_bus_safe
For nouveau, this should indeed be the default unless you install bumblebee/bbswitch et al.
Oh, and after editing the GRUB file, you need to launch some grub-* command to update your configuration. Then reboot, retry and provide new logs.
OK, now I will remove the bbswitch-dkms and bumblebee packages, and do what you mentioned above.
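As a rough sketch of that edit: the helper below appends a parameter to the GRUB_CMDLINE_LINUX line of a given file. The function name `add_kernel_param` is invented for this example, and it assumes the file contains a double-quoted GRUB_CMDLINE_LINUX="..." line, as a stock /etc/default/grub on Ubuntu does.

```shell
# Hypothetical helper: append kernel parameter $2 to the
# GRUB_CMDLINE_LINUX="..." line in file $1. Does nothing if the
# parameter is already on that line.
add_kernel_param() {
    file=$1
    param=$2
    grep -q "^GRUB_CMDLINE_LINUX=.*$param" "$file" && return 0
    # Insert " <param>" just before the closing quote of the value.
    sed "s|^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"|\1 $param\"|" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}
```

Run as root on a backed-up copy first, for example `add_kernel_param /etc/default/grub pci=pcie_bus_safe`. On Ubuntu the configuration is then regenerated with `sudo update-grub`, and after a reboot `cat /proc/cmdline` shows whether the parameter was picked up.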
Sorry, I should have been more precise. The modification to GRUB is to be tested with bbswitch; nouveau should be tried without it.
Do you want me to add another GRUB_CMDLINE_LINUX line, or add it to the existing one, like this: GRUB_CMDLINE_LINUX="pcie=pcie_bus_safe"?
Oh I see, so I should undo what I did to the modprobe.d blacklist (nouveau), reinstall bbswitch and bumblebee, and then apply the GRUB command line modification, right? Alternatively, just use nouveau without bbswitch, bumblebee, or any modification to the GRUB defaults...
Using the following boot param:
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.16.0-57-generic.efi.signed root=UUID=c0c588a9-xxxx-xxxx-8254-fcfbf0af8c2d ro pcie=pcie_bus_safe quiet splash vt.handoff=7
Here are the dmesg and lspci outputs. It seems the GPU is enabled after wake-up.
Attachments: after_sleep_pci.txt, after_sleep_logs.txt, before_sleep_pci.txt, before_sleep_logs.txt
Hum, forget about nouveau: you have a Maxwell GPU, so it can’t handle it. Note that you probably forgot to re-blacklist nouveau, or to rebuild the initramfs after doing so, because it tried to load at boot.
Anyway, this doesn’t seem to help (so you can remove the pcie= thing). I’m clueless here, and people at freedesktop will probably not want to help us on this one. You might eventually open a bug report there with the lines I’ve quoted before, saying that you get those messages every time you suspend.
If I were to file this bug on freedesktop.org, under which category/application should I file it?
Sorry for the delay. Now that Maxwell GPUs are supported by nouveau (in fact it seems they were before, except for GL), you could try it.
Depending on the result, you might file a bug at https://bugzilla.kernel.org/ against Drivers/PCI, I think, or just against PM/Suspend. Anyway, kernel devs will probably be able to set the right category if your first choice is not too far off.
@tohyf Can you run sudo acpidump > acpidump.txt and attach that to this issue?
Thanks, confirmed that this is a problem that is being worked on with the acpi-pr3 branch.
Ok great, will wait for good news.
On 05/31/2016 04:55 AM, Peter Wu wrote:
> Thanks, confirmed that this is a problem that is being worked on with the acpi-pr3 branch.
I found a workaround that works. It involves manually setting some PCIe configuration registers to put the device and the root port into the D3 state. (Disclaimer to other people suffering from a similar issue: do not try this without proper knowledge of PCIe; it may cause a system crash, reset, etc.!)
Here's what I did. lspci tells me that my graphics device is located at Bus:Device.Function 04:00.0:
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 840M] (rev a2)
Then, find out which PCIe root port the card is connected to:
~$ lspci -vv | grep -B5 'secondary=04'
00:1c.4 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 5 (rev e4) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
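The same parent/child relationship can also be read from sysfs, where each PCI device directory sits inside the directory of its upstream bridge. A minimal sketch, with a made-up function name `pci_parent` and an optional second argument that exists only so the lookup can be pointed at a different directory:

```shell
# Hypothetical helper: print the PCI address of the bridge directly above
# device $1 (full address, e.g. 0000:04:00.0). $2 optionally overrides the
# sysfs device directory.
pci_parent() {
    dev=$1
    root=${2:-/sys/bus/pci/devices}
    # The entry is a symlink into /sys/devices/...; the resolved path's
    # parent directory is the upstream bridge.
    basename "$(dirname "$(readlink -f "$root/$dev")")"
}
```

On this laptop `pci_parent 0000:04:00.0` should name the same root port as the lspci output above.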
So I know the corresponding PCIe root port is 00:1c.4. Then, according to the PCIe spec, I have to put the downstream device into the D3 state first. Let's check the current state of the graphics card (apparently D0):
~$ sudo lspci -s 04:00.0 -vv | grep 'Status: D'
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
To put it into the D3 state, we have to modify the PMCSR register located in the Power Management Capability Structure (CAP_PM), at offset 4:
# The device power state is held in the two lowest bits; let's see what the other bits look like
~$ sudo setpci -s 04:00.0 CAP_PM+4.b
08
# Now we set it to the D3 state by setting the two lowest bits to 11
~$ sudo setpci -s 04:00.0 CAP_PM+4.b=0b
# Confirm that it has been set:
~$ sudo setpci -s 04:00.0 CAP_PM+4.b
0b
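The value 0b written above is simply the old byte (08) with the two power-state bits set to 11. A small sketch of that arithmetic, using a hypothetical helper name `pmcsr_with_state`:

```shell
# Hypothetical helper: given the current PMCSR low byte in hex ($1) and a
# target power state 0..3 ($2), print the byte to write back, keeping all
# bits other than the two state bits unchanged.
pmcsr_with_state() {
    printf '%02x\n' $(( (0x$1 & ~0x03) | $2 ))
}
```

`pmcsr_with_state 08 3` gives the 0b used for the GPU, and `pmcsr_with_state 00 3` gives the 03 used for the root port. setpci also documents a bits:mask form for changing only selected bits (something like CAP_PM+4.b=03:03), which I have not tried here.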
At this point, the device is still accessible:
~$ cat /proc/acpi/bbswitch
0000:04:00.0 ON
So the next step is to set the root port to D3 state:
~$ sudo setpci -s 00:1c.4 CAP_PM+4.b
00
~$ sudo setpci -s 00:1c.4 CAP_PM+4.b=03
~$ sudo setpci -s 00:1c.4 CAP_PM+4.b
03
FINALLY, the card is OFF !!!
~$ cat /proc/acpi/bbswitch
0000:04:00.0 OFF
~$ sudo lspci -s 04:00.0 -vv
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 840M] (rev ff) (prog-if ff)
!!! Unknown header type 7f
OK, after some time, I realized that the laptop is still very hot without any CPU load; apparently the GPU is still on... Any opinions?
This is likely missing some help from ACPI; that's why it appears off but still generates heat. Try kernel 4.9 or newer. You don't even need to load any driver (just enable runtime PM for the nvidia device and the PCIe root port). If you want to use the Nvidia GPU (e.g. because you have an HDMI/DP port attached to it), use nouveau (which by default enables runtime PM).
Fast-forward to Ubuntu 18.04 running kernel 4.15: the issue still hasn't gone away. I tried both nouveau and the nvidia-390 driver but still have the excessive heat issue. Are there any other updates lately?
@tohyf, have you checked whether this still happens if you stop the Bumblebee service when suspending the system? See https://github.com/Bumblebee-Project/bbswitch/issues/90#issuecomment-560163337
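Enabling runtime PM by hand means writing `auto` to each device's `power/control` file in sysfs. A sketch only: the function name `enable_runtime_pm` and the `PCI_SYSFS_ROOT` override are invented for this example, and the two addresses are the ones from this thread.

```shell
# Hypothetical helper: switch the given PCI devices to runtime power
# management by writing "auto" to their power/control files.
# PCI_SYSFS_ROOT is an assumed override, used only to try this out
# against a scratch directory.
enable_runtime_pm() {
    root=${PCI_SYSFS_ROOT:-/sys/bus/pci/devices}
    for dev in "$@"; do
        echo auto > "$root/$dev/power/control" || return 1
    done
}
```

Run as root, e.g. `enable_runtime_pm 0000:04:00.0 0000:00:1c.4`; afterwards `cat /sys/bus/pci/devices/0000:04:00.0/power/runtime_status` reports whether the device actually suspended.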
Hmm, that looks like a possible solution. Let me try it and report back. Thanks!
Nope, I tried the solution but it doesn't work. The card is ON and cannot be turned off after waking from sleep using the bbswitch command:
sudo tee /proc/acpi/bbswitch <<<OFF
Note that on my Ubuntu 18.04 there was no /usr/lib/systemd/system-sleep directory, but there was /lib/systemd/system-sleep/, so I created the file 00-bumblebee.sh there. There was no /usr/lib/systemd/system-sleep/nvidia or /lib/systemd/system-sleep/nvidia on my system (I have both nvidia-driver-435 and bumblebee installed from the Ubuntu repo). However, I have the nvidia and nouveau kernel modules blacklisted, since I just want to save power. I assume this won't have any effect, right?
I am running Kubuntu 14.04 with kernel 3.16 on an ASUS laptop with a GeForce 840M. After booting up, the GPU is OFF, which is desired; but after I put the laptop to sleep (suspend to RAM) and then wake it up, bbswitch is unable to turn the GPU off again. As a result, the battery life is cut in half! This bug is reproducible every time. I only installed bbswitch-dkms and not bumblebee.
FYI, one line of lspci output after booting:
04:00.0 3D controller: NVIDIA Corporation Device 1341 (rev ff)
After waking up from sleep, it becomes:
04:00.0 3D controller: NVIDIA Corporation Device 1341 (rev a2)
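For reference, a systemd system-sleep hook is just an executable that receives `pre` or `post` as its first argument. The sketch below is not the script from the linked comment; it is a hypothetical minimal hook that writes OFF to bbswitch on resume, with a `BBSWITCH_FILE` override invented so it can be tried against an ordinary file. As reported above, a hook of this kind did not fix the problem on this laptop.

```shell
#!/bin/sh
# Hypothetical minimal system-sleep hook. systemd passes "pre" or "post"
# as $1; on "post" (after resume) ask bbswitch to power the card off.
bbswitch_sleep_hook() {
    target=${BBSWITCH_FILE:-/proc/acpi/bbswitch}
    case ${1:-} in
        post) echo OFF > "$target" ;;
    esac
}

bbswitch_sleep_hook "$@"
```

To use it, the file would go into the system-sleep directory mentioned above and be made executable.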
The command that I used to turn the card off: sudo tee /proc/acpi/bbswitch <<<OFF
I have added the following line to /etc/modules: bbswitch load_state=0 unload_state=0
What I want is for the card to be OFF all the time unless I need it. Can you look into this problem? Thanks.