Bumblebee-Project / bbswitch

Disable discrete graphics (currently nvidia only)
GNU General Public License v2.0

bbswitch is broken with kernel 4.8 pcie port power management #140

Open nathanielwarner opened 8 years ago

nathanielwarner commented 8 years ago

I just upgraded to kernel 4.8, and bbswitch 0.8-1 no longer works properly. When I try to run something with primusrun, it fails with "bumblebee could not enable discrete graphics card" or something, and I get this in dmesg:

bbswitch: enabling discrete graphics
pci 0000:01:00.0: Refused to change power state, currently in D3
pci 0000:01:00.0: Refused to change power state, currently in D3

When I use the kernel command line option pcie_port_pm=off primusrun works again, and I get this in dmesg upon using primusrun:

bbswitch: enabling discrete graphics
nvidia-nvlink: Nvlink Core is being initialized, major device number 242
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  370.28  Thu Sep  1 19:45:04 PDT 2016
vgaarb: this pci device is not a vga device
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  370.28  Thu Sep  1 19:18:48 PDT 2016
nvidia-modeset: Allocated GPU:0 (GPU-33c835cf-d564-600a-037b-c7ecb9188d7c) @ PCI:0000:01:00.0
nvidia-modeset: Freed GPU:0 (GPU-33c835cf-d564-600a-037b-c7ecb9188d7c) @ PCI:0000:01:00.0
vgaarb: this pci device is not a vga device
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
nvidia-modeset: Unloading
nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
bbswitch: disabling discrete graphics
ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
pci 0000:01:00.0: Refused to change power state, currently in D0

Is lack of support for Kernel 4.8 default configuration an issue that anyone else is having? I'm running Manjaro with Kernel 4.8.1-1, Nvidia driver 370.28, bbswitch 0.8-1.

Lekensteyn commented 8 years ago

bbswitch has indeed not been updated for the new PM method in kernel 4.8. If you have a newer machine (>= 2015), you might experience issues if you enabled runtime PM for devices.

Do you happen to have udev rules or other "laptop mode tools" that enable power saving features (i.e. by writing auto to the power/control node in sysfs)? It is my current belief that your problem cannot occur unless you enable such power saving methods.

As a workaround you can boot with the pcie_port_pm=off kernel option (or disable runtime PM for the NVIDIA PCI device or its parent PCIe port).
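To check whether runtime PM is enabled for the GPU or its parent port, the power/control nodes mentioned above can be read directly. The following is a minimal sketch, not part of bbswitch; the function name, the default device address 0000:01:00.0 (taken from the dmesg output above) and the sysfs_root parameter are assumptions for illustration. A value of auto means runtime PM is enabled, on means it is disabled.

```python
import os


def runtime_pm_controls(device="0000:01:00.0", sysfs_root="/sys"):
    """Return {path: value} for power/control of a PCI device and its parent.

    The entry under bus/pci/devices is a symlink into the device tree, so
    the parent PCIe port is the directory one level above the resolved path.
    A value of None means the node could not be read.
    """
    dev = os.path.realpath(
        os.path.join(sysfs_root, "bus", "pci", "devices", device)
    )
    result = {}
    for node in (dev, os.path.dirname(dev)):
        control = os.path.join(node, "power", "control")
        try:
            with open(control) as fh:
                result[control] = fh.read().strip()
        except OSError:
            result[control] = None
    return result
```

Calling runtime_pm_controls() on the affected machine lists both nodes. Writing on to a reported node (as root) disables runtime PM for that device until something rewrites it, for example a power saving tool or a reboot.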

nathanielwarner commented 8 years ago

I am using TLP and Powertop, but bbswitch still doesn't work with those disabled. The strange thing is that bbswitch seems to think the NVIDIA card is stuck in D0 power state on startup, but then is unable to start it upon invocation of primusrun, and reports that the card is stuck in D3. On startup, I get these messages:

[    8.164115] bbswitch: version 0.8
[    8.164123] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[    8.164132] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[    8.164148] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[    8.164326] bbswitch: detected an Optimus _DSM function
[    8.164338] bbswitch: device 0000:01:00.0 is in use by driver 'nvidia', refusing OFF
[    8.164341] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[    8.164941] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[    8.183647] nvidia-modeset: Unloading
[    8.200285] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[    8.200287] Bluetooth: BNEP filters: protocol multicast
[    8.200293] Bluetooth: BNEP socket layer initialized
[    8.200384] nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
[    8.221637] bbswitch: disabling discrete graphics
[    8.221655] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[    8.236787] pci 0000:01:00.0: Refused to change power state, currently in D0

And on invocation of primusrun, I get this:

[  225.420138] bbswitch: enabling discrete graphics
[  225.496646] pci 0000:01:00.0: Refused to change power state, currently in D3
[  225.573232] pci 0000:01:00.0: Refused to change power state, currently in D3

Is it possible that the kernel-based port power management is able to control the power state, but bbswitch is not? This would make sense because on pre-4.8 kernel versions there is no kernel-based PCIe port power management, and the card seems to always be stuck in D0 (see dmesg output in my first post), and bbswitch has no problems this way. It is when the card is successfully put into D3 that bbswitch is unable to use it.

Lekensteyn commented 8 years ago

Have you rebooted after disabling TLP? The PCIe port management introduced with 4.8 cannot be combined with bbswitch in one boot; that case is not supported (it may or may not work, no guarantees).

Have you tried the kernel option which I mentioned above? What is your laptop and GPU btw?

If you see the messages "bbswitch: enabling discrete graphics" followed by "Refused to change power state, currently in D3" (or similarly, "disabling" and "currently in D0"), then it is an indication that something went wrong...
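The pattern described above can be checked mechanically. This is a minimal sketch that scans dmesg lines for a bbswitch request followed by a refusal that reports the state the card was supposed to leave; the function name is hypothetical, and the message strings are copied from the logs in this thread.

```python
import re

REQUEST = re.compile(r"bbswitch: (enabling|disabling) discrete graphics")
REFUSED = re.compile(r"Refused to change power state, currently in (D\d)")

# State the card is stuck in when each kind of bbswitch request fails.
STUCK_STATE = {"enabling": "D3", "disabling": "D0"}


def failed_transitions(dmesg_lines):
    """Return a list of (action, stuck_state) pairs, one per failed request."""
    failures = []
    action = None  # most recent bbswitch request not yet matched to a refusal
    for line in dmesg_lines:
        request = REQUEST.search(line)
        if request:
            action = request.group(1)
            continue
        refused = REFUSED.search(line)
        if refused and action and refused.group(1) == STUCK_STATE[action]:
            failures.append((action, refused.group(1)))
            action = None  # report each request only once
    return failures
```

The input can be the output of dmesg split into lines. An empty result means no request was followed by a matching refusal, which by itself does not prove the card really changed state.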

nathanielwarner commented 8 years ago

Yes, I rebooted after disabling tlp and powertop. As I said above, when I boot with the kernel option you mention, primusrun works again, but bbswitch reports that it cannot change the GPU power state out of D0 when primusrun stops running. But that happened with earlier kernels as well. And yes, something is clearly going wrong. But maybe this has actually been a problem all along and is only now showing itself because the kernel is putting the dGPU out of D0. My laptop is a Dell XPS 15 9550; the GPU is Intel HD 530 + GeForce GTX 960M.

rockorequin commented 8 years ago

Odd, I have exactly the same laptop and I'm using tlp and powertop, but I don't get this problem until after a suspend/resume cycle. Or is this issue fixed in bbswitch 0.8.4ubuntu1?

nathanielwarner commented 8 years ago

You're sure you have the exact same model, and that you're running Kernel 4.8? It's possible you have a different BIOS than me (If you don't know, Dell has been rapidly pushing out BIOS updates to try to fix an alarming number of issues. Many of the updates have made things worse, so I'm currently on an older BIOS.) It's also possible that the issue is fixed in bbswitch 0.8.4ubuntu1. Maybe I'll try Ubuntu and see if that works better.

rockorequin commented 8 years ago

Yes, it's the same model, with the same GPUs. It has the 4K screen and I'm running the 1.2.0 BIOS (I tried 1.2.14, but it has a screen flickering problem which makes it unusable.) It's possible I disabled tlp and powertop power management and forgot, of course.

nathanielwarner commented 8 years ago

I'm in the exact same situation as you, on 1.2.0 (Seriously, Dell needs to get their act together!) Since disabling tlp and powertop didn't solve it for me, my only guesses are that the version of bbswitch you have is newer than mine, or that you are running a different kernel version (pre 4.8).

rockorequin commented 8 years ago

I'm running the mainline 4.8 kernel also (with the patch from https://bugs.freedesktop.org/show_bug.cgi?id=97596 to avoid a weird flickering artefact that occurs on Skylake architecture with 4.8 if you have a second monitor attached).

nathanielwarner commented 8 years ago

Ok, I'll probably try Ubuntu with the mainline kernel at some point to see if that fixes the issue. Until then, should this issue be closed?

Lekensteyn commented 7 years ago

@rockorequin Perhaps you are using nouveau instead of bbswitch? Personally I am back to nouveau since my new laptop requires it for an external monitor.

nathanielwarner commented 7 years ago

I actually did try it with Ubuntu 16.10 with Kernel 4.8, and it is fixed. There must be something internal to Manjaro that is screwing it up.

ArchangeGabriel commented 7 years ago

I’m reopening this because, even if it seems to work (i.e. it reports OFF), the power consumption and temperature correspond to the case of an ON card on my setup. Adding pcie_port_pm=off to the kernel parameters solves it.

When using nouveau, temperature and power consumption also correspond to an ON card.

Lekensteyn commented 7 years ago

The result of combining the DSM method (as used by bbswitch) with the new power resources method (as used since Linux 4.8 and nouveau) in a single boot is not known (I would call it undefined behavior). Forcing pcie_port_pm=off basically reverts to the DSM method.

How do you observe that the video card is off with nouveau? You have to check your dmesg for the last messages related to nouveau. If you see "DRM: resuming kernel object tree" with no "suspending console" as follow up, then you know something is keeping the device busy.
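The dmesg check described above can be sketched as a small helper. This is an illustration only, with a hypothetical function name; it compares the position of the last "resuming kernel object tree" message with the last "suspending console" message from nouveau.

```python
def nouveau_left_resumed(dmesg_lines):
    """Return True if the last nouveau resume has no later suspend.

    True suggests that something is keeping the device busy. False means
    either that a suspend followed the last resume or that no resume
    message was seen at all.
    """
    last_resume = -1
    last_suspend = -1
    for index, line in enumerate(dmesg_lines):
        if "nouveau" not in line:
            continue
        if "DRM: resuming kernel object tree" in line:
            last_resume = index
        elif "DRM: suspending console" in line:
            last_suspend = index
    return last_resume > last_suspend
```

The helper only looks at message order, so it should be fed the complete dmesg output rather than a filtered excerpt that might drop one of the two messages.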

ArchangeGabriel commented 7 years ago

I do get those lines at the end:

kernel: nouveau 0000:01:00.0: DRM: suspending console...
kernel: nouveau 0000:01:00.0: DRM: suspending display...
kernel: nouveau 0000:01:00.0: DRM: evicting buffers...
kernel: nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
kernel: nouveau 0000:01:00.0: DRM: suspending client object trees...
kernel: nouveau 0000:01:00.0: DRM: suspending kernel object tree...

That being said, I probably need to do some more investigation (power consumption with bbswitch vs nouveau vs nothing, each with and without pcie_port_pm=off) to properly determine what works and what does not, and then start reporting bugs against kernel/nouveau.

ArchangeGabriel commented 7 years ago

Also, @Lekensteyn, I grabbed this at some point. If I remember correctly, it was while running a boot without pcie_port_pm=off on my newer machine and trying to echo OFF to bbswitch after seeing a temperature increase:

[13827.423220] bbswitch: disabling discrete graphics
[13827.423230] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[13827.424013] ------------[ cut here ]------------
[13827.424017] WARNING: CPU: 3 PID: 2343 at drivers/pci/pci.c:1616 pci_disable_device+0xa8/0xd0
[13827.424018] pci 0000:01:00.0: disabling already-disabled device
[13827.424019] Modules linked in:
[13827.424019]  bbswitch(O) mousedev snd_hda_codec_conexant snd_hda_codec_generic hid_generic arc4 msr iTCO_wdt i2c_designware_platform hp_wmi iTCO_vendor_support i2c_designware_core mxm_wmi joydev sparse_keymap nls_iso8859_1 $
[13827.424049]  evdev hp_wireless ac mac_hid tpm_tis acpi_pad tpm_tis_core tpm sch_fq_codel ip_tables x_tables btrfs xor raid6_pq algif_skcipher af_alg dm_crypt dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_p$
[13827.424077] CPU: 3 PID: 2343 Comm: tee Tainted: G        W  O    4.8.2-1-ARCH #1
[13827.424078] Hardware name: HP HP ZBook Studio G3/80D4, BIOS N82 Ver. 01.07 04/27/2016
[13827.424079]  0000000000000286 00000000be2b784c ffff8808181bbcf8 ffffffff812fe280
[13827.424081]  ffff8808181bbd48 0000000000000000 ffff8808181bbd38 ffffffff8107c85b
[13827.424083]  0000065000000000 ffff88089b30c000 ffff88089b2fefa0 00007ffd831c7e40
[13827.424086] Call Trace:
[13827.424090]  [<ffffffff812fe280>] dump_stack+0x63/0x83
[13827.424092]  [<ffffffff8107c85b>] __warn+0xcb/0xf0
[13827.424093]  [<ffffffff8107c8df>] warn_slowpath_fmt+0x5f/0x80
[13827.424094]  [<ffffffff8134bf6b>] ? __pci_set_master+0x3b/0xf0
[13827.424096]  [<ffffffff8134ee98>] pci_disable_device+0xa8/0xd0
[13827.424098]  [<ffffffffa06a548d>] bbswitch_off+0xad/0x240 [bbswitch]
[13827.424100]  [<ffffffffa06a5870>] bbswitch_proc_write+0xb0/0xc7 [bbswitch]
[13827.424102]  [<ffffffff81276f82>] proc_reg_write+0x42/0x70
[13827.424104]  [<ffffffff812087b7>] __vfs_write+0x37/0x140
[13827.424107]  [<ffffffff810c7b87>] ? percpu_down_read+0x17/0x50
[13827.424108]  [<ffffffff81209586>] vfs_write+0xb6/0x1a0
[13827.424109]  [<ffffffff8120aa05>] SyS_write+0x55/0xc0
[13827.424111]  [<ffffffff815f7cf2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[13827.424112] ---[ end trace 4f6318674a3d9756 ]---

Will try to reproduce, but I think this is likely caused by bbswitch/pcie_port_pm interaction on 4.8 for newer systems.

Lekensteyn commented 7 years ago

For your last issue, if you have some udev rule enabling runtime PM for devices (e.g. "laptop mode tools") then indeed it will upset bbswitch on the new behavior (4.8 without pcie_port_pm=off on newer laptops).

nathanielwarner commented 7 years ago

I should point out that if you're using nouveau (rather than nvidia), you don't actually need bumblebee or bbswitch: you can just put DRI_PRIME=1 before the app you want to run with the discrete GPU. See https://wiki.archlinux.org/index.php/PRIME
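Setting DRI_PRIME=1 before a command only changes that command's environment. As a minimal sketch of the same idea from a launcher script (the function name is hypothetical, and this assumes nouveau with PRIME offloading as described on the linked wiki page):

```python
import os
import subprocess


def run_with_dri_prime(command, env=None):
    """Run command with DRI_PRIME=1 in its environment; return the exit code.

    The caller's own environment is copied, not modified.
    """
    child_env = dict(os.environ if env is None else env)
    child_env["DRI_PRIME"] = "1"
    return subprocess.run(command, env=child_env).returncode
```

For example, run_with_dri_prime(["glxgears"]) corresponds to typing DRI_PRIME=1 glxgears in a shell.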

ArchangeGabriel commented 7 years ago

@nathanielwarner If you’re telling that to me, I assure you that I know. ;) But that’s not really related to the current issue.

@Lekensteyn OK, I’ve got tlp installed (and running) on the same system, I’ll also try with or without it to see what it gives. So that’s one more factor to try. Should have time tomorrow to look at all that. :)

On a side note, do you still intend to update bbswitch for supporting this new method any time soon? Maybe we should release Bumblebee 4.0 without waiting much further and add a release note about bbswitch state (interaction with 4.8 pcie_port_pm, open/known issues). ;)

Lekensteyn commented 7 years ago

Do I intend to update bbswitch to support this new method? Yes.

Any time soon? No (time constraints). nouveau seems to work, so I have not really prioritized it here.

I was hoping to get this fixed before Bumblebee 4, but it seems things are really stalling, so maybe it is better to release it since it at least improves the nvidia driver situation. Release note with known issues should be ok :)

bluca commented 7 years ago

+1 for a new release, debian 9 deadlines are approaching fast :-)

ArchangeGabriel commented 7 years ago

OK, I’ll go through all open issues soon (help appreciated) and will try to release by the end of the week. Stay tuned. If there is any need for discussion, https://github.com/Bumblebee-Project/Bumblebee/issues/319 is the place to go now. ;)

GreatBigWhiteWorld commented 7 years ago

I see that in your bumblebee 4 issue, you are delaying its release for another few weeks. So I need to get this to work even temporarily.

If I understand right, I need to add "pcie_port_pm=off" in the grub configuration as kernel parameter, and the drawback is that I am constantly running on nvidia card right?

Thanks in advance.

Lekensteyn commented 7 years ago

@GreatBigWhiteWorld pcie_port_pm=off is a workaround that allows you to use bbswitch with kernel 4.8 and newer. If you use older kernels, you do not need that option.

If you use nouveau (and not bbswitch nor the nvidia proprietary driver), then you do not have to do anything.
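Whether the pcie_port_pm=off workaround is actually active for the running kernel can be read from /proc/cmdline. The following is a minimal sketch with hypothetical helper names; it parses any kernel command line string, so it can also be tried on the command lines quoted in this thread.

```python
import shlex


def kernel_param(cmdline, name):
    """Look up a parameter in a kernel command line string.

    Returns the value of the last name=value token, an empty string for a
    bare flag such as quiet, or None if the parameter is absent.
    """
    value = None
    for token in shlex.split(cmdline):
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read()
```

On a machine booted with the workaround, kernel_param(read_cmdline(), "pcie_port_pm") should return "off". A result of None means the option did not reach the kernel, for example because the bootloader configuration was not regenerated.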

GreatBigWhiteWorld commented 7 years ago

Thanks. Yes I am running 4.8 kernel and bbswitch is always off at the moment. I guess I need this option.

ademcal commented 7 years ago

I have the same problem, and it was solved with the pcie_port_pm=off parameter. The laptop is a Dell 7559 running openSUSE Tumbleweed.

The dmesg output is below:

[    7.982013] ------------[ cut here ]------------
[    7.982017] WARNING: CPU: 6 PID: 1550 at ../drivers/pci/pci.c:1616 pci_disable_device+0xa1/0xd0
[    7.982018] pci 0000:02:00.0: disabling already-disabled device
[    7.982019] Modules linked in:
[    7.982019]  af_packet nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit bnep nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_generic videodev usbhid btusb btrtl snd_hda_codec_hdmi dell_led arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec hid_multitouch kvm_intel snd_hda_core kvm snd_hwdep irqbypass iwlmvm snd_pcm iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul mac80211 snd_seq crc32c_intel ghash_clmulni_intel i2c_designware_platform snd_seq_device snd_timer i2c_designware_core aesni_intel idma64 virt_dma iwlwifi dell_wmi aes_x86_64 sparse_keymap lrw dell_smbios glue_helper dcdbas dell_smm_hwmon ablk_helper cryptd rtsx_pci_ms hci_uart
[    7.982039]  snd ip6t_REJECT nf_reject_ipv6 btbcm memstick pcspkr i2c_i801 mei_me cfg80211 i2c_smbus mei intel_lpss_pci int3403_thermal btqca xt_tcpudp soundcore joydev btintel nf_conntrack_ipv6 battery pinctrl_sunrisepoint bluetooth nf_defrag_ipv6 ac pinctrl_intel intel_lpss_acpi intel_lpss fan processor_thermal_device int3402_thermal int340x_thermal_zone dell_rbtn shpchp int3400_thermal intel_soc_dts_iosf acpi_als acpi_thermal_rel kfifo_buf tpm_tis fjes thermal tpm_tis_core industrialio rfkill ip6table_raw acpi_pad tpm ipt_REJECT nf_reject_ipv4 iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables rtsx_pci_sdmmc mmc_core mxm_wmi i915 serio_raw xhci_pci
[    7.982058]  rtsx_pci mfd_core xhci_hcd i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt usbcore fb_sys_fops usb_common drm wmi video i2c_hid button coretemp msr sg bbswitch(O) efivarfs [last unloaded: nvidia]
[    7.982067] CPU: 6 PID: 1550 Comm: bumblebeed Tainted: P     U     O    4.8.6-2-default #1
[    7.982067] Hardware name: Dell Inc. Inspiron 7559/0H0CC0, BIOS 1.2.0 09/22/2016
[    7.982068]  0000000000000000 ffffffffb03a4272 ffff99f4daab3da8 0000000000000000
[    7.982070]  ffffffffb007de2e ffff99f501704000 ffff99f4daab3df8 ffff99f4daab3f28
[    7.982072]  00000000017d7270 0000000000000000 0000000000000028 ffffffffb007de9f
[    7.982074] Call Trace:
[    7.982082]  [<ffffffffb002eefe>] dump_trace+0x5e/0x310
[    7.982085]  [<ffffffffb002f2cb>] show_stack_log_lvl+0x11b/0x1a0
[    7.982087]  [<ffffffffb0030001>] show_stack+0x21/0x40
[    7.982090]  [<ffffffffb03a4272>] dump_stack+0x5c/0x7a
[    7.982093]  [<ffffffffb007de2e>] __warn+0xbe/0xe0
[    7.982096]  [<ffffffffb007de9f>] warn_slowpath_fmt+0x4f/0x60
[    7.982098]  [<ffffffffb03eb551>] pci_disable_device+0xa1/0xd0
[    7.982101]  [<ffffffffc036e409>] bbswitch_off+0x89/0x230 [bbswitch]
[    7.982104]  [<ffffffffc036e7c3>] bbswitch_proc_write+0x93/0xaa [bbswitch]
[    7.982108]  [<ffffffffb02854dd>] proc_reg_write+0x3d/0x60
[    7.982111]  [<ffffffffb02187c3>] __vfs_write+0x23/0x140
[    7.982114]  [<ffffffffb0219080>] vfs_write+0xb0/0x190
[    7.982115]  [<ffffffffb021a302>] SyS_write+0x42/0x90
[    7.982118]  [<ffffffffb06d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[    7.983563] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xa8

[    7.983564] Leftover inexact backtrace:

[    7.983566] ---[ end trace 8e83878053cc2799 ]---

ssbb commented 7 years ago

I have a Dell XPS 15 9550 with a 960M too. cat /proc/acpi/bbswitch tells me that the GPU is off, but my laptop is noisy all the time. I think it happens only with the 4.8 kernel, since this did not happen before.

I added pcie_port_pm=off as a kernel parameter, but it looks like it does not help:

[  193.771954] bbswitch: enabling discrete graphics
[  199.161884] bbswitch: disabling discrete graphics
[  199.161893] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  262.993580] bbswitch: enabling discrete graphics
[  263.317141] nvidia: module license 'NVIDIA' taints kernel.
[  263.317143] Disabling lock debugging due to kernel taint
[  263.324303] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[  263.324323] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.10  Fri Oct 14 10:30:06 PDT 2016 (using threaded interrupts)
[  263.899187] vgaarb: this pci device is not a vga device
[  263.907317] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907458] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907543] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907620] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907696] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907811] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.907888] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  263.937265] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  264.134610] vgaarb: this pci device is not a vga device
[  264.417771] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.10  Fri Oct 14 10:05:55 PDT 2016
[  267.564436] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  267.570652] nvidia-modeset: Unloading
[  267.583848] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[  267.611886] bbswitch: disabling discrete graphics
[  267.611895] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[  267.627364] pci 0000:01:00.0: Refused to change power state, currently in D0

Lekensteyn commented 7 years ago

@ssbb Are you sure that your fan issue is new with 4.8? Do you actually need the nvidia GPU? If not, remove pcie_port_pm=off and use nouveau instead.

ssbb commented 7 years ago

@Lekensteyn I'm really not sure. I just reinstalled the system and this happens. I did not have this issue with the old system on the 4.7 kernel.

I am using nvidia for gaming, and that's why I am on bbswitch with bumblebee :)

UPD: about the fans - the bottom left corner of my laptop is pretty hot. The nvidia chip is located there, and that's why I found this issue at all.

ademcal commented 7 years ago

I noticed the faster fan problem too. I never had a fan problem with kernel 4.7. I think the same as @ssbb.

DistantThunder commented 7 years ago

Arch Linux (ZEN kernel)
Kernel: Linux 4.9.6-1-zen x86_64 GNU/Linux
bbswitch: 0.8-61
bumblebee: 3.2.12
primus: 20151110-6
NVIDIA 375.26 for GTX 960m dGPU on MSI PX60 6QE laptop (Core i7-6700HQ).

Using the "bbswitch-dkms" package, I confirm bbswitch is working again, seemingly thanks to changes made in Kernel 4.9.x mainline.

My laptop power led has a built-in indicator allowing me to see easily if the NVIDIA dGPU is powered on. Up until now with bbswitch disabled on kernel 4.8.x, it was always powered on post-boot. Now, I can see it being powered off right after the boot session and staying so until I run something with optirun.

As soon as the dGPU-ran program exits, the led changes state and I can see the dGPU has been powered off.

Kudos to the kernel guys and thanks to the bumblebee project!

tomdee commented 7 years ago

I'm still seeing it not working on Arch - I have the same versions of all the deps you list above. I'm running on a Precision 5510 (Quadro m1000m).

optirun glxspheres64
[  662.547297] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card

[  662.547387] [ERROR]Aborting because fallback start is disabled.

and

[  662.571882] bbswitch: enabling discrete graphics
[  662.571926] pci 0000:01:00.0: Refused to change power state, currently in D3

atsuya commented 7 years ago

The same here on Mi Notebook Air 13.3.

DistantThunder commented 7 years ago

Maybe it's some ACPI problem, my kernel command line looks like this:

pci=nomsi modprobe.blacklist=nouveau intel_iommu=on acpi_osi=! acpi_osi="Windows 2009"

The value for acpi_osi parameters was found through testing. I've seen values that worked for other people but didn't for my machine, so maybe you'll have to test on that.

chadfurman commented 7 years ago

GRUB_CMDLINE_LINUX_DEFAULT="resume=/dev/nvme0n1p6 splash=silent quiet showopts pcie_port_pm=off"

I'm using the recommended flag here, I also do have powertop on but I've toggled the Runtime PM for NVIDIA Quadro M2000M to off.

When I do sudo tee /proc/acpi/bbswitch <<<ON and then cat /proc/acpi/bbswitch I see it is still off.
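The write-then-read check above can be scripted. This is a minimal sketch with hypothetical function names, assuming the status format shown elsewhere in this thread (a PCI address followed by ON or OFF, e.g. "0000:01:00.0 OFF"); request_and_verify needs root privileges and a loaded bbswitch module.

```python
def parse_bbswitch_status(text):
    """Split a status line such as '0000:01:00.0 OFF' into (address, state)."""
    parts = text.split()
    if len(parts) != 2 or parts[1] not in ("ON", "OFF"):
        raise ValueError("unexpected bbswitch status: %r" % text)
    return parts[0], parts[1]


def request_and_verify(state, path="/proc/acpi/bbswitch"):
    """Write ON or OFF to bbswitch and report whether the state changed."""
    if state not in ("ON", "OFF"):
        raise ValueError("state must be 'ON' or 'OFF'")
    with open(path, "w") as fh:
        fh.write(state + "\n")
    with open(path) as fh:
        _, actual = parse_bbswitch_status(fh.read())
    return actual == state
```

A False result from request_and_verify("ON") matches the failure reported here: the request is accepted, but the card stays off.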

This is what I see in dmesg: [ 303.819713] pci 0000:01:00.0: Refused to change power state, currently in D3

My uname string: Linux linux-tb94 4.10.1-1-default #1 SMP PREEMPT Sun Feb 26 12:43:10 UTC 2017 (1ecd5af) x86_64 x86_64 x86_64 GNU/Linux

Version: bumblebeed (Bumblebee) 3.2.1

I'm sure you've guessed, but this is the output of bumblebeed --debug:

lotus@linux-tb94:~> sudo bumblebeed --debug
[  380.716976] [DEBUG]bbswitch has been detected.
[  380.716989] [INFO]Switching method 'bbswitch' is available and will be used.
[  380.716991] [DEBUG]Active configuration:
[  380.716993] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[  380.716995] [DEBUG] X display: :8
[  380.716996] [DEBUG] LD_LIBRARY_PATH: 
[  380.716998] [DEBUG] Socket path: /var/run/bumblebee.socket
[  380.716999] [DEBUG] xorg.conf file: /etc/bumblebee/xorg.conf.nouveau
[  380.717001] [DEBUG] xorg.conf.d dir: /etc/bumblebee/xorg.conf.d
[  380.717002] [DEBUG] ModulePath: 
[  380.717003] [DEBUG] GID name: bumblebee
[  380.717005] [DEBUG] Power method: auto
[  380.717006] [DEBUG] Stop X on exit: 1
[  380.717008] [DEBUG] Driver: nouveau
[  380.717010] [DEBUG] Driver module: nouveau
[  380.717011] [DEBUG] Card shutdown state: 1
[  380.717080] [DEBUG]Process /sbin/modprobe started, PID 2864.
[  380.717128] [DEBUG]Hiding stderr for execution of /sbin/modprobe
[  380.717877] [DEBUG]SIGCHILD received, but wait failed with No child processes
[  380.717881] [DEBUG]Configuration test passed.
[  380.718049] [INFO]bumblebeed 3.2.1 started
[  380.718081] [INFO]Initialization completed - now handling client requests
[  390.194784] [DEBUG]Accepted new connection
[  390.195181] [INFO]Switching dedicated card ON [bbswitch]
[  390.195303] [ERROR]Could not enable discrete graphics card
[  390.195487] [DEBUG]Socket closed.

This happens when I run optirun glxspheres

If this isn't related, I'll open a new ticket. Seems related, though.

Lekensteyn commented 7 years ago

@chadfurman A new ticket would probably be more appropriate (be sure to mention your model name and include the info for https://bugs.launchpad.net/lpbugreporter/+bug/752542). Note that the old method (forced by pcie_port_pm=off) might not work for some newer devices.

zx2c4 commented 7 years ago

@Lekensteyn any plans to support the new power management scheme used by newer kernels? bbswitch currently suffers from code rot and is all but useless until it can actually work with the current kernels.

Lekensteyn commented 7 years ago

@zx2c4 Eventually I wanted to finish the support, but even with support for the new scheme out-of-the-box support is broken for many laptops (needing an acpi_osi trick). I have seen with many friends that they need this workaround or otherwise their laptops would freeze at startup (when Xorg starts), shutdown, suspend, etc. It would really be nice if that somehow gets sorted, then there is more motivation to update bbswitch I guess. (These are independent issues though, but it shows how terribly broken things currently are :/)

artscoop commented 7 years ago

As @Lekensteyn says, I also need this kernel command line workaround, or else the system freezes as soon as Xorg starts (with fans going full speed 15 seconds after the freeze). If I use the acpi_osi thing, Xorg and bbswitch work. However, I am then unable to suspend the machine; I would need to add some other tricks to make everything work, namely one or more of pci=nomsi modprobe.blacklist=nouveau intel_iommu=on.

The difficulty with this is that it is barely documented outside of GitHub issues.

real-or-random commented 7 years ago

I have the same issue on a ThinkPad T570. pcie_port_pm=off solves it.

> lspci  | grep "VGA\|3D"
00:02.0 VGA compatible controller: Intel Corporation Device 5916 (rev 02)
02:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev ff)
> uname -sr
Linux 4.11.6-3-ARCH

lrafa commented 7 years ago

Booting with a kernel with pcie_port_pm=off does not solve it here.

pci 0000:01:00.0: Refused to change power state, currently in D0
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)
Linux sleipnir 4.9.16-gentoo #4 SMP Sun Jul 16 15:45:10 CEST 2017 x86_64 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz GenuineIntel GNU/Linux

Also tried kernel 4.11.8, same result. I've tried the git version of bbswitch and bumblebee, as well as bbswitch-0.8. Neither works.

Lekensteyn commented 7 years ago

@lrafa define "not solved". Do you experience a lockup or just higher power consumption?

txutxifel commented 7 years ago

The same here. When I upgraded an Asus K551L to openSUSE Tumbleweed a month ago (bbswitch-08, kernel 4.11), I got this from dmesg: pci 0000:04:00.0: Refused to change power state, currently in D0.

I remember I tried pcie_port_pm=off and I got this:
-optirun sleep 5 -> Nothing happens
-primusrun sleep 5 -> Nothing happens
-sudo optirun sleep 5 -> It works
-sudo primusrun sleep 5 -> Nothing happens

I don't have more information, so I installed openSUSE 42.2 again. I hope this helps.

Cheers

lrafa commented 7 years ago

@Lekensteyn After calling primusrun glxgears, the fan turns on and remains so. Besides, the error "Refused to change power state, currently in D0" persists. In addition to that, bumblebee.service can only be started after I start X, otherwise I get a freeze. Once the fan starts, the only way to turn it off is to shutdown the computer - rebooting won't do either.

I'd have the flexibility to enable/disable kernel features and modules at will, but at this point I'm not sure which ones to mess with. Does this work with nouveau? Are there any drawbacks with nouveau (performance-wise, that is)?

Lekensteyn commented 7 years ago

@lrafa Oh right, the current stable bbswitch is now broken without pcie_port_pm=off on newer laptops, and there is no ETA for a complete fix. (There is an experimental version available in the pm-rework branch, but skip the last switcheroo commit, so use 5c7b3f53f229c70bc49c710295967605ac5846e4.)

If you don't need CUDA or powerful GPU acceleration, but need something to save power and/or use external displays, then give nouveau a try. nouveau (without pcie_port_pm=off) should solve the fan issues and integrate better. It allows you to use external monitors (which Bumblebee does not provide out-of-the-box) and saves power when unused.

lrafa commented 7 years ago

@Lekensteyn I am using pcie_port_pm=off in my kernel. However, I'm not booting from GRUB/LILO, I'm using directly the EFI stub (no rEFInd either), so the kernel is already compiled with that option.

I'll give nouveau a try.

lrafa commented 7 years ago

@Lekensteyn sorry, but pcie_port_pm=off does not solve the problem, and the nouveau driver freezes my X session after a few seconds.

Which version of nvidia-drivers did you guys use to get it working with pcie_port_pm=off?

Also, I am experiencing a particularly weird behaviour. After starting bumblebee service (with systemd) it throws no error and I can run exactly one application with primusrun without any problem too. I can run the said application (for instance, 3D games) for hours and the fan doesn't even start.

As soon as I close this application, dmesg reports the following error: [ 202.661191] pci 0000:01:00.0: Refused to change power state, currently in D0 and just then the fan starts to go crazy.

waltervargas commented 7 years ago

Hey @Lekensteyn, thank you for all your hard work on this, just to inform that I have the same error that everyone has in this thread under Debian Buster, the details of my configuration are:

❯ uname -a 
Linux precision5510 4.11.0-1-amd64 #1 SMP Debian 4.11.6-1 (2017-06-19) x86_64 GNU/Linux
precision5510# echo ON > /proc/acpi/bbswitch   
precision5510# cat /proc/acpi/bbswitch         
0000:01:00.0 OFF
Aug 10 00:16:41 precision5510 kernel: [17955.391301] bbswitch: enabling discrete graphics     
Aug 10 00:16:41 precision5510 kernel: [17955.391345] pci 0000:01:00.0: Refused to change power state, currently in D3
❯ sudo tlp-stat -s
--- TLP 0.9 --------------------------------------------

+++ System Info
System         = Dell Inc. Precision 5510
BIOS           = 1.2.13
Release        = Debian GNU/Linux testing (buster)
Kernel         = 4.11.0-1-amd64 #1 SMP Debian 4.11.6-1 (2017-06-19) x86_64
/proc/cmdline  = BOOT_IMAGE=/vmlinuz-4.11.0-1-amd64 root=/dev/mapper/precision5510--vg-root ro quiet
Init system    = systemd v234
Boot mode      = UEFI

+++ TLP Status
State          = enabled
Last run       = 12:15:52 AM,    137 sec(s) ago
Mode           = battery
Power source   = battery

oschwand commented 7 years ago

This is probably an ugly workaround, but it is the only solution I found that lets me use my nvidia GPU after a sleep-wakeup cycle (I tried the various tricks with pcie_port_pm=off, rcutree.rcu_idle_gp_delay=1 and misc values for acpi_osi):

echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove
echo 1 >/sys/bus/pci/rescan

The first line completely removes the PCI device (you may need to adjust the device number), and the second one triggers a rescan of the bus, which adds the device back. For now I run it manually, but I will probably add it to a pm-suspend hook at some point.
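For use from a hook script, the two writes can be wrapped in a function. This is a minimal sketch under the same assumptions as the commands above (device address 0000:01:00.0, run as root); the function name and the sysfs_root parameter are hypothetical and exist only to make the paths adjustable.

```python
import os


def remove_and_rescan(device="0000:01:00.0", sysfs_root="/sys"):
    """Remove a PCI device from the bus, then ask the kernel to rescan.

    Mirrors the two echo commands: write 1 to the device's remove node,
    then write 1 to the bus-wide rescan node.
    """
    pci = os.path.join(sysfs_root, "bus", "pci")
    with open(os.path.join(pci, "devices", device, "remove"), "w") as fh:
        fh.write("1\n")
    with open(os.path.join(pci, "rescan"), "w") as fh:
        fh.write("1\n")
```

As the reports in this thread show, this does not help on every machine, so it should be treated as an experiment and not as a fix.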

lrafa commented 7 years ago

I'm growing increasingly curious, as to what precisely is so wrong about my configuration that I don't observe any of the behaviour that you guys are mentioning!

oschwand's suggestion does nothing here. The nVidia card disappears from lspci, but the fan is still on and once the temperature reaches its threshold, the computer crashes and I must hard-reboot.