Bumblebee-Project / bbswitch

Disable discrete graphics (currently nvidia only)

T440s with kernel >= 3.15 doesn't power off properly (Analysis and possible solution included !) #112

Open smunaut opened 9 years ago

smunaut commented 9 years ago

Hi,

So I'm using a T440s with a modern kernel that reports "Windows 2013" compatibility (and soon Windows 2015 with kernel 4.2). This breaks bbswitch because of some changes this triggers in the ACPI table. Manually overriding acpi_osi does fix the issue and allows the current bbswitch to work, but what I'm looking at here is how to make it work with the "new" Win 8.1 method of shutting down the card.

So the main symptoms of the issue are :

This is the DSDT table from the T440s with the latest bios (which even has Windows 10 support) : http://pastebin.com/raw.php?i=C6Q3A8aa

The important thing to note is that when the "Windows 2013" string is found, OSYS is set to 0x07DD. This in turn causes VMSH to be set to 1, which in turn causes SB.PCI0.PEG.VID._PS3 to NOT call GPOF ... and so the card is never really turned off completely.

Now if you look at how GPOF can be called, you can see it will be called as part of NVP3 power resource which is _PR3 ... but on the node SB.PCI0.PEG_ and not SB.PCI0.PEG.VID !

So basically you need to put the PCIe root port (parent pci device) in D3 and not just the card.

I tested this and it indeed triggered the expected power saving and seemed to behave exactly as if I had tweaked acpi_osi.
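To make the "parent PCI device" point concrete, here is a minimal userspace sketch that only locates the upstream port of a device. It assumes the usual sysfs layout, where a device's directory is nested inside the directory of the bridge it sits behind. The function name `parent_port_of` and the example path are illustrative and not part of bbswitch; actually putting the port into D3 is done by kernel code, not by this helper.

```shell
# parent_port_of PATH
# PATH is the resolved sysfs path of a PCI device, for example the output of
#   readlink -f /sys/bus/pci/devices/0000:04:00.0
# Prints the name of the directory one level up, i.e. the upstream port.
parent_port_of() {
    dev_path=${1%/}              # drop a trailing slash, if any
    parent_path=${dev_path%/*}   # strip the last path component
    printf '%s\n' "${parent_path##*/}"
}

# Example (hypothetical path):
#   parent_port_of /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0
#   prints 0000:00:1c.4
```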

doudou commented 8 years ago

Do you have code to put the PCIe root in D3 as well ? Would gladly integrate it in #102.

smunaut commented 8 years ago

This is what I've been using this last week on my T440s :

https://github.com/smunaut/bbswitch/commit/ca87980cb6105c34e1a138b553eac602d8151519

Note that this is non-conditional ... maybe there should be a whitelist of supported machines ? I know on the T440s, the subsystem product id changes depending on which mode (i.e. old mode or new mode) is currently active. (So if you boot with an old kernel you see one subsystem id and if you boot with a new one, you see another).

doudou commented 8 years ago

@smunaut: this change causes memory corruption after I wake up the NVidia card (i.e. back to base one). I am really not a kernel developer, but the obvious problem is that the patch puts the port in D3 but the actual PCIe driver is not informed of that.

I've checked the kernel source code, PCIe ports do not seem to have runtime suspend/resume. That would probably be the better fix (use a dummy nvidia driver to suspend/resume the NV card using the kernel PM handlers and have the PCIe root port go into suspend accordingly). I already wrote the dummy driver, but I'm now trying to find someone I could contact that would be willing to look into this. If you know anybody doing kernel dev ...

The alternative (that I'm going to try as soon as I have the time) would be to unbind the PCIe root port driver from the port and bind yet-another-dummy-driver to put the port to sleep.

smunaut commented 8 years ago

@doudou Mmm ... when I think about it, it must be #78 ... I mean for all intents and purposes, my patch will trigger the same behavior in ACPI as when using the acpi_osi string to revert to the old shutdown behavior.

AFAIK there are no drivers bound to the PCIe ports themselves, which is why they don't do auto runtime suspend. I did raise the issue on linux-pm and linux-acpi but didn't really get any answers.

doudou commented 8 years ago

AFAIK there are no drivers bound to the PCIe ports themselves

It seems that there is. lspci reports "pcieport", which matches the driver in drivers/pci/pcie/portdrv_pci.c.

As to whether we can unbind it, that's another story altogether ;-)

doudou commented 8 years ago

As to whether we can unbind it, that's another story altogether ;-)

Yes, it seems to work. In /sys/bus/pci/drivers/pcieport

After

echo 0000:00:01.1 > unbind 

the driver is not listed as driver of the port, and after

echo 0000:00:01.1 > bind 

it gets listed again
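For anyone wanting to check the same thing without eyeballing the directory listing, the following is a small sketch that reports which driver (if any) is bound to a PCI address by reading the `driver` symlink in sysfs. The function name `bound_driver` is made up for illustration, and the sysfs root is passed as a parameter only so the helper can be tried against a scratch directory.

```shell
# bound_driver SYSFS_ROOT PCI_ADDR
# Prints the name of the driver bound to PCI_ADDR, or "none" if the device
# has no "driver" symlink (i.e. nothing is bound to it).
bound_driver() {
    link="$1/bus/pci/devices/$2/driver"
    if [ -L "$link" ]; then
        target=$(readlink "$link")
        printf '%s\n' "${target##*/}"   # keep only the last path component
    else
        printf 'none\n'
    fi
}
```

On a live system this would be called as `bound_driver /sys 0000:00:01.1`, which should print `pcieport` before the unbind and `none` after it.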

smunaut commented 8 years ago

I'd also point out that doing the ACPI DSM call is useless with this new method. It's not required, so maybe removing it would help with your issues? Did you try shutdown when pcieport is unbound? Did it help?

doudou commented 8 years ago

Yes, I already removed the DSM calls.

I'm going to push the module I wrote, which basically registers a PCI driver for the NV card and for the PCIe root port, and manages to make them autosuspend. They both go to D3 on autosuspend, and then boom, the system crashes hard. I did not investigate further, I should really be doing something else than this ;-)

I start to believe that the corruption issues directly stem from the hard shutdown of the NV card.

doudou commented 8 years ago

Some progress.

I've managed to get to a state where both the NVidia and PCIe port get into suspend, using only normal runtime suspend kernel paths. Basically, they get into D3 thanks to the common PCI power-management code. The code is on this branch.

The catch: I get the memory corruption anyway as soon as the card(s) are woken up :( I want to do some reading in the Intel documentation to see whether there are things that need to be done before one is allowed to put a PCIe port in D3.

Note that because I use runtime suspend, you cannot use lspci to check the card's state. This wakes it up. I've used systemtap and powertop to verify that (1) the kernel was attempting to put the cards in D3 and succeeded, and (2) that it led to significant power savings (it did, almost 2W more than with having the NV card in D3).
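As a lighter-weight alternative to systemtap for the first check, the runtime PM state can be read from the device's `power/runtime_status` attribute in sysfs. To the best of my knowledge, reading that attribute does not touch PCI config space and so should not wake the device, unlike lspci, but treat that as an assumption to verify on your own kernel. The function name `runtime_pm_state` is illustrative only.

```shell
# runtime_pm_state SYSFS_ROOT PCI_ADDR
# Prints the content of power/runtime_status (for example "active" or
# "suspended"), or "unknown" if the attribute cannot be read.
runtime_pm_state() {
    f="$1/bus/pci/devices/$2/power/runtime_status"
    if [ -r "$f" ]; then
        cat "$f"
    else
        printf 'unknown\n'
    fi
}
```

On a live system: `runtime_pm_state /sys 0000:04:00.0`, using whatever address your discrete card has.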

gsgatlin commented 8 years ago

@doudou Does this code mess up non thinkpad laptops? Like do you plan on doing a pull request?

doudou commented 8 years ago

Does this code mess up non thinkpad laptops?

Don't know ... only have a thinkpad

Like do you plan on doing a pull request?

No, given its effects on my laptop ... I already have a pull request (#102) on a method that kind-of works. Meaning:

I personally use this one on a 4.0.7 kernel. Left the rest on the side, I don't have time for this right now.

gsgatlin commented 8 years ago

@doudou Ok. Cool. Thanks for the info. If you ever have time to get it in a state where you'd like it to be tested on a non thinkpad I have a spare optimus ideapad I could test it on.

sharms commented 8 years ago

I successfully tested the patch on Fedora 23 / 4.2.1-300 on a Thinkpad W541 / Optimus K2100 and power consumption went from 25W to 19W. Thanks!

andrewgdunn commented 8 years ago

Is there a plan to merge this fix into mainline? I'm about to jump to F23.

Lacrymology commented 8 years ago

I'm seeing some possibly related issue with a brand new T450s

andrewgdunn commented 8 years ago

It'd be very nice if we could get this mainlined rather than have it as a required patch.

smunaut commented 8 years ago

@storrgie The issue is that this fix is completely ThinkPad specific ... It would probably break things on other brands.

(and even on thinkpads it's only been tested on a few models)

Lacrymology commented 8 years ago

this is still failing for me, I'm getting this when trying to load bbswitch

[  454.627938] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.VID_
[  454.627945] bbswitch: Found discrete VGA device 0000:04:00.0: \_SB_.PCI0.PEG_.VID_
[  454.627957] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[  454.629069] bbswitch: detected an Optimus _DSM function
[  454.629872] bbswitch: Succesfully loaded. Discrete card 0000:04:00.0 is on

Lacrymology commented 8 years ago

trying to turn off with tee:

# tee /proc/acpi/bbswitch<<<OFF
# dmesg
...
[  531.504356] bbswitch: device 0000:04:00.0 is in use by driver 'bbswitch_nv', refusing OFF
# rmmod bbswitch_nv
# dmesg
[  546.959043] bbswitch_nv: attempting to resume
[  546.959058] bbswitch_nv: attempting to suspend
[  547.076317] bbswitch_nv: attempting to resume
# tee /proc/acpi/bbswitch<<<OFF
# dmesg
[  553.644208] bbswitch: disabling discrete graphics
[  553.644225] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[  553.659670] pci 0000:04:00.0: Refused to change power state, currently in D0

Lacrymology commented 8 years ago

bumblebee works correctly, in the sense of loading the nvidia driver when run with optirun, it's just that the card won't switch off

Lacrymology commented 8 years ago

here's my lspci -v output for the nvidia card (while running glxgears with optirun)

04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)
        Subsystem: Lenovo Device 5037
        Flags: bus master, fast devsel, latency 0, IRQ 49
        Memory at f1000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 3000 [size=128]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia

Lacrymology commented 8 years ago

dmesg when running optirun:

[ 1022.584913] NVLINK: Nvlink Core is being initialized, major device number 244
[ 1022.585100] [drm] Initialized nvidia-drm 0.0.0 20150116 for 0000:04:00.0 on minor 1
[ 1022.585108] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  355.11  Wed Aug 26 16:35:41 PDT 2015
[ 1022.829610] vgaarb: this pci device is not a vga device
[ 1022.833781] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1022.834091] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1022.834244] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1022.834389] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1022.834948] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1022.835382] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1022.835532] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1022.873564] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1023.067067] vgaarb: this pci device is not a vga device

and when stopping it:

[ 1065.933239] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1065.949611] NVLINK: Unregistered the Nvlink Core, major device number 244
[ 1065.951791] [drm] Module unloaded
[ 1065.968420] bbswitch: disabling discrete graphics
[ 1065.968435] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1065.980288] pci 0000:04:00.0: Refused to change power state, currently in D0

smunaut commented 8 years ago

There is nothing that indicates in your logs that it's not working.

The "Refused to change power state" is inconsequential.

The only way to check if it's working or not is to check if the rev is "ff" during a lspci when it's turned off.
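The "rev ff" check can be scripted. This is a rough sketch that classifies a single line of lspci output; the function name `card_state_from_lspci` is invented here, and the heuristic is simply the one described above (a powered-off card reads back as revision `ff`).

```shell
# card_state_from_lspci LINE
# LINE is one summary line of lspci output for the discrete card.
# Prints "off" if the revision reads as ff, "on" for any other revision,
# and "unknown" if no revision field is present.
card_state_from_lspci() {
    case "$1" in
        *"(rev ff)"*)  printf 'off\n' ;;
        *"(rev "*")"*) printf 'on\n' ;;
        *)             printf 'unknown\n' ;;
    esac
}
```

For example, `card_state_from_lspci "$(lspci -s 04:00.0)"`, with the slot address adjusted to your card.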

Lacrymology commented 8 years ago

@smunaut the rev is not ff, it doesn't change. The first time I tried to turn it off with the vanilla module I got a traceback, as well, I'd forgotten about that:

[  163.280190]  snd_pcm i2c_algo_bit i2c_i801 ptp shpchp snd_timer snd soundcore pps_core lpc_ich processor sch_fq_codel ip_tables x_tables ext4 crc16 mbcache jbd2 sd_mod rtsx_pci_sdmmc mmc_core atkbd libps2 ahci libahci libata scsi_mod xhci_pci ehci_pci ehci_hcd xhci_hcd rtsx_pci usbcore usb_common i8042 serio
[  163.280205] CPU: 0 PID: 2385 Comm: tee Tainted: G        W  O    4.2.1-1-ARCH #1
[  163.280206] Hardware name: LENOVO 20BXCTO1WW/20BXCTO1WW, BIOS JBET51WW (1.16 ) 07/08/2015
[  163.280207]  0000000000000000 00000000619135a4 ffff88030e58fcc8 ffffffff8156b77a
[  163.280209]  0000000000000000 ffff88030e58fd20 ffff88030e58fd08 ffffffff81074846
[  163.280210]  ffff88030e58fce8 ffff88033f6e1000 ffff88033f661c40 00007ffdb03cc1a0
[  163.280212] Call Trace:
[  163.280216]  [<ffffffff8156b77a>] dump_stack+0x4c/0x6e
[  163.280218]  [<ffffffff81074846>] warn_slowpath_common+0x86/0xc0
[  163.280220]  [<ffffffff810748d5>] warn_slowpath_fmt+0x55/0x70
[  163.280223]  [<ffffffff812f503b>] ? __pci_set_master+0x3b/0xf0
[  163.280224]  [<ffffffff812f7b50>] pci_disable_device+0xb0/0xd0
[  163.280227]  [<ffffffffa07714ad>] bbswitch_off+0xad/0x240 [bbswitch]
[  163.280228]  [<ffffffffa077187b>] bbswitch_proc_write+0x9b/0xb2 [bbswitch]
[  163.280231]  [<ffffffff81236742>] proc_reg_write+0x42/0x70
[  163.280233]  [<ffffffff811d01c7>] __vfs_write+0x37/0x100
[  163.280235]  [<ffffffff811d2fd8>] ? __sb_start_write+0x58/0x100
[  163.280237]  [<ffffffff811d0ad4>] vfs_write+0xa4/0x1a0
[  163.280238]  [<ffffffff811d0a14>] ? vfs_read+0x114/0x130
[  163.280240]  [<ffffffff811d17e5>] SyS_write+0x55/0xc0
[  163.280242]  [<ffffffff81570cee>] entry_SYSCALL_64_fastpath+0x12/0x71
[  163.280243] ---[ end trace be475d6cb25e4d1e ]---

Also, according to KDE's battery status panel, I do get an hour less of battery while optirun is running something, but when it's supposed to be turned off, I'm still getting this with lspci

04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)

Notice that as reported above by other people, I'm also getting

root@hex /home/lacrymology/aur/bbswitch # cat /proc/acpi/bbswitch 
0000:04:00.0 ON

regardless

smunaut commented 8 years ago

Are you sure you took the source from my repo and switched to the right branch ? (the default at checkout is 'master' and doesn't have the patch)

Lacrymology commented 8 years ago

yes, branch runtime_suspend, right? I even copied the module over to /lib/modules/... manually and tried to load it with modprobe, it loaded correctly (after running depmod -a), but still the same results:

# dmesg
[ 1256.134906] bbswitch: version 0.8
[ 1256.134918] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.VID_
[ 1256.134926] bbswitch: Found discrete VGA device 0000:04:00.0: \_SB_.PCI0.PEG_.VID_
[ 1256.134938] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 1256.135548] bbswitch: detected an Optimus _DSM function
[ 1256.135564] bbswitch: Succesfully loaded. Discrete card 0000:04:00.0 is on
[ 1282.007356] bbswitch: disabling discrete graphics
[ 1282.007365] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
# lspci
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)
        Subsystem: Lenovo Device 5037
        Flags: fast devsel, IRQ 16
        Memory at f1000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 3000 [size=128]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel modules: nouveau, nvidia

Lacrymology commented 8 years ago

no, wait, I'm using @doudou 's code. I'll try yours now

Lacrymology commented 8 years ago

@smunaut your code reports rev: ff in lspci, and running glxgears with optirun turns it back on successfully!

Is there any power consumption metering app that can help me confirm exactly how much power is getting sucked?

smunaut commented 8 years ago

powertop but it only reports it when on battery (also you need to wait a few minutes for it to stabilise)

Lacrymology commented 8 years ago

12.4 W with the card on, 8.6 with the card off. No idea if it's about the right amount (nvidia 940M) but it's definitely something

doudou commented 8 years ago

@smunaut: for what it's worth, I had trouble with your patch a while ago (crashed). I updated my bios a few days ago and just tried it again. It works ! Perfectly ! Thank you so much !

smunaut commented 8 years ago

@doudou Nice :) @Lacrymology Yeah ~3-4W sounds about right.

Lacrymology commented 8 years ago

I know next to nothing about hardware handling, but isn't there a flag or id that can be read to make this if-able? I guess it'd come down to a whitelist if it's to be automatable, but I'm thinking it could be made a module parameter

smunaut commented 8 years ago

You could probably know if this new method should be used by looking at the PCI subdevice IDs. The ACPI table dump of my laptop shows that the subdevice IDs are changed depending on whether the new method is applied or not.

Lacrymology commented 8 years ago

but, again, are they changed in a generic way (i.e. a testable mask), or would we need to build a whitelist of ids which require the patch? How do I get an ACPI table dump of my laptop to compare? Or.. well, are you interested in pursuing this? I'm willing to help, I'm a programmer, but I've no experience in hardware/drivers/kernels

chrk123 commented 8 years ago

@smunaut I've compiled your bbswitch hack-t440s module on my t440p. So far, according to lspci, it seems to work.

When /proc/acpi/bbswitch states that the card is off, lspci reports: 02:00.0 VGA compatible controller: NVIDIA Corporation GK208M [GeForce GT 730M] (rev ff)

and when /proc/acpi/bbswitch states that the card is on, lspci reports: 02:00.0 VGA compatible controller: NVIDIA Corporation GK208M [GeForce GT 730M] (rev a1)

However, the battery discharge rate stays the same at about 12W to 13W. Furthermore, the temperature readings from my notebook components didn't change either (55°C). I didn't give power consumption much thought during the last few months, but normally I had a discharge rate of <=8W at idle and a system temperature below 50°C, which makes me assume that my discrete graphics card is still not properly shut down.

I'm using a T440p , Linux t440p 4.2.3-1-ARCH #1 SMP PREEMPT Sat Oct 3 18:52:50 CEST 2015 x86_64 GNU/Linux on Arch-Linux

lspci -v when turned on:

02:00.0 VGA compatible controller: NVIDIA Corporation GK208M [GeForce GT 730M] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 221d
Flags: bus master, fast devsel, latency 0, IRQ 17
Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 3000 [size=128]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Kernel modules: nouveau, nvidia

lspci -v when turned off

02:00.0 VGA compatible controller: NVIDIA Corporation GK208M [GeForce GT 730M] (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel modules: nouveau, nvidia

dmesg seems fine when using echo OFF/ON > /proc/acpi/bbswitch

[ 2583.030756] bbswitch: disabling discrete graphics
[ 2583.030766] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 2961.430922] bbswitch: enabling discrete graphics

The only thing that is suspicious is the dmesg output after running optirun:

[ 3176.964358] bbswitch: disabling discrete graphics
[ 3176.964369] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150619/nsarguments-95)
[ 3176.976934] pci 0000:02:00.0: Refused to change power state, currently in D0

which is currently my only hint, but you said it's inconsequential. Does anyone have further hints for me? Thanks!

emanuil-tolev commented 8 years ago

@smunaut have you tried / had any luck with turning off the card completely on your Thinkpad? Unfortunately it seems that the Windows 2013 workaround causes FS corruption after resuming from sleep (even described by a commenter above). I presumed that uninstalling+purging bumblebee and everything nvidia related and rmmod-ing nouveau would be sufficient to turn it off as the kernel just wouldn't turn it on.

doudou commented 8 years ago

Hi @emanuil-tolev: I'm running @smunaut's version of the bbswitch module on a Thinkpad T440p successfully. I needed to get the latest BIOS though. Tests done with previous BIOS versions were causing crashes/corruption.

emanuil-tolev commented 8 years ago

Alright, thanks @doudou - I'll try that tomorrow (just about to unhappily run out of a 96Wh battery on a T450s due to this :)).

emanuil-tolev commented 8 years ago

Following the instructions in https://github.com/smunaut/bbswitch/tree/hack-t440s did work for my Thinkpad T450s.

After loading the module (just simple load, I'll install it using dkms later):

emanuil@midori:~/software/bbswitch$ sudo make load
rmmod bbswitch
rmmod: ERROR: Module bbswitch is not currently loaded
make: [load] Error 1 (ignored)
insmod bbswitch.ko

emanuil@midori:~/software/bbswitch$ cat /proc/acpi/bbswitch
0000:04:00.0 ON

emanuil@midori:~/software/bbswitch$ dmesg | tail
[ 5618.042276] bbswitch: module verification failed: signature and/or  required key missing - tainting kernel
[ 5618.042676] bbswitch: version 0.8
[ 5618.042684] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.VID_
[ 5618.042695] bbswitch: Found discrete VGA device 0000:04:00.0 (on 0000:00:1c.4): \_SB_.PCI0.PEG_.VID_
[ 5618.042710] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 5618.043515] bbswitch: detected an Optimus _DSM function
[ 5618.043590] bbswitch: Succesfully loaded. Discrete card 0000:04:00.0 is on

After instructing it to turn the card off:

emanuil@midori:~/software/bbswitch$ sudo tee /proc/acpi/bbswitch <<<OFF
OFF

emanuil@midori:~/software/bbswitch$ lspci -v | grep -i -A 10 nvidia
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev ff) (prog-if ff)
        !!! Unknown header type 7f

emanuil@midori:~/software/bbswitch$ cat /proc/acpi/bbswitch
0000:04:00.0 OFF

emanuil@midori:~/software/bbswitch$ dmesg | tail
[ 5658.848202] bbswitch: disabling discrete graphics
[ 5658.848225] ACPI Warning: \_SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140424/nsarguments-95)
[ 5658.878253] thinkpad_acpi: EC reports that Thermal Table has changed

Power consumption falls by ~2W (powertop).

After turning the card back ON with bbswitch, I get this output:

emanuil@midori:~/software/bbswitch$ sudo tee /proc/acpi/bbswitch <<<ON
ON

emanuil@midori:~/software/bbswitch$ dmesg | tail -2
[10178.009129] bbswitch: enabling discrete graphics
[10178.113447] thinkpad_acpi: EC reports that Thermal Table has changed

emanuil@midori:~/software/bbswitch$ cat /proc/acpi/bbswitch
0000:04:00.0 ON

emanuil@midori:~/software/bbswitch$ lspci -v | grep -i -A 10 nvidia
04:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940M] (rev a2)
        Subsystem: Lenovo Device 5037
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 3000 [size=128]
        Expansion ROM at <ignored> [disabled]
        Capabilities: <access denied>

and power consumption climbs up by about 2W again. I actually expected a bit more than that, but it seems that the Chrome browser and possibly the wifi drivers are both performing significantly worse (in terms of power) than on Windows. But that's out of bbswitch's scope. Sometimes varying system load makes it seem like the power consumption has barely changed by 1W or less - I had to actually stop doing anything, sit down and watch powertop for a few minutes to identify the effect (it is there). It does also make a long-term difference to the battery life, on average it does last 1-2h longer now.

No data corruption (I've slept and restored a few times with the card off) so far.

smunaut commented 8 years ago

Yeah, powertop needs a few minutes. That's because, AFAIK, it "deduces" the power consumption by looking at how fast the battery discharges, and it averages the results over a few minutes.

2W is about what I get too on a T440s. 10-11W -> 8-9W.
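If you want a quick reading without waiting for powertop, the battery driver often exposes the instantaneous discharge rate itself. The sketch below assumes a battery that provides a `power_now` attribute in microwatts under `/sys/class/power_supply/` (some batteries expose `current_now` and `voltage_now` instead, in which case this does not apply). The function name `microwatts_to_watts` is illustrative, and the battery name `BAT0` in the usage note is an assumption.

```shell
# microwatts_to_watts UW
# Converts an integer number of microwatts to watts, truncated (not
# rounded) to one decimal place, using integer arithmetic only.
microwatts_to_watts() {
    whole=$(( $1 / 1000000 ))
    tenth=$(( ($1 % 1000000) / 100000 ))
    printf '%d.%d\n' "$whole" "$tenth"
}
```

Usage, while on battery: `microwatts_to_watts "$(cat /sys/class/power_supply/BAT0/power_now)"`. A single sample is noisy, so compare several readings with the card on and off.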

hgomersall commented 8 years ago

I'm experiencing a similar problem on a t450s - hack-t440s seems to make bbswitch work properly, though it requires a manual unloading of the drivers first. At risk of polluting the thread, where should the unloading occur - so I can go and ask the right question? (bumblebee seems to load the modules just fine, but never unloads them.)

Lacrymology commented 8 years ago

since the latest update to bbswitch/bumblebee once the nvidia module is loaded to the kernel I can't turn it off anymore, and the device can't be turned off either..

Lacrymology commented 8 years ago

I think I might be seeing the same issue @hgomersall is referring to, although I haven't been able to manually remove the module either

Lacrymology commented 8 years ago

ok, I had to:

modprobe -r nvidia_modeset
modprobe -r nvidia
tee /proc/acpi/bbswitch <<< OFF

for it to work
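To confirm the result of a sequence like the one above from a script, the state word can be pulled out of `/proc/acpi/bbswitch`, whose content in this thread has the form `0000:04:00.0 ON`. This is only a sketch; the function name `bbswitch_state` is made up, and it assumes the two-field format shown in the outputs above.

```shell
# bbswitch_state
# Reads one line of the form "<pci-address> <ON|OFF>" from stdin and
# prints the second field. Returns non-zero if no state could be read.
bbswitch_state() {
    read -r _addr state || [ -n "$state" ] || return 1
    printf '%s\n' "$state"
}
```

On a live system: `bbswitch_state < /proc/acpi/bbswitch`.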

ArchangeGabriel commented 8 years ago

@hgomersall @Lacrymology That’s https://github.com/Bumblebee-Project/Bumblebee/issues/699.

@smunaut @doudou There is clearly space for improvement, especially regarding declared ACPI_OSI and their impact on ASL functions. We need to make some summary of everything we know about this currently, what we should test to go further and then probably rewrite part of bbswitch/kernel code according to this. Maybe linux-pm or -acpi MLs are better places for this, and people like Rafael Wysocki might be interested in this.

smunaut commented 8 years ago

One thing I've noticed is that the subdevice VID:PID change when the new method is enabled in the ACPI DSDT so maybe it could be based on that to select one method or the other.
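A rough sketch of what such an ID-based selection could look like from userspace, assuming a hand-maintained list of subsystem IDs. Nothing in this thread establishes which IDs correspond to the new method, so every ID used with this helper is a placeholder to be filled in from real machines, and the function name `needs_port_d3` is hypothetical.

```shell
# needs_port_d3 SUBSYS_ID KNOWN_ID...
# Succeeds (exit status 0) if SUBSYS_ID, written as "vendor:device",
# matches one of the KNOWN_ID arguments; fails otherwise.
needs_port_d3() {
    id=$1
    shift
    for known in "$@"; do
        [ "$id" = "$known" ] && return 0
    done
    return 1
}
```

The ID of a device can be assembled from the `subsystem_vendor` and `subsystem_device` files in its `/sys/bus/pci/devices/<address>/` directory (both are printed with a `0x` prefix, which would need stripping to match the format used here).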

ArchangeGabriel commented 8 years ago

Yes, but it might be different for other laptops. Are all the (currently-known) affected laptops following this rule?

Is there any non-Thinkpad laptop concerned? What about Thinkpad laptops? I suppose this needs a recent BIOS expecting the Windows 2013 OSI?

ArchangeGabriel commented 8 years ago

Also, 3.15 is indeed the culprit because of https://github.com/torvalds/linux/commit/faae404ebdc6bba744919d82e64c16448eb24a36.
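For scripts that need to decide whether a machine is running a kernel on the affected side of that boundary, here is a minimal version comparison sketch. It looks only at the major and minor numbers of a `uname -r` style string; the function name `kernel_at_least` is illustrative, and distribution kernels with backported patches may not follow the upstream numbering.

```shell
# kernel_at_least VERSION MAJOR MINOR
# Succeeds if VERSION (e.g. "4.2.1-1-ARCH") is at least MAJOR.MINOR.
kernel_at_least() {
    ver=${1%%[!0-9.]*}   # cut at the first character that is not a digit or dot
    major=${ver%%.*}     # text before the first dot
    rest=${ver#*.}       # text after the first dot
    minor=${rest%%.*}    # text up to the next dot
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}
```

Usage: `kernel_at_least "$(uname -r)" 3 15 && echo "kernel is 3.15 or newer"`.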

smunaut commented 8 years ago

Then a bit later : https://github.com/torvalds/linux/commit/796888e942b34cbbd738d9e5478b7d103ee38061#diff-aa93b5317c200560767b97a9d9301bd8