Open Thaodan opened 8 years ago
What GPU is that?
Nvidia Quadro M2000M (in Dell M7510)
Manjaro Linux users are reporting similar issues [1]
[1] https://forum.manjaro.org/t/bumblebee-not-working-under-4-8-kernel/10676/8
That seems to be the same issue I described here:
https://devtalk.nvidia.com/default/topic/971733/linux/nvidia-gtx-960m-not-supported-anymore-by-370-28-/?offset=5#4999717
In short: Until nvidia changes their driver, add pcie_port_pm=off to your kernel commandline (you will then not get everything from the new energy savings introduced by 4.8).
Note that this affects everybody who is part of the following group (see my original linked post):
Anybody using Kernel >=4.8, with a system for which the nvidia card is not the primary output (otherwise it could never enter D3), and a BIOS released >=2015.
The ugliest part of this story is that the kernel uses pcie_port_pm=off in any case if your BIOS / UEFI release date is <=2014.
EDIT: This also means it's not really a bumblebee issue, but a problem with kernel 4.8 and nvidia, which just happens to show up on laptops for which nvidia is not the primary card. Apparently I was the first to report this in their forums, but good to know I am not alone with this issue.
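For anyone unsure whether their machine falls into this group, here is a minimal sketch of the kernel and BIOS-date part of the check. It assumes the BIOS date is available as MM/DD/YYYY (as in /sys/class/dmi/id/bios_date on typical Linux systems). The helper name port_pm_expected is made up for illustration, and it does not check whether the nvidia card is the primary output.

```shell
#!/bin/sh
# Sketch only: checks the two criteria named in this thread
# (kernel >= 4.8 and BIOS release year >= 2015).
# port_pm_expected is a hypothetical helper name, not part of any tool.

# port_pm_expected KERNEL_RELEASE BIOS_DATE
#   KERNEL_RELEASE  e.g. "4.8.3" (format of `uname -r`)
#   BIOS_DATE       e.g. "10/20/2016" (MM/DD/YYYY)
# Prints "yes" if both criteria hold, otherwise "no".
port_pm_expected() {
    major=${1%%.*}            # "4.8.3" -> "4"
    rest=${1#*.}              # "4.8.3" -> "8.3"
    minor=${rest%%[!0-9]*}    # "8.3"   -> "8"
    year=${2##*/}             # "10/20/2016" -> "2016"

    # Refuse anything that is empty or not purely numeric.
    [ -n "$major" ] && [ -n "$minor" ] && [ -n "$year" ] || { echo no; return 0; }
    case "$major$minor$year" in
        *[!0-9]*) echo no; return 0 ;;
    esac

    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 8 ]; }; then
        if [ "$year" -ge 2015 ]; then
            echo yes
            return 0
        fi
    fi
    echo no
}
```

On a real machine this could be called as port_pm_expected "$(uname -r)" "$(cat /sys/class/dmi/id/bios_date)". A "yes" only means the new port power management is likely active by default, not that the nvidia problem will necessarily show up.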
As far as I understand the first commit referred to in the forum post: if the card is advertised as hotplug, runtime PM won't power it down to D3cold, and as bumblebee/optimus is some kind of hotplug mechanism, shouldn't this apply to the card too? Is it possible to set this flag in bbswitch or somewhere else? Or would it be better to place this flag in the gpu driver?
> Or would it be better to place this flag in the gpu driver?
Definitely that. That's since this also affects machines even without bumblebee. If you just boot without starting bumblebeed and never loading bbswitch, the nvidia driver will still fail to detect the card - so the driver claiming the device (nvidia) should take care. I guess it is even possible to do that in a better way than claiming to be hotpluggable (which the pcie interface to which the card is connected is not really...): The nvidia driver could check whether the card is in d3cold and power it up if necessary.
So it's a "bug" that needs to be fixed in nvidia/nouveau. But the problem is that nvidia is slow. Would it be possible to implement one of the solutions in either bumblebee or bbswitch?
> But the problem is that nvidia is slow.
AFAIK they do not even claim to support kernel 4.8 yet, so if we're lucky, it could already be in the next driver release.
> Would it be possible to do one of the solutions either in bumblebee or bbswitch?
I'm not sure, but the bumblebee maintainers will know ;).
For an immediate "fix", the trick of adding pcie_port_pm=off to the kernel commandline is sufficient.
Ah, I didn't notice that you're not a maintainer (:
Should the dirty fix change something for the other devices in the "typical" notebook?
> Should the dirty fix change something for the other devices in the "typical" notebook?
It could lead to slightly higher power consumption since unused PCI bridges can not enter the deepest sleep state anymore.
However, since this (pcie_port_pm=off) was in any case what was done in kernel 4.7 and before, and it's also what kernel 4.8 still does on any machine with a BIOS released before 2015, I don't expect this to be significant on a "typical" machine.
But I'm not sure whether this is also true for the most modern Skylake / Broadwell machines; nowadays things are becoming more and more sensitive, and power consumption stays high unless everything on the buses is in the deepest sleep states.
At least I can promise you things won't be worse over kernel 4.7 ;).
> At least I can promise you things won't be worse over kernel 4.7 ;).
Very true, the boot option reverts to the pre-4.8 behavior.
Have you (via udev rules or some other "laptop mode tools") enabled runtime PM? You can check that by reading /sys/bus/pci/devices/0000:01:00.0/power/control. If it says "auto", then it is enabled. If it is "on", then I would expect it to have the same behavior as adding the boot option.
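A small sketch of that check, for convenience. The helper name describe_runtime_pm and the PCI_DEV variable are made up for illustration, and the default address 0000:01:00.0 is just the one seen in this thread; adjust it to your own card.

```shell
#!/bin/sh
# Sketch only: interprets the value of a device's power/control file.
# describe_runtime_pm is a hypothetical helper name.

# describe_runtime_pm VALUE
#   VALUE is the content of .../power/control ("auto" or "on").
describe_runtime_pm() {
    case "$1" in
        auto) echo "runtime PM enabled" ;;
        on)   echo "runtime PM disabled (device kept on)" ;;
        *)    echo "unknown value: $1" ;;
    esac
}

# Assumed default address; override with PCI_DEV=... if your card differs.
PCI_DEV=${PCI_DEV:-0000:01:00.0}
ctl="/sys/bus/pci/devices/$PCI_DEV/power/control"

# Only read the file if it exists on this machine.
if [ -r "$ctl" ]; then
    describe_runtime_pm "$(cat "$ctl")"
fi
```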
Btw, some laptops require the new 4.8 method or else may experience memory corruption (see the commit message of https://git.kernel.org/linus/692a17dcc2922a91c6bcf11b3321503a3377b1b1).
> Have you (via udev rules or some other "laptop mode tools") enabled runtime PM? You can check that by reading /sys/bus/pci/devices/0000:01:00.0/power/control. If it says "auto", then it is enabled. If it is "on", then I would expect it to have the same behavior as adding the boot option.
Yes, runtime PM is active on that machine: I'm using laptop-mode-tools. In addition, I have enabled PCIe-ASPM for all ports in the UEFI (it's one with unlocked features) and I'm also using "pcie_aspm=force". Only after all this, I could achieve maximum battery runtime almost comparable to Windows on that laptop (sadly ASPM for the NIC and card reader is not working, even in Windows, so the machine saves significantly more if I turn off the full corresponding PCI port).
> Btw, some laptops require the new 4.8 method or else may experience memory corruption
Thanks a lot for the link!
This doesn't seem to affect me with kernel 4.8.3 on Solus, but I don't use Bumblebee (I just pass everything to the NV GPU with xrandr):
Linux spinesnap 4.8.3 #1 SMP Thu Oct 20 11:50:13 UTC 2016 x86_64 GNU/Linux
[ 9.786656] nvidia: loading out-of-tree module taints kernel.
[ 9.786660] nvidia: module license 'NVIDIA' taints kernel.
[ 9.793430] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
[ 9.793562] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[ 9.793574] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 370.28 Thu Sep 1 19:45:04 PDT 2016
[ 9.806149] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 370.28 Thu Sep 1 19:18:48 PDT 2016
[ 9.814270] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 11.790229] nvidia-modeset: Allocated GPU:0 (GPU-4d140bcd-71e6-3bb9-ed1c-033e9cf5bec2) @ PCI:0000:01:00.0
[ 11.790322] nvidia-modeset: Freed GPU:0 (GPU-4d140bcd-71e6-3bb9-ed1c-033e9cf5bec2) @ PCI:0000:01:00.0
I have a GTX 960M in an Acer Aspire V Nitro VN7-792G laptop.
Do you use NVIDIA PRIME? If so, you won't hit the issue, since your dGPU isn't disabled before it is used.
Even if you use nvidia, you might still run into issues if you enable runtime PM for all devices using a udev rule or using "laptop mode tools" before the nvidia driver is loaded (and have kernel 4.8+ without the pcie_port_pm=off parameter and a new enough laptop).
Using the pm-rework branch, which enables runtime PM in bbswitch, works around the issue that the nvidia driver doesn't handle runtime PM.
Could someone with an older laptop that doesn't use runtime PM, and someone else with a newer laptop, test whether the newer version works/fixes the issue?
That branch is unfinished, last time I was working on it there was still an Oops somewhere. If you have no NVIDIA HDMI audio device, then it might be safe to use though (revert Bumblebee-Project/bbswitch@e0c68599bed6c11e37d5228a3c014b9575bf9edb just to be sure).
Fedora 24 just updated to the 4.8.4 kernel. I'm using the bumblebee fedora repo, updated as normal and everything seems to be fine. What exactly is NOT supposed to be working with the 4.8 kernel?
> What exactly is NOT supposed to be working with the 4.8 kernel?
To trigger the issue, you have to have:
the conditions listed earlier in this thread: kernel >= 4.8, an nvidia card that is not the primary output, and a BIOS release date >= 2015 (the date reported by dmidecode counts). That's since the kernel only activates PCI port power management in this case. Alternatively, use pcie_port_pm=force.
I'm having the same problem on my laptop with a GTX 970M. After updating the kernel from 4.7.9 to kernel 4.8.4 on Fedora 24, bumblebee's proprietary driver wouldn't install. I didn't think anything of it, as the same thing happened when I upgraded from 4.6 to 4.7, but it was then fixed about a week later with an updated bumblebee-nvidia package.
Today, about a week after Fedora 24 moved to the 4.8 kernel, a new bumblebee-nvidia package was released. I expected it to fix my problem, but the nvidia module still won't install. Running it with the --debug flag, it output this:
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.
Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 2203.500272] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13d8)
NVRM: installed in this system is not supported by the 367.57
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 2203.500285] nvidia: probe of 0000:01:00.0 failed with error -1
[ 2203.500319] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[ 2203.500341] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 2203.500341] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 2203.500342] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[ 2203.500477] NVRM: NVIDIA init module failed!
> Running it with the --debug flag, it output this:
I expect your issue will vanish if you add (as discussed previously) pcie_port_pm=off to your kernel commandline, which reverts PCIe port power saving behaviour to the pre-4.8 state.
A better fix could be done by nvidia, or a different workaround could be implemented in bumblebee (but since the issue can also be reproduced without bumblebee, I rather think the correct fix should enter the nvidia binary blob).
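On GRUB2-based distributions, one common way to make the parameter persistent is the GRUB_CMDLINE_LINUX line in /etc/default/grub. That is an assumption about your setup, since other bootloaders and distros differ. The sketch below only previews the change on stdout and never writes to the file; the helper name append_grub_param is made up for illustration.

```shell
#!/bin/sh
# Sketch only: preview a GRUB defaults file with a kernel parameter appended.
# append_grub_param is a hypothetical helper name.

# append_grub_param FILE PARAM
#   Prints FILE with PARAM appended inside the GRUB_CMDLINE_LINUX="..." value.
#   Output goes to stdout; FILE itself is left untouched.
#   PARAM must not contain '|' or '&' (they are special to the sed expression).
append_grub_param() {
    sed "s|^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"|\1 $2\"|" "$1"
}
```

For example, append_grub_param /etc/default/grub pcie_port_pm=off shows what the edited file would look like. After changing the real file, the bootloader config still has to be regenerated, and on UEFI installs the active grub.cfg may live somewhere other than /boot/grub2/ (for example under /boot/efi/EFI/), so it is worth confirming the result after a reboot.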
Ok, I added the kernel parameter to /etc/default/grub, ran "grub2-mkconfig -o /boot/grub2/grub.cfg", rebooted, and tried again, but I'm still getting the same error. Is there a way to verify whether the command line edit took effect?
Edit: Never mind, I must have messed something up. I manually added pcie_port_pm=off to the GRUB command line during boot and now it works fine. Thanks for the help.
Yes, look at /proc/cmdline.
Another workaround would be to use a newer bbswitch that supports runtime PM suspend.
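The /proc/cmdline check can be scripted roughly like this. It is a minimal sketch; the helper name has_kernel_param is made up for illustration and it only does an exact whole-word match.

```shell
#!/bin/sh
# Sketch only: test whether a parameter appears as a whole word on a
# kernel command line. has_kernel_param is a hypothetical helper name.

# has_kernel_param CMDLINE PARAM
#   Prints "yes" if PARAM is one of the space-separated words in CMDLINE,
#   otherwise "no".
has_kernel_param() {
    case " $1 " in
        *" $2 "*) echo yes ;;
        *)        echo no ;;
    esac
}

# Check the running kernel, if /proc/cmdline is readable here.
if [ -r /proc/cmdline ]; then
    echo "pcie_port_pm=off active: $(has_kernel_param "$(cat /proc/cmdline)" pcie_port_pm=off)"
fi
```

If this prints "no" after a reboot, the bootloader config that was regenerated is probably not the one actually being booted.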
Thanks. It's showing that for whatever reason grub2-mkconfig isn't actually editing my commandline, but that it works when I manually add it during boot.
That said, even though that will allow me to load the nvidia module, and running bumblebee-nvidia --check shows that everything is working, I can't actually open anything using optirun or primusrun. When I do I get the following error (with or without pcie_port_pm=off in the commandline):
[seeker@ ~]$ primusrun glxgears
primus: fatal: Bumblebee daemon reported: error: [XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
[seeker@ ~]$ optirun glxgears
[ 52.986505] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
[ 52.986543] [ERROR]Aborting because fallback start is disabled.
Regardless of whether I add the pcie bit to the kernel command line, running cat /sys/bus/pci/devices/0000\:01\:00.0/power/control returns "auto" either way.
Also, when I add the pcie line, and try to primusrun or optirun, the Nvidia GPU will turn on (even though I receive the error and the glxgears won't open) and won't shut back off again. I can see when the Nvidia GPU is on or off from a LED on my laptop. Without the extra pcie line, the Nvidia GPU shuts back off after I receive the error (as it should).
Edit: Thaodan mentioned using a newer bbswitch. Does such a thing exist somewhere that I can try, or was that a theoretical comment that a future version may fix the issue?
dmesg gives the following output:
[ 15.258289] nvidia: loading out-of-tree module taints kernel.
[ 15.258294] nvidia: module license 'NVIDIA' taints kernel.
[ 15.261356] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 15.266352] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
[ 15.266522] nvidia-nvlink: Nvlink Core is being initialized, major device number 242
[ 15.266533] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
[ 15.285279] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.57 Mon Oct 3 20:32:57 PDT 2016
[ 15.303408] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 15.547166] bbswitch: version 0.8
[ 15.547171] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[ 15.547176] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[ 15.547274] bbswitch: detected an Optimus _DSM function
[ 15.547283] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[ 15.550748] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 15.578716] nvidia-modeset: Unloading
[ 15.633877] nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
[ 15.646869] bbswitch: disabling discrete graphics
[ 6704.175466] bbswitch: enabling discrete graphics
[ 6720.390241] bbswitch: enabling discrete graphics
[ 6728.181651] bbswitch: enabling discrete graphics
@seekermoc The power/control state is not affected by pcie_port_pm=off. When the latter option is given, enabling runtime PM for the port (something like 00:01.0, not 01:00.0) will have no observable effect.
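To see which port sits upstream of the card, the sysfs device link can be resolved and its parent directory taken. This is a rough sketch: the helper name parent_port is made up for illustration, the address 0000:01:00.0 is the one from this thread, and it assumes readlink -f is available (as on typical Linux systems).

```shell
#!/bin/sh
# Sketch only: find the PCI port directly upstream of a device.
# parent_port is a hypothetical helper name.

# parent_port SYSFS_PATH
#   SYSFS_PATH is the resolved sysfs path of the device, e.g.
#   /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
#   Prints the name of the parent directory (here 0000:00:01.0).
parent_port() {
    parent=${1%/*}          # strip the device's own directory name
    echo "${parent##*/}"    # keep only the last remaining path component
}

dev=/sys/bus/pci/devices/0000:01:00.0
if [ -e "$dev" ]; then
    port=$(parent_port "$(readlink -f "$dev")")
    echo "upstream port: $port"
    # Show the port's own runtime PM setting, if it has one.
    ctl="/sys/bus/pci/devices/$port/power/control"
    [ -r "$ctl" ] && cat "$ctl"
fi
```

If the device hangs directly off the root complex, the printed name will be something like pci0000:00 rather than a port address.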
There are bbswitch branches (pm-rework, for example), but these are not suitable for use; the last time I was working on it, it could cause an Oops. At that time I shifted priority to nouveau because that was easier to fix.
I'm using the branch without the commit mentioned and it works fine without the Oops.
My device is a dell precision m7510.
In bumblebeed I have the option to turn the GPU on at exit enabled, and I stop and start bumblebeed after every suspend.
Bump, same issue here Dell Precision 7510, Quadro M2000M, BIOS 1.8.3 (Oct 2016), Fedora 24, kernel 4.8.6-201.
The driver doesn't even install properly since the kernel module doesn't load.
make[2]: Leaving directory '/usr/src/kernels/4.8.6-201.fc24.x86_64'
make[1]: Leaving directory '/usr/src/kernels/4.8.6-201.fc24.x86_64'
-> done.
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.
Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 73.757880] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13b0)
NVRM: installed in this system is not supported by the 367.57
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 73.757885] nvidia: probe of 0000:01:00.0 failed with error -1
[ 73.757897] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 73.757906] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 73.757906] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 73.757907] nvidia-nvlink: Unregistered the Nvlink Core, major device number 238
[ 73.757964] NVRM: NVIDIA init module failed!
[ 180.055104] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=none
[ 180.055158] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13b0)
NVRM: installed in this system is not supported by the 367.57
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 180.055166] nvidia: probe of 0000:01:00.0 failed with error -1
[ 180.055194] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 180.055211] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 180.055212] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 180.055213] nvidia-nvlink: Unregistered the Nvlink Core, major device number 238
[ 180.055325] NVRM: NVIDIA init module failed!
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
Is objtool included in your kernel header packages? If not, it won't build. Try the pm-rework branch without the last commit; I have exactly the same system as you and it works with that version.
@Thaodan yep /usr/src/kernels/4.8.6-201.fc24.x86_64/tools/objtool/objtool
@arcivanov Have you tried adding pcie_port_pm=off to your cmdline?
same problem (asus laptop nvidia 950m): linux kernel 4.8.6 and bumblebee installed
[ 9583.143030] nvidia: module license 'NVIDIA' taints kernel.
[ 9583.143031] Disabling lock debugging due to kernel taint
[ 9583.600930] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[ 9583.601019] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:139a)
NVRM: installed in this system is not supported by the 375.10
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 9583.601051] nvidia: probe of 0000:01:00.0 failed with error -1
[ 9583.601077] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[ 9583.601091] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 9583.601092] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 9583.601093] nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
[ 9583.601207] NVRM: NVIDIA init module failed!
Hi @pietrondo @arcivanov and surely others, just to let you know: nvidia asked me for reproduction steps and an nvidia-bugreport here: https://devtalk.nvidia.com/default/topic/971733/linux/-370-28-with-kernel-4-8-on-gt-2015-machines-driver-claims-card-not-supported-if-nvidia-is-not-primary-card/ It might help if people using different distributions (less individually configured than my Gentoo...) added their input and an nvidia-bugreport there, in case nvidia is still unable to reproduce.
which info do you need?
> which info do you need?
I am fine - I just think nvidia (in the forum thread I linked) could use additional input (logfiles from nvidia-bugreport.sh, which operating systems are affected, etc.) to help them reproduce the issue.
Should I post it here (the nvidia-bugreport.sh output)? And to run nvidia-bugreport on my laptop, do I need to use: sudo optirun nvidia-bugreport.sh ?
Pietro Capriata
Have you folks tried the pcie_port_pm=off boot option? If that helps, it would already be a great hint as to what went wrong.
I added the pcie_port_pm=off option and now the module loads properly with a Quadro M1000M and kernel 4.8.6.
Yes, but can you actually use primusrun or optirun to start a program? Even after loading the driver module, I still can't actually use it for anything.
@olifre thanks, posted.
pcie_port_pm=off worked for me on Fedora 24 with 4.8.6
Could you also try the pm rework branch without the last commit?
I'm not sure if I did it right, but I downloaded the pm-rework branch without the latest commit, ran 'make' and then 'sudo make load' in the downloaded directory. It doesn't look like it did much. I can successfully turn the dGPU on and off using 'sudo tee /proc/acpi/bbswitch <<<ON|OFF'. I know it works because I can see my dGPU LED indicator switch on and off. However, I still get the same error when trying to start a program with optirun or primusrun.
@seekermoc if you are on fedora you may wish to run
rpm --nodeps -e bbswitch-dkms
and reboot to remove the dkms version of the bbswitch module so you can test your local compiled version.
Nope, doing that doesn't fix the problem. If anything, it's worse, as I can no longer shut the dGPU off at all. bumblebee-nvidia --check confirms that the local version of bbswitch loaded. Still get the same error with primus/optirun.
what is the output from dkms status
and also
lsmod | grep bbswitch
After removing the bbswitch-dkms package and a fresh reboot, my dGPU doesn't turn off during the boot sequence like it should, and dkms status and lsmod have no output.
After make / sudo make load in the local bbswitch directory, 'dkms status' still has no output, but lsmod responds with 'bbswitch 16384 0'
still getting
[seeker@ ~/Desktop/bbswitch-5c7b3f53f229c70bc49c710295967605ac5846e4]$ optirun glxgears
[ 310.088754] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
[ 310.088782] [ERROR]Aborting because fallback start is disabled.
After updating to linux 4.8 the nvidia driver says your gpu isn't supported when trying to access with primus:
uname:
uname -a Linux hellion 4.8.2-pf #1 SMP PREEMPT Tue Oct 18 10:19:55 CEST 2016 x86_64 GNU/Linux