Bumblebee-Project / Bumblebee

Bumblebee daemon and client rewritten in C
http://www.bumblebee-project.org/
GNU General Public License v3.0

NVIDIA GPU is not supported since 4.8 #810

Open Thaodan opened 8 years ago

Thaodan commented 8 years ago

After updating to Linux 4.8, the nvidia driver reports that the GPU isn't supported when I try to access it with primus:

[Okt18 12:54] bbswitch: enabling discrete graphics
[  +0,926684] nvidia: module license 'NVIDIA' taints kernel.
[  +0,000001] Disabling lock debugging due to kernel taint
[  +0,529426] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[  +0,000037] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13b0)
              NVRM: installed in this system is not supported by the 370.28
              NVRM: NVIDIA Linux driver release.  Please see 'Appendix
              NVRM: A - Supported NVIDIA GPU Products' in this release's
              NVRM: README, available on the Linux driver download page
              NVRM: at www.nvidia.com.
[  +0,000014] nvidia: probe of 0000:01:00.0 failed with error -1
[  +0,000053] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[  +0,000033] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  +0,000002] NVRM: None of the NVIDIA graphics adapters were initialized!
[  +0,000002] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241

uname: uname -a Linux hellion 4.8.2-pf #1 SMP PREEMPT Tue Oct 18 10:19:55 CEST 2016 x86_64 GNU/Linux

gsgatlin commented 8 years ago

Ok. Yeah. So you know your custom version is getting loaded. I guess you can try

systemctl restart bumblebeed.service

to see if it made anything better.

seekermoc commented 8 years ago

That at least turns the dGPU back off (and enables me to manually switch it on and off again), but I still can't use primus/optirun.
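For anyone who wants to check the dGPU state while testing, here is a minimal sketch. It assumes bbswitch is loaded and exposes its usual /proc/acpi/bbswitch file; the function name `bbswitch_state` and its optional file argument are only illustrative.

```shell
#!/bin/bash
# Print the dGPU power state ("ON" or "OFF") as reported by bbswitch.
# bbswitch_state is a hypothetical helper; the optional argument lets
# you point it at another file for testing.
bbswitch_state() {
    local state_file="${1:-/proc/acpi/bbswitch}"
    # The file holds a single line such as "0000:01:00.0 OFF";
    # print the second field. Prints nothing if the file is missing.
    awk '{ print $2 }' "$state_file" 2>/dev/null
}

echo "bbswitch state: $(bbswitch_state)"
```

Switching manually is done by writing ON or OFF to the same file as root, for example `echo ON | sudo tee /proc/acpi/bbswitch`.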

edoantonioco commented 8 years ago

Latest kernel (4.8.7-1 on Manjaro) fixed this issue in my case. Edit: it did not; I'm now using the workaround provided previously.

olifre commented 8 years ago

Latest kernel (4.8.7-1 on Manjaro) fixed this issue on my case.

I can confirm that for me, the problem persists with 4.8.7 on Gentoo Linux. There's also nothing in the (vanilla) kernel logs about any power management related changes, so I don't see how a kernel update could change the issue (the only "fix" would be if the newer kernel reverted the change, or for some reason power management got broken in the update). pcie_port_pm=off again works around the issue.
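Since several reports in this thread depend on whether the workaround is actually active, here is a rough sketch for checking the running kernel's command line. The function name `has_pcie_port_pm_off` and its optional file argument are assumptions made for illustration; only /proc/cmdline and the parameter itself come from the system and the thread.

```shell
#!/bin/bash
# Succeed if pcie_port_pm=off is present on a kernel command line.
# has_pcie_port_pm_off is a hypothetical helper; it reads /proc/cmdline
# unless another file is given.
has_pcie_port_pm_off() {
    local cmdline_file="${1:-/proc/cmdline}"
    # Put each parameter on its own line and look for an exact match.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'pcie_port_pm=off'
}

if has_pcie_port_pm_off; then
    echo "pcie_port_pm=off is active"
else
    echo "pcie_port_pm=off is not set"
fi
```

How the parameter gets added to the boot loader configuration depends on the distribution, so that part is left out here.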

kmare commented 7 years ago

Nvidia just released a new set of drivers (375.20). Can anyone report back whether it solved the issue? http://www.nvidia.com/download/driverResults.aspx/111596/en-us

Thaodan commented 7 years ago

I don't think so, because the issue is part of how Optimus is handled. For the bumblebee way of using the GPU, bbswitch needs to be extended (see pm rewrite for a good start).

olifre commented 7 years ago

For the bumblebee way of using the GPU, bbswitch needs to be extended (see pm rewrite for a good start).

Agreed - but I still think nvidia needs to fix something, too, since even if bumblebee is disabled and bbswitch is blacklisted, if the nvidia card is not actively used on boot and thus the kernel disables the PCI-port, the nvidia driver itself will not reactivate the port if modprobing it (as I have described in the nvidia forums). Depending on the timing during boot (activating nvidia-persistenced of course might help...), I believe this also prevents prime etc.

I haven't tested 375.20 yet, though.

putterson commented 7 years ago

I have just tested with nvidia 375.20 and the issue is certainly fixed for me. I am running arch with kernel 4.8.8-2-ARCH on a Dell XPS 9550. I have bbswitch loaded and I am not passing 'pcie_port_pm=off' to my kernel (which I used before to work around this issue.)

I'm not sure if the fix is with kernel 4.8.8 or with the nvidia drivers but if anybody would like me to test anything I'd be happy to oblige.

seekermoc commented 7 years ago

I can confirm that this issue is fixed for me as well with the 375.20 drivers. I switched from the "managed" to the "unmanaged" Fedora repo and downloaded the 375.20 drivers. They installed perfectly on kernel 4.8.8 with the default bbswitch version from the repo (not the pm_rework version). Everything works normally now, including optirun and primusrun.

Edit: I was mistaken, I did still have pcie_port_pm=off in my cmdline. I tried removing it, and primus/optirun stopped working, so you do still need the workaround, but at least bumblebee works again.

seekermoc commented 7 years ago

For fun, I tried using the pm_rework version of bbswitch to see if it would work without the pcie_port_pm=off workaround, but it did not work.

For me, bottom line is that drivers 375.20, default repo bbswitch, and pcie_port_pm=off now works in full.

Due to this, I think this may have been two separate problems that occurred concurrently. First, kernel 4.8 requires the pcie pm workaround. Second, for primus/optirun to work it requires the 375.20 drivers (possibly because 375.20 adds support for xorg 1.19, and Fedora updated us to xorg 1.19 around the same time as kernel 4.8).

gsgatlin commented 7 years ago

@seekermoc Thanks a lot for the info. I will try to update the managed version tomorrow. Sorry for the delay. Sometimes I miss these nvidia updates. I'm still on fedora 23.

kmare commented 7 years ago

@gsgatlin thank you for your work! do you think you'll have the repo updated for fedora 25 when it comes out in a few days?

seekermoc commented 7 years ago

@gsgatlin No problem, thanks for all your help.

gsgatlin commented 7 years ago

@kmare Yes. I will update everything (CentOS 6, 7; Fedora 23, 24, 25, 26) at the same time. I still need to test Fedora 25 though.

olifre commented 7 years ago

Just to confirm the general picture: With 375.20, I can still reproduce the original problem (unless I add pcie_port_pm=off). After all, 375.20 claims only to have fixed the (independent) issue of incompatibility with Xorg 1.19 (which I don't use yet in any case).

kmare commented 7 years ago

While it's not explicitly listed in the changelog, could the new driver update 375.26 have fixed the problem mentioned here? Has anyone tried it?

https://devtalk.nvidia.com/default/topic/981831

edoantonioco commented 7 years ago

Just to confirm that this also happens with the latest stable nvidia 375.26 on kernel 4.9. Right after I start the PC I can use the dedicated card without any problem. The way to reproduce this bug on my laptop is just to close the lid (sending it to sleep) and then start using the PC again. After that, bumblebee can't use the nvidia card.

anolting commented 7 years ago

Hi all,

I'm still having this problem with 4.9 and 375.26 on openSUSE Tumbleweed. If I force the kernel to switch back the PM method, the laptop boots into runlevel 3 and freezes before I can log in.

The Laptop is a DELL Inspiron 15 Series 7000 (7559) with a GTX960M.

Thanks Alex

jramapuram commented 7 years ago

On kernel 4.9, using pcie_port_pm=off allows proper usage of bumblebee; however, my external display is not detected. I am powering it via a Thunderbolt 3 --> Thunderbolt 2 adaptor. It is generally listed as DP1 when using pcie_port_pm=on, but is not listed at all otherwise. I have tried using intel-virtual-output as stated here, to no avail.

Edit: realized this was due to another issue and not bumblebee; tb needs to be set to legacy mode to work properly in linux

JohnOShock commented 7 years ago

I'm using a laptop with an Nvidia 940M Optimus. I was using Bumblebee to switch, but after upgrading the kernel to 4.8 and later 4.9 I experienced crashes and poor startup and shutdown times, all of which stopped when I reverted to using the Intel card only with Nouveau. I am on openSUSE Tumbleweed. Someone suggested I use only the proprietary driver (375.26) with PRIME, but it did not really solve anything: the Plasma desktop wouldn't start at all.

jrupinski commented 7 years ago

I still get this error:

ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.

When I run sudo bumblebee-nvidia --debug on Fedora 25. I installed managed repo using this guide: https://fedoraproject.org/wiki/Bumblebee#Using_bumblebee_software

Kernel: 4.9.11-200.fc25.x86_64

Hardware: Lenovo y510p CPU: i5 4200m GPU: Nvidia GT 755M

gsgatlin commented 7 years ago

@rupek1995 Do you have the kernel-devel package installed and is it the same version as your running kernel? (uname -r)
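A quick way to run that comparison might look like the sketch below. It assumes an RPM-based system such as Fedora, where `uname -r` has the form version-release.arch; the helper name `devel_matches_kernel` is made up for this example.

```shell
#!/bin/bash
# Succeed if the running kernel release appears in a newline-separated
# list of installed kernel-devel versions.
# devel_matches_kernel is a hypothetical helper name.
devel_matches_kernel() {
    local running="$1" installed="$2"
    # -F: fixed string, -x: whole line must match, -q: quiet.
    printf '%s\n' "$installed" | grep -Fqx -- "$running"
}

running="$(uname -r)"
# On Fedora this prints e.g. 4.9.11-200.fc25.x86_64 per installed package.
installed="$(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel 2>/dev/null)"

if devel_matches_kernel "$running" "$installed"; then
    echo "kernel-devel matches running kernel $running"
else
    echo "no kernel-devel package for running kernel $running"
fi
```

If the versions differ, the installer will not find a source tree for the running kernel, which would be consistent with the error quoted above.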

seekermoc commented 7 years ago

For some reason, Fedora has lately been installing kernel-debug-devel instead of kernel-devel, and when you try to install kernel-devel it says it's already installed. You need to remove the debug one first, then install the normal one.


jrupinski commented 7 years ago

EDIT: After rebooting for the second time system froze for about 30 seconds, and after I logged in there were SELinux errors about systemd, bbswitch, nvidia.ko and gnomeshell. Will removing SELinux fix this?

There was a kernel update after my post, I updated it, managed to successfully download the kernel-devel for my kernel, deleted kernel-debug-devel just in case. Nvidia driver unpacks... but now installation prints out an error like this:

-> Kernel module compilation complete.

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Permission denied
-> Kernel messages:
[ 68.542964] iwlwifi 0000:08:00.0: Radio type=0x2-0x0-0x0
[ 68.595688] IPv6: ADDRCONF(NETDEV_UP): wlp8s0: link is not ready
[ 69.121635] wlp8s0: authenticate with 18:a6:f7:65:30:b4
[ 69.125025] wlp8s0: send auth to 18:a6:f7:65:30:b4 (try 1/3)
[ 69.127213] wlp8s0: authenticated
[ 69.128834] wlp8s0: associate with 18:a6:f7:65:30:b4 (try 1/3)
[ 69.133463] wlp8s0: RX AssocResp from 18:a6:f7:65:30:b4 (capab=0x431 status=0 aid=2)
[ 69.158037] wlp8s0: associated
[ 69.158084] IPv6: ADDRCONF(NETDEV_CHANGE): wlp8s0: link becomes ready
[ 71.708158] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 75.815808] Netfilter messages via NETLINK v0.30.
[ 75.846240] ip_set: protocol 6
[ 95.459594] tun: Universal TUN/TAP device driver, 1.6
[ 95.459595] tun: (C) 1999-2004 Max Krasnyansky maxk@qualcomm.com
[ 95.499851] virbr0: port 1(virbr0-nic) entered blocking state
[ 95.499854] virbr0: port 1(virbr0-nic) entered disabled state
[ 95.499942] device virbr0-nic entered promiscuous mode
[ 96.510578] virbr0: port 1(virbr0-nic) entered blocking state
[ 96.510580] virbr0: port 1(virbr0-nic) entered listening state
[ 98.320456] virbr0: port 1(virbr0-nic) entered disabled state
[ 332.816405] mce: [Hardware Error]: Machine check events logged
[ 794.335093] fuse init (API version 7.26)
[ 796.061920] Bluetooth: RFCOMM TTY layer initialized
[ 796.061928] Bluetooth: RFCOMM socket layer initialized
[ 796.061981] Bluetooth: RFCOMM ver 1.11

ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

xen0f0n commented 7 years ago

pcie_port_pm=off works for me:

Dell 3542, Fedora 25, kernel 4.9, GeForce 840M

jrupinski commented 7 years ago

@xen0f0n Thanks for the tip! I already managed to get it to work by changing the SELinux to permissive mode.
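To see which SELinux mode is configured and which one is in effect, something like the following sketch could be used. It assumes the standard /etc/selinux/config location; the function name `configured_selinux_mode` is only illustrative.

```shell
#!/bin/bash
# Print the mode set by the SELINUX= line of an SELinux config file.
# configured_selinux_mode is a hypothetical helper name.
configured_selinux_mode() {
    local config_file="${1:-/etc/selinux/config}"
    # Match only uncommented SELINUX= lines (not SELINUXTYPE=) and
    # print the value of the last one. Prints nothing if the file is missing.
    sed -n 's/^SELINUX=\([a-z]*\).*$/\1/p' "$config_file" 2>/dev/null | tail -n 1
}

echo "configured mode: $(configured_selinux_mode)"

# getenforce reports the mode currently in effect; it is only present
# where the SELinux userspace tools are installed.
if command -v getenforce >/dev/null 2>&1; then
    echo "current mode: $(getenforce)"
fi
```

For a quick test without editing the config file, `sudo setenforce 0` switches to permissive mode until the next reboot.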

If anyone has this problem and pcie_port_pm=off doesn't work for you, you can try SELinux method:

  • Update kernel to newest version
  • Set SELinux to permissive mode (sudo dnf install /usr/bin/system-config-selinux* - using this tool)
  • Reboot Fedora twice - on second reboot bumblebee should install during login (that's why it might freeze for about a minute or two)
  • Check if it works - use bumblebee-nvidia --check
  • ???
  • PROFIT

mcku commented 6 years ago

Hi, instead of disabling runtime PCI power management, would it be OK to selectively enable PCI runtime power management through udev? The following works fine so far:

First, get the device ids using lspci -k. By trial and error, disabling power management for

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
    Subsystem: ASUSTeK Computer Inc. Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
    Kernel driver in use: pcieport
    Kernel modules: shpchp
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev a1)
    Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile]
    Kernel driver in use: nvidia
    Kernel modules: nouveau, nvidia_drm, nvidia

was sufficient. This was possible by the following workaround:

/etc/udev/rules.d/pci_pm.rules
# use lspci -k to get bus ids

#ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:00.0", ATTR{power/control}="auto"
#ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:01.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:02.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:08.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:14.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:14.2", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:15.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:16.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:17.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1c.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1c.3", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1c.6", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1d.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.2", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.3", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.4", ATTR{power/control}="auto"
#NVIDIA (power/control accepts "auto" or "on"; "on" keeps runtime PM disabled)
#ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:01:00.0", ATTR{power/control}="on"

ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:02:00.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:03:00.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:04:00.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:05:00.0", ATTR{power/control}="auto"

Using powertop, I could verify that the other PCI devices appear to be power managed. Laptop battery usage is almost as good as if PM were fully enabled.
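As a cross-check that does not need powertop, the power/control attribute of every PCI device can be read straight from sysfs. This is only a sketch; the function name `list_pci_power_control` and its optional directory argument are assumptions added for illustration.

```shell
#!/bin/bash
# Print "<pci address> <power/control value>" for each PCI device.
# list_pci_power_control is a hypothetical helper; it reads
# /sys/bus/pci/devices unless another directory is given.
list_pci_power_control() {
    local devices_dir="${1:-/sys/bus/pci/devices}"
    local dev
    for dev in "$devices_dir"/*; do
        # Skip entries without a readable power/control attribute.
        [ -r "$dev/power/control" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/power/control")"
    done
}

list_pci_power_control
```

With the rules above applied, the devices matched by a rule should show auto, while the NVIDIA card and the devices whose rules are commented out keep whatever the kernel set.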

If there is an easier way to do this, without going through the bus ids etc, life would be easier. But now I can use the laptop for coding and gaming, without rebooting in between, and with power management enabled. Please advise if anything is missing or wrong. Thanks.

arcivanov commented 6 years ago

Actually I can tell you that this works for me.

Run sudo ~/nvidia_reset.sh. And the driver compiles and installs.

bash-4.4$ cat ~/nvidia_reset.sh 
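#!/bin/bash -eEu
# Must be run as root (see the sudo invocation above).

# Stop the services that keep the modules in use
systemctl stop bumblebeed
systemctl stop bumblebee-nvidia
# Unload the nvidia driver and bbswitch so the device can be removed
rmmod nvidia
rmmod bbswitch
# Remove the GPU from the PCI bus, then rescan below the 00:01.0 bridge
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan
# Reload bbswitch and start the services again
modprobe bbswitch
systemctl start bumblebeed
systemctl start bumblebee-nvidia

azat commented 4 years ago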

If there is an easier way to do this, without going through the bus ids etc, life would be easier

SUBSYSTEM!="pci", GOTO="pci_end"
ACTION!="add", GOTO="pci_end"
# Disable PM for NVIDIA to overcome "issue" in the nvidia driver
KERNELS=="0000:01:00.0", GOTO="pci_end"
TEST=="power/control", ATTR{power/control}="auto"
LABEL="pci_end"