Open skoehler opened 7 years ago
Those issues are quite known/expected I think. Did you take a look at https://github.com/Bumblebee-Project/bbswitch/issues/140?
Something else is at work here. I added the suggested workaround pcie_port_pm=off to my grub config and it didn't help. The system still freezes during boot, probably when bumblebeed is loaded.
Update: starting X will lock up the machine if the Nvidia device is OFF. So bbswitch itself seems to work, but X won't start. Neither the nouveau nor the nvidia kernel module is loaded. Does X somehow try to talk to the PCI device?
Sorry, answered in a hurry. I have two laptops at the moment:
– Dell XPS 9530, which works OK with Bumblebee; not sure about bbswitch alone, should test that.
– HP zBook Studio G3, on which I use bbswitch alone (need to try Bumblebee), but I don’t load it at boot since it then freezes my system. So that looks like your case (but it’s not a Pascal GPU).
Both are affected by #140 btw.
I didn’t have the time until now to look into those issues, but I’ll do so next week. I did some tests with nouveau too, but don’t remember the results, and a new kernel requires new tests. ;)
I’ll keep you updated, and we’ll see with @Lekensteyn what can be done then.
Something loads the nvidia kernel module when I start sddm. I blame X for that, yet the Xorg logs mention nothing. That seems to be the reason why my laptop hangs when starting X with the dGPU turned off.
From bug #140 I gather that support for PCIe PM is in the works and that will eventually replace the DSM method?
The hang issue is possibly not related to pcie_port_pm, but rather to https://bugzilla.kernel.org/show_bug.cgi?id=156341 (Bumblebee-Project/Bumblebee#764). Try the acpi_osi workaround listed there.
@Lekensteyn OK that’s definitively it for me (my machine is already listed by someone else). Looks like I’ve been missing a lot of fun lately (my comment review queue for Bumblebee is at 1098 right now…). I’ve added myself to the CC list.
For me Github still shows 143 unread notifications for Bumblebee (none for bbswitch), but there are lots of open issues that still need a response (luckily users are helping each other). Hmm
I think I've given answers to all Ubuntu/Debian related issues, if I missed some please do tag me
I am also seeing a hang on startup on my Precision 5200 running Ubuntu with kernel 4.9.8. If I do not install the module and load it manually, everything is fine, but if I install it and reboot, my machine hangs before getting to the login prompt with some ACPI-related errors.
acpi_osi=! acpi_osi=Windows 2009 got me booting, but things like the backlight did not work. For now I am just running make load whenever I reboot, as that works without installing the module. Hopefully there is a fix I have not tried yet :)!
I also have the startup issue on Dell XPS 9560, and it seems to be mainly related to X, as it does not seem to affect Wayland.
I can confirm this, @zeraien. I just installed 3.22 and moved over to Wayland today. I can boot fine with bbswitch installed, although my system does lock up a short time after booting.
I also have issues with freezing while in Wayland, but it seems to be related to the launching of older X-based applications (ex: steam, wine), which probably start the X machinery and this probably triggers the nvidia driver/bbswitch and thus you get a crash.
One way I avoided a crash with X is by forcing bbswitch to turn the dGPU ON before logging into X. So the crash seems to happen when X starts while the card is OFF.
I'm on Fedora 25 (kernel 4.9.8), bumblebee 3.2.1, Nvidia 1050, Dell XPS 9560
If I just manually load bbswitch with sudo make load on each boot, everything works fine for me without any crashes. It only seems to be an issue when the module is installed. I might just run a script to load it this way after boot.
@stefansedich can you explain what exactly you're doing with make load? I'm on an XPS 15 9560 and having the freeze with bbswitch off. Indeed, acpi_osi=! acpi_osi=Windows 2009 helps, but that disables the touchpad as discussed above.
Sure @pronobis. Basically I did not run make install; I booted into recovery mode, then ran make uninstall. Once my machine starts back up, I simply run make load to load the driver. That does mean I need to do this again after every reboot, but it has been working nice and stable for me until I find out how to get it working properly when installed.
Does that mean that you can run the commands that cause freeze after you load bbswitch? For instance, in my case it's nvidia-smi, lspci, lshw etc.
I don't seem to get any freeze running lspci or lshw, I could be wrong here mind you but I can tell from powertop that my nvidia card is turned off so have assumed that it was all working as expected.
@stefansedich Thanks for the info. Just to summarize:
- cat /proc/acpi/bbswitch gives OFF
- you are on X, not wayland
- you can run lspci several times in a row and no freeze happens?
What is your hardware?
I am actually on Wayland now, but it was working under X too. cat /proc/acpi/bbswitch gives me a file not found, but looking at gpu-manager I can see it turns the card off, and my idle power draw suggests the card is off: if I reboot running nvidia, the draw is 5 W or so higher.
And yes, I can run lspci many times just fine. My hardware is a Precision 5200, so it might be a different story due to the different card.
I am using acpi_osi='!Windows 2013' pcie_port_pm=off, and doing echo OFF > /proc/acpi/bbswitch followed by running lspci twice makes my machine freeze (Dell XPS 9560). Neither the nouveau nor the nvidia kernel module is loaded.
@stefansedich The fact that /proc/acpi/bbswitch gives you file not found means your bbswitch module is likely not loaded. I think I also did not have the problem with simply no NVidia driver installed and used. The problem exists only if you power down a card that is managed by an nvidia driver (nvidia-smi displays the card) and X is using the Intel card with nvidia-prime.
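To tell "module not loaded" apart from "card is OFF", it helps to check for the proc file before reading it. Here is a minimal sketch of that check. The function name bbswitch_state and its optional path argument are made up for illustration, and it assumes the usual one-line format of /proc/acpi/bbswitch (PCI address followed by ON or OFF).

```shell
# Hypothetical helper: print the bbswitch state, or "not loaded" if the
# proc file is missing (i.e. the bbswitch module is probably not loaded).
# The optional argument only exists so the function can be pointed at a test file.
bbswitch_state() {
    file="${1:-/proc/acpi/bbswitch}"
    if [ ! -r "$file" ]; then
        echo "not loaded"
        return 1
    fi
    # Assumed format: "0000:01:00.0 OFF"; print the last field of the first line.
    awk 'NR == 1 { print $NF }' "$file"
}
```

Calling bbswitch_state with no argument reads the real /proc/acpi/bbswitch, so power-draw guesses from powertop are not needed to know whether bbswitch is in play at all.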
The bug is likely to be the same or related to: https://bugzilla.kernel.org/show_bug.cgi?id=156341
I would urge all Kaby Lake Dell laptop users to post there, since this issue has been previously reported mostly for SkyLake laptops.
There is a patch in https://bugzilla.kernel.org/show_bug.cgi?id=156341 that solves the freeze when applied against kernel 4.10.
I tried the patch with Linux 4.10 and my Dell XPS 9560 is not freezing anymore. Nice! However, the nvidia driver version 378.13 doesn't seem to work. It gives some errors and I couldn't make them go away yet. The errors are:
NVRM: RmInitAdapter failed! (0x26:0xffff:1097)
NVRM: rm_init_adapter failed for device bearing minor number 0
I should add that the nvidia module is not unloaded and the card is not turned off again if primusrun fails.
@skoehler It's good to know that it did indeed work. Downgrading your nvidia module version is reported to solve the loading issue. See comment #18 in the Archlinux thread below:
Downgrading the nvidia driver to 375.39 worked. I can now play games such as Dota 2 with my Nvidia GTX 1050.
Interesting. Since upgrading to 4.10 I just installed the module again and rebooted, and everything is working as expected! No lockup on boot for me now.
@stefansedich that's weird... apparently the fix has not been accepted upstream:
Update: the patch didn't get accepted, but I got the hint to try booting with acpi_rev_override. acpi_rev_override=5 instead of the acpi_osi stuff works for me (currently on kernel 4.8), no lockups and the touchpad issues also seem to be gone.
For what it's worth, it doesn't solve the issue for me - kernel 4.9.11, bios 1.1.3. I still get the hard lock when I switch the nvidia card off.
OK, I tried the kernel option acpi_rev_override (it's Boolean, it doesn't take a value like 5, see kernel documentation) and it works just as well as the patch.
Fair point, although from briefly skimming the code, 5 should still have had the same effect (i.e. evaluated to true). Just tried =1 in any event. Still had the lockup; I happened to be on the console this time and there was a stack trace. @skoehler What BIOS version are you running? There were apparently changes to improve nvidia stability, which I applied while trying to improve things.
For those playing at home... your kernel must include the CONFIG_ACPI_REV_OVERRIDE_POSSIBLE option. The default Arch Linux kernel does not; I can't speak for other distributions.
$ zcat /proc/config.gz | grep CONFIG_ACPI_REV_OVERRIDE_POSSIBLE
# CONFIG_ACPI_REV_OVERRIDE_POSSIBLE is not set
One kernel rebuild later, and setting acpi_rev_override solves the lockup issue.
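Not every distribution exposes /proc/config.gz, so the check above can be wrapped so that it also accepts a plain-text config file. A rough sketch follows. The function name kconfig_enabled is invented here, and the /boot/config-$(uname -r) path in the usage note is an assumption that holds on many but not all distros.

```shell
# Hypothetical helper: succeed if a kernel config option is set to y or m.
# Usage: kconfig_enabled OPTION [CONFIG_FILE]
# CONFIG_FILE defaults to /proc/config.gz; files ending in .gz are decompressed.
kconfig_enabled() {
    opt="$1"
    cfg="${2:-/proc/config.gz}"
    case "$cfg" in
        *.gz) zcat "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep -E "^${opt}=(y|m)$" > /dev/null
}
```

For example, kconfig_enabled CONFIG_ACPI_REV_OVERRIDE_POSSIBLE "/boot/config-$(uname -r)" && echo "override possible" would tell you whether acpi_rev_override can have any effect on the running kernel.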
On NixOS I was getting kernel panics as well on the XPS 9560.
The following works:
hardware.nvidiaOptimus.disable = true;
boot.kernelParams = [ "acpi_rev_override=1" ];
I also own the dell xps 15 9560, running ubuntu 17.04. Everything I read about managing the built-in graphic chip and the nvidia graphic card confuses me. Is there a working solution today that is safe and does not require expert linux skills? Could someone write a summary of the steps to take? Thanks a lot!
@saroele can't help with Ubuntu specifics but essentially what you currently need (and keep in mind that this is a dynamic situation) is to install the latest NVIDIA drivers, install bumblebee and install bbswitch. This is documented in the Arch Linux wiki and at least the general idea can be applied to any distro, you only have to then use your own package manager, etc. Once you have these three moving parts, your dedicated GPU should be off by default and thus not using power. Whenever you want to run an app on the dedicated GPU, you run it with "optirun
This should get you a basic working system. What some users experience is that sometimes switching the GPU on and/or off will cause the system to freeze. To get rid of that, you can set the kernel parameter acpi_rev_override=1. This is something that your bootloader (normally GRUB) passes on to the Linux kernel so it allows an ACPI override. Once you've done that, you're all set.
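On Ubuntu-style setups, setting that parameter usually means editing /etc/default/grub and regenerating the GRUB config. A minimal sketch, assuming the existing value is the stock "quiet splash" (keep whatever your file already has) and that your distro ships the update-grub wrapper, which not all do:

```shell
# /etc/default/grub (excerpt)
# "quiet splash" is only the assumed existing value; append the new
# parameter to whatever your file already contains.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_rev_override=1"

# Afterwards, regenerate the GRUB config and reboot:
#   sudo update-grub
```

After the reboot, cat /proc/cmdline should show the parameter if the change took effect.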
Personally I'm running Arch Linux which is great because it polishes your Linux skills and gives you a better understanding of what's running where. It does also cost quite some initial time to understand it so tread carefully. An alternative is Manjaro Linux, which is Arch with training wheels. You'll get all the benefits of Arch and quite some helpers to get you up and running.
I've got the 9560 FHD running Ubuntu 17.04 and nvidia-375, and I can confirm that adding the acpi_rev_override=1 option in grub does the trick.
Here are the benchmark results: http://openbenchmarking.org/result/1706111-TR-XPS95602060
I'm posting here because this came up as one of the search results while googling :)
BTW this guy https://www.reddit.com/r/Dell/comments/63cavx/fixed_nvidia_1050_freezing_in_ubuntu_linux/ has got good instructions for whoever wants to try this out.
I'm not sure what the side effects of this kernel option are though, so I consider this a temporary workaround until Ubuntu ships an updated kernel with the proper patches.
Have the same bug on 4.12_rc6 kernel on Gentoo: https://bugs.freedesktop.org/show_bug.cgi?id=101553
Nobody in kernel/nouveau seems to be working on this (not even requests for debugging/testing/more information, whatever). So do not expect any fixes soon.
See this bug for tracking: https://bugs.acpica.org/show_bug.cgi?id=963 See also this pull request, but probably it will never get merged https://github.com/acpica/acpica/pull/189
For those who want to try nouveau on this laptop, there are issues too: https://bugs.freedesktop.org/show_bug.cgi?id=100228 and https://bugzilla.kernel.org/show_bug.cgi?id=156341
"the acpi_rev_override=1 option in grub does the trick"
I just wanted to report that this has changed recently. While the kernel parameter used to work for me as well, it no longer does. I'm on Manjaro stable with kernel 4.13, and since the last update my system hangs on boot with the override set to 1. Luckily I have not experienced the original issues again since I disabled it to be able to boot. Just wanted to leave this here in case others find themselves in a similar situation.
I can report that 4.13 does break again, and for me acpi_rev_override has no effect either way; the machine won't boot up.
My Dell XPS 9560 works without problems with 4.13.x. I'm just using acpi_rev_override.
@skoehler so acpi_rev_override=1?
On my Dell XPS 9560 with Linux 4.13.3-1 I have acpi_rev_override=1, which does indeed set bbswitch to OFF; however, I can't turn bbswitch to ON, neither manually with echo ON | sudo tee /proc/acpi/bbswitch nor via optirun, e.g. optirun -vv glxgears.
Both attempts yield the same output in journalctl:
... kernel: bbswitch: enabling discrete graphics
... kernel: pci 0000:01:00.0: Refused to change power state, currently in D3
My kernel parameters are pcie_port_pm=off acpi_rev_override beside the usual root device etc.
acpi_rev_override doesn't take an argument.
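Since people in this thread pass the option both bare and as acpi_rev_override=1, a check on the running system should accept either form. A small sketch, with has_kernel_param being a made-up name and the optional file argument existing only for testing:

```shell
# Hypothetical helper: succeed if PARAM appears on the kernel command line,
# either bare ("acpi_rev_override") or with a value ("acpi_rev_override=1").
# Usage: has_kernel_param PARAM [CMDLINE_FILE]
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each whitespace-separated word on its own line, then match whole lines.
    tr -s ' \t' '\n' < "$cmdline_file" | grep -Ex "${param}(=.*)?" > /dev/null
}
```

Running has_kernel_param acpi_rev_override && echo set confirms that the bootloader really passed the option. It only shows the option is on the command line, not that the kernel honours it.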
Apparently kernel 4.13.6 doesn't work for me, could be unrelated to bumblebee since it won't boot even with ON or OFF.
@domenkozar sounds like an unrelated problem? Unless "won't boot" means that it does not get to the login screen or something. What was the last working kernel version?
FWIW, my brother also has this laptop (Dell Inc. XPS 15 9560/05FFDN, BIOS 1.3.3 05/08/2017) now and I can confirm issues with it (using Arch Linux with Linux 4.13.7).
The acpi_rev_override boot parameter (value does not matter as @skoehler observed) combined with "Windows 2015" (i.e. the default without acpi_osi override) will force the old logic. In theory it should work, but with nouveau it hung (see references from @XVilka above).
One major issue is that it drains a lot of battery during system sleep (51% in 9h, approx. 49Wh with the extended battery model). To be investigated later.
Just upgraded from 4.13.4 to 4.13.11 and my system hangs during boot, probably due to this issue. So disregard my above comment that everything is working fine with 4.13.x. Clearly, something is broken after 4.13.4.
See also this pull request - should improve the situation after landing in the kernel https://github.com/acpica/acpica/pull/330
@pronobis
The fact that /proc/acpi/bbswitch gives you file not found means your bbswitch module is likely not loaded. I think I also did not have the problem with simply no NVidia driver installed and used. The problem exists only if you power down a card that is managed by an nvidia driver (nvidia-smi displays the card) and X is using the Intel card with nvidia-prime.
I've reproduced this with a blacklisted nouveau and no bbswitch/bumblebee.
I use the following to turn off the GPU, in /etc/tmpfiles.d/gpu-off.conf:
w /proc/acpi/call - - - - \\_SB.PCI0.PEG0.PEGP._OFF
~ » uname -a
Linux archenemy 4.15.5-1-ARCH #1 SMP PREEMPT Thu Feb 22 22:15:20 UTC 2018 x86_64 GNU/Linux
~ » cat /proc/cmdline
initrd=\initramfs-linux.img root=UUID=3879c851-c423-45ae-853e-320fa144a04c rootflags=subvol=@ pci=nommconf
~ » lsmod
Module Size Used by
ccm 20480 6
xt_CHECKSUM 16384 1
ipt_REJECT 16384 2
xt_tcpudp 16384 6
tun 45056 1
bridge 188416 0
stp 16384 1 bridge
llc 16384 2 bridge,stp
devlink 49152 0
ebtable_filter 16384 0
ebtables 36864 1 ebtable_filter
ip6table_filter 16384 0
ip6_tables 32768 1 ip6table_filter
iptable_nat 16384 0
nf_nat_ipv4 16384 1 iptable_nat
nf_nat 36864 1 nf_nat_ipv4
iptable_mangle 16384 1
iptable_filter 16384 1
tpm_crb 16384 0
nft_reject_inet 16384 1
nf_reject_ipv4 16384 2 ipt_REJECT,nft_reject_inet
nf_reject_ipv6 16384 1 nft_reject_inet
nft_reject 16384 1 nft_reject_inet
nft_meta 16384 4
nf_conntrack_ipv6 16384 2
nf_defrag_ipv6 36864 1 nf_conntrack_ipv6
nf_conntrack_ipv4 16384 3
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nft_ct 20480 2
nf_conntrack 155648 5 nft_ct,nf_conntrack_ipv6,nf_conntrack_ipv4,nf_nat_ipv4,nf_nat
libcrc32c 16384 2 nf_conntrack,nf_nat
crc32c_generic 16384 0
nft_set_bitmap 16384 0
nft_set_hash 28672 1
nft_set_rbtree 16384 0
nf_tables_inet 16384 4
nf_tables_ipv6 16384 1 nf_tables_inet
nf_tables_ipv4 16384 1 nf_tables_inet
nf_tables 106496 28 nft_ct,nft_set_bitmap,nft_reject,nft_set_hash,nf_tables_ipv6,nf_tables_ipv4,nft_reject_inet,nft_meta,nft_set_rbtree,nf_tables_inet
nfnetlink 16384 1 nf_tables
nls_iso8859_1 16384 1
nls_cp437 20480 1
vfat 20480 1
fat 77824 1 vfat
arc4 16384 2
snd_hda_codec_hdmi 57344 1
ath10k_pci 65536 0
ath10k_core 466944 1 ath10k_pci
snd_hda_codec_realtek 110592 1
ath 32768 1 ath10k_core
snd_hda_codec_generic 86016 1 snd_hda_codec_realtek
mac80211 909312 1 ath10k_core
cfg80211 741376 3 mac80211,ath,ath10k_core
rtsx_pci_ms 20480 0
memstick 16384 1 rtsx_pci_ms
mei_wdt 16384 0
joydev 24576 0
mousedev 24576 0
hid_multitouch 24576 0
uvcvideo 102400 0
videobuf2_vmalloc 16384 1 uvcvideo
videobuf2_memops 16384 1 videobuf2_vmalloc
videobuf2_v4l2 28672 1 uvcvideo
videobuf2_core 45056 2 uvcvideo,videobuf2_v4l2
videodev 208896 3 uvcvideo,videobuf2_core,videobuf2_v4l2
media 45056 2 uvcvideo,videodev
iTCO_wdt 16384 0
iTCO_vendor_support 16384 1 iTCO_wdt
dell_smbios_wmi 16384 0
dell_wmi 16384 0
wmi_bmof 16384 0
dell_wmi_descriptor 16384 2 dell_wmi,dell_smbios_wmi
intel_wmi_thunderbolt 16384 0
mxm_wmi 16384 0
dell_laptop 24576 0
btusb 53248 0
dell_smbios_smm 16384 0
btrtl 16384 1 btusb
dell_smbios 16384 4 dell_wmi,dell_laptop,dell_smbios_wmi,dell_smbios_smm
btbcm 16384 1 btusb
btintel 16384 1 btusb
dcdbas 16384 1 dell_smbios_smm
dell_smm_hwmon 16384 0
bluetooth 634880 5 btrtl,btintel,btbcm,btusb
i915 1929216 15
intel_rapl 24576 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
coretemp 16384 0
ecdh_generic 24576 1 bluetooth
kvm_intel 229376 0
rfkill 28672 7 bluetooth,dell_laptop,cfg80211
crc16 16384 1 bluetooth
kvm 704512 1 kvm_intel
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
idma64 20480 0
ghash_clmulni_intel 16384 0
i2c_algo_bit 16384 1 i915
pcbc 16384 0
snd_hda_intel 45056 3
drm_kms_helper 200704 1 i915
snd_hda_codec 151552 4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
aesni_intel 188416 4
aes_x86_64 20480 1 aesni_intel
crypto_simd 16384 1 aesni_intel
glue_helper 16384 1 aesni_intel
snd_hda_core 94208 5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel
drm 466944 7 i915,drm_kms_helper
snd_hwdep 20480 1 snd_hda_codec
intel_cstate 16384 0
intel_uncore 131072 0
snd_pcm 135168 4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
psmouse 167936 0
input_leds 16384 0
snd_timer 36864 1 snd_pcm
intel_rapl_perf 16384 0
led_class 16384 2 input_leds,dell_laptop
mei_me 45056 1
processor_thermal_device 16384 0
intel_gtt 24576 1 i915
snd 98304 14 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek,snd_pcm
pcspkr 16384 0
agpgart 49152 2 intel_gtt,drm
i2c_hid 24576 0
intel_lpss_pci 20480 0
mei 106496 3 mei_me,mei_wdt
syscopyarea 16384 1 drm_kms_helper
i2c_i801 32768 0
soundcore 16384 1 snd
sysfillrect 16384 1 drm_kms_helper
intel_lpss 16384 1 intel_lpss_pci
intel_soc_dts_iosf 16384 1 processor_thermal_device
sysimgblt 16384 1 drm_kms_helper
intel_pch_thermal 16384 0
fb_sys_fops 16384 1 drm_kms_helper
shpchp 40960 0
hid 131072 2 i2c_hid,hid_multitouch
battery 20480 0
tpm_tis 16384 0
tpm_tis_core 20480 1 tpm_tis
rtc_cmos 24576 1
wmi 28672 6 dell_wmi,wmi_bmof,intel_wmi_thunderbolt,dell_wmi_descriptor,mxm_wmi,dell_smbios_wmi
tpm 65536 3 tpm_tis,tpm_crb,tpm_tis_core
int3400_thermal 16384 0
int3403_thermal 16384 0
acpi_thermal_rel 16384 1 int3400_thermal
int340x_thermal_zone 16384 2 int3403_thermal,processor_thermal_device
intel_hid 16384 0
evdev 20480 20
sparse_keymap 16384 2 dell_wmi,intel_hid
ac 16384 0
mac_hid 16384 0
acpi_call 16384 0
ip_tables 28672 3 iptable_mangle,iptable_filter,iptable_nat
x_tables 45056 9 ipt_REJECT,iptable_mangle,ip_tables,ebtables,iptable_filter,xt_tcpudp,xt_CHECKSUM,ip6table_filter,ip6_tables
btrfs 1331200 1
xor 24576 1 btrfs
zstd_decompress 94208 1 btrfs
zstd_compress 196608 1 btrfs
xxhash 16384 2 zstd_compress,zstd_decompress
raid6_pq 122880 1 btrfs
rtsx_pci_sdmmc 28672 0
mmc_core 172032 1 rtsx_pci_sdmmc
serio_raw 16384 0
atkbd 32768 0
libps2 16384 2 atkbd,psmouse
ahci 40960 0
libahci 40960 1 ahci
xhci_pci 16384 0
crc32c_intel 24576 2
libata 278528 2 ahci,libahci
xhci_hcd 258048 1 xhci_pci
rtsx_pci 65536 2 rtsx_pci_sdmmc,rtsx_pci_ms
scsi_mod 258048 1 libata
usbcore 286720 4 uvcvideo,xhci_pci,btusb,xhci_hcd
usb_common 16384 1 usbcore
i8042 32768 1 dell_laptop
serio 28672 6 serio_raw,atkbd,psmouse,i8042
On my brand new Dell XPS 9560, the bbswitch kernel module loads, but repeatedly turning the GPU on and off makes the system hang. I don't know why and I can't see any kernel output, unfortunately. I'm using kernel 4.9.7.
My distro is Gentoo. When booting with systemd, my system would actually hang before I could reach the login prompt (probably because the bumblebee daemon tries to disable the GPU). When booting with OpenRC (a very serial boot process), I could reach the graphical login prompt, but turning the GPU on and off repeatedly would result in a system freeze and crash. I just ran echo ON/OFF > /proc/acpi/bbswitch.
It might be that the new 10 series GPUs come with new ACPI tables for turning the GPU on/off. On a Dell XPS 15 9550 (960M GPU) everything was working fine.