Bumblebee-Project / Bumblebee

Bumblebee daemon and client rewritten in C
http://www.bumblebee-project.org/
GNU General Public License v3.0

Bumblebee doesn’t switch off the discrete card (kernels newer than 4.1) #780

Open eugenbalintescu opened 8 years ago

eugenbalintescu commented 8 years ago

Hello,

If I use a kernel newer than 4.1, bumblebee doesn't switch the discrete card off after use. I can only run a program with primusrun. If I close it, I can't run another unless I reboot the laptop, and that takes a very long time. I get

[eugen@manjaro ~]$ cat /proc/acpi/bbswitch
0000:04:00.0 ON

Running dmesg I get

[   76.846847] bbswitch: enabling discrete graphics
[   76.968754] nvidia: module license 'NVIDIA' taints kernel.
[   76.968758] Disabling lock debugging due to kernel taint
[   76.977982] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
[   76.978003] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  364.19  Tue Apr 19 14:44:55 PDT 2016
[   77.036333] vgaarb: this pci device is not a vga device
[   77.039645] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.039676] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.039690] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.039713] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.039727] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.039764] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.039777] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.054350] ACPI Warning: \_SB_.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150930/nsarguments-95)
[   77.194044] vgaarb: this pci device is not a vga device
[   77.264798] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  364.19  Tue Apr 19 14:15:03 PDT 2016
[  360.150412] INFO: task Xorg:1829 blocked for more than 120 seconds.
[  360.150417]       Tainted: P           O    4.4.14-3-MANJARO #1
[  360.150418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.150420] Xorg            D ffff880203803af8     0  1829    572 0x00400004
[  360.150424]  ffff880203803af8 00ffffffa1049701 ffff880255263700 ffff8802395a3700
[  360.150426]  ffff880203804000 ffff880203803cb0 ffff880203803ca8 0000000000000000
[  360.150428]  ffff8802395a3700 ffff880203803b10 ffffffff815a2cac 7fffffffffffffff
[  360.150431] Call Trace:
[  360.150436]  [<ffffffff815a2cac>] schedule+0x3c/0x90
[  360.150439]  [<ffffffff815a56f6>] schedule_timeout+0x1d6/0x260
[  360.150526]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  360.150583]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  360.150585]  [<ffffffff815a3821>] wait_for_common+0xc1/0x180
[  360.150589]  [<ffffffff810a1110>] ? wake_up_q+0x70/0x70
[  360.150591]  [<ffffffff815a38fd>] wait_for_completion+0x1d/0x20
[  360.150594]  [<ffffffff8108d002>] flush_workqueue+0x132/0x5e0
[  360.150676]  [<ffffffffa126a88b>] ? _nv014870rm+0x1b/0x40 [nvidia]
[  360.150734]  [<ffffffffa0d99bde>] os_flush_work_queue+0x4e/0x60 [nvidia]
[  360.150814]  [<ffffffffa12bcc77>] rm_disable_adapter+0x77/0x130 [nvidia]
[  360.150816]  [<ffffffff810bf400>] ? up+0x10/0x50
[  360.150873]  [<ffffffffa0d9d023>] ? nv_uvm_notify_stop_device+0x63/0x80 [nvidia]
[  360.150929]  [<ffffffffa0d8e78d>] ? nv_close_device+0xed/0x130 [nvidia]
[  360.150985]  [<ffffffffa0d909b4>] ? nvidia_close+0xd4/0x2c0 [nvidia]
[  360.151041]  [<ffffffffa0d8e39c>] ? nvidia_frontend_close+0x2c/0x50 [nvidia]
[  360.151043]  [<ffffffff811e52dc>] ? __fput+0x9c/0x1f0
[  360.151045]  [<ffffffff811e549e>] ? ____fput+0xe/0x10
[  360.151047]  [<ffffffff81093b74>] ? task_work_run+0x84/0xa0
[  360.151049]  [<ffffffff8100369a>] ? exit_to_usermode_loop+0xba/0xc0
[  360.151051]  [<ffffffff81003bde>] ? syscall_return_slowpath+0x4e/0x60
[  360.151054]  [<ffffffff815a68c8>] ? int_ret_from_sys_call+0x25/0x8f
[  480.153934] INFO: task Xorg:1829 blocked for more than 120 seconds.
[  480.153939]       Tainted: P           O    4.4.14-3-MANJARO #1
[  480.153940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  480.153942] Xorg            D ffff880203803af8     0  1829    572 0x00400004
[  480.153945]  ffff880203803af8 00ffffffa1049701 ffff880255263700 ffff8802395a3700
[  480.153948]  ffff880203804000 ffff880203803cb0 ffff880203803ca8 0000000000000000
[  480.153950]  ffff8802395a3700 ffff880203803b10 ffffffff815a2cac 7fffffffffffffff
[  480.153952] Call Trace:
[  480.153958]  [<ffffffff815a2cac>] schedule+0x3c/0x90
[  480.153961]  [<ffffffff815a56f6>] schedule_timeout+0x1d6/0x260
[  480.154047]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  480.154104]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  480.154106]  [<ffffffff815a3821>] wait_for_common+0xc1/0x180
[  480.154110]  [<ffffffff810a1110>] ? wake_up_q+0x70/0x70
[  480.154112]  [<ffffffff815a38fd>] wait_for_completion+0x1d/0x20
[  480.154114]  [<ffffffff8108d002>] flush_workqueue+0x132/0x5e0
[  480.154196]  [<ffffffffa126a88b>] ? _nv014870rm+0x1b/0x40 [nvidia]
[  480.154254]  [<ffffffffa0d99bde>] os_flush_work_queue+0x4e/0x60 [nvidia]
[  480.154333]  [<ffffffffa12bcc77>] rm_disable_adapter+0x77/0x130 [nvidia]
[  480.154336]  [<ffffffff810bf400>] ? up+0x10/0x50
[  480.154393]  [<ffffffffa0d9d023>] ? nv_uvm_notify_stop_device+0x63/0x80 [nvidia]
[  480.154449]  [<ffffffffa0d8e78d>] ? nv_close_device+0xed/0x130 [nvidia]
[  480.154504]  [<ffffffffa0d909b4>] ? nvidia_close+0xd4/0x2c0 [nvidia]
[  480.154560]  [<ffffffffa0d8e39c>] ? nvidia_frontend_close+0x2c/0x50 [nvidia]
[  480.154562]  [<ffffffff811e52dc>] ? __fput+0x9c/0x1f0
[  480.154564]  [<ffffffff811e549e>] ? ____fput+0xe/0x10
[  480.154566]  [<ffffffff81093b74>] ? task_work_run+0x84/0xa0
[  480.154568]  [<ffffffff8100369a>] ? exit_to_usermode_loop+0xba/0xc0
[  480.154570]  [<ffffffff81003bde>] ? syscall_return_slowpath+0x4e/0x60
[  480.154573]  [<ffffffff815a68c8>] ? int_ret_from_sys_call+0x25/0x8f
[  600.156922] INFO: task Xorg:1829 blocked for more than 120 seconds.
[  600.156926]       Tainted: P           O    4.4.14-3-MANJARO #1
[  600.156927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  600.156929] Xorg            D ffff880203803af8     0  1829    572 0x00400004
[  600.156933]  ffff880203803af8 00ffffffa1049701 ffff880255263700 ffff8802395a3700
[  600.156935]  ffff880203804000 ffff880203803cb0 ffff880203803ca8 0000000000000000
[  600.156938]  ffff8802395a3700 ffff880203803b10 ffffffff815a2cac 7fffffffffffffff
[  600.156940] Call Trace:
[  600.156945]  [<ffffffff815a2cac>] schedule+0x3c/0x90
[  600.156948]  [<ffffffff815a56f6>] schedule_timeout+0x1d6/0x260
[  600.157011]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  600.157059]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  600.157061]  [<ffffffff815a3821>] wait_for_common+0xc1/0x180
[  600.157064]  [<ffffffff810a1110>] ? wake_up_q+0x70/0x70
[  600.157066]  [<ffffffff815a38fd>] wait_for_completion+0x1d/0x20
[  600.157069]  [<ffffffff8108d002>] flush_workqueue+0x132/0x5e0
[  600.157146]  [<ffffffffa126a88b>] ? _nv014870rm+0x1b/0x40 [nvidia]
[  600.157196]  [<ffffffffa0d99bde>] os_flush_work_queue+0x4e/0x60 [nvidia]
[  600.157273]  [<ffffffffa12bcc77>] rm_disable_adapter+0x77/0x130 [nvidia]
[  600.157276]  [<ffffffff810bf400>] ? up+0x10/0x50
[  600.157323]  [<ffffffffa0d9d023>] ? nv_uvm_notify_stop_device+0x63/0x80 [nvidia]
[  600.157368]  [<ffffffffa0d8e78d>] ? nv_close_device+0xed/0x130 [nvidia]
[  600.157414]  [<ffffffffa0d909b4>] ? nvidia_close+0xd4/0x2c0 [nvidia]
[  600.157459]  [<ffffffffa0d8e39c>] ? nvidia_frontend_close+0x2c/0x50 [nvidia]
[  600.157461]  [<ffffffff811e52dc>] ? __fput+0x9c/0x1f0
[  600.157463]  [<ffffffff811e549e>] ? ____fput+0xe/0x10
[  600.157465]  [<ffffffff81093b74>] ? task_work_run+0x84/0xa0
[  600.157467]  [<ffffffff8100369a>] ? exit_to_usermode_loop+0xba/0xc0
[  600.157469]  [<ffffffff81003bde>] ? syscall_return_slowpath+0x4e/0x60
[  600.157472]  [<ffffffff815a68c8>] ? int_ret_from_sys_call+0x25/0x8f
[  720.159838] INFO: task Xorg:1829 blocked for more than 120 seconds.
[  720.159842]       Tainted: P           O    4.4.14-3-MANJARO #1
[  720.159843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.159845] Xorg            D ffff880203803af8     0  1829    572 0x00400004
[  720.159849]  ffff880203803af8 00ffffffa1049701 ffff880255263700 ffff8802395a3700
[  720.159852]  ffff880203804000 ffff880203803cb0 ffff880203803ca8 0000000000000000
[  720.159854]  ffff8802395a3700 ffff880203803b10 ffffffff815a2cac 7fffffffffffffff
[  720.159856] Call Trace:
[  720.159861]  [<ffffffff815a2cac>] schedule+0x3c/0x90
[  720.159864]  [<ffffffff815a56f6>] schedule_timeout+0x1d6/0x260
[  720.159947]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  720.160004]  [<ffffffffa0d99d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[  720.160006]  [<ffffffff815a3821>] wait_for_common+0xc1/0x180
[  720.160009]  [<ffffffff810a1110>] ? wake_up_q+0x70/0x70
[  720.160011]  [<ffffffff815a38fd>] wait_for_completion+0x1d/0x20
[  720.160014]  [<ffffffff8108d002>] flush_workqueue+0x132/0x5e0
[  720.160096]  [<ffffffffa126a88b>] ? _nv014870rm+0x1b/0x40 [nvidia]
[  720.160154]  [<ffffffffa0d99bde>] os_flush_work_queue+0x4e/0x60 [nvidia]
[  720.160234]  [<ffffffffa12bcc77>] rm_disable_adapter+0x77/0x130 [nvidia]
[  720.160236]  [<ffffffff810bf400>] ? up+0x10/0x50
[  720.160293]  [<ffffffffa0d9d023>] ? nv_uvm_notify_stop_device+0x63/0x80 [nvidia]
[  720.160349]  [<ffffffffa0d8e78d>] ? nv_close_device+0xed/0x130 [nvidia]
[  720.160405]  [<ffffffffa0d909b4>] ? nvidia_close+0xd4/0x2c0 [nvidia]
[  720.160460]  [<ffffffffa0d8e39c>] ? nvidia_frontend_close+0x2c/0x50 [nvidia]
[  720.160462]  [<ffffffff811e52dc>] ? __fput+0x9c/0x1f0
[  720.160464]  [<ffffffff811e549e>] ? ____fput+0xe/0x10
[  720.160466]  [<ffffffff81093b74>] ? task_work_run+0x84/0xa0
[  720.160468]  [<ffffffff8100369a>] ? exit_to_usermode_loop+0xba/0xc0
[  720.160470]  [<ffffffff81003bde>] ? syscall_return_slowpath+0x4e/0x60
[  720.160473]  [<ffffffff815a68c8>] ? int_ret_from_sys_call+0x25/0x8f

I was told to report this regression here. Someone else on the Manjaro forum has exactly the same problem. My laptop is an Acer Aspire E 17 with an i5-5200U and an Nvidia GeForce 840M.

Thanks for your help
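(Editorial aside, not part of the original report.) The two checks used above, the bbswitch state and the repeated hung-task messages in dmesg, can be scripted. This is only a sketch: the helper names bbswitch_state and count_hung_tasks are made up for illustration, and it assumes the one-line "PCI address, then ON or OFF" format shown in the output above.

```sh
#!/bin/sh
# Sketch only: helper names are illustrative, not part of bumblebee or bbswitch.

# Print the power state (second field) of a bbswitch status line such as
# "0000:04:00.0 ON", read from standard input.
bbswitch_state() {
    awk 'NF >= 2 { print $2; exit }'
}

# Count hung-task reports ("blocked for more than N seconds") in kernel log
# text read from standard input. grep -c exits non-zero when the count is 0,
# so "|| true" keeps the function's exit status clean.
count_hung_tasks() {
    grep -c 'blocked for more than [0-9][0-9]* seconds' || true
}

# Only query the real interface when bbswitch is loaded.
if [ -r /proc/acpi/bbswitch ]; then
    echo "discrete card: $(bbswitch_state < /proc/acpi/bbswitch)"
fi
```

The second helper would be used as "dmesg | count_hung_tasks"; on systems that restrict dmesg this may need root.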

bluca commented 8 years ago

What version of bumblebee and bbswitch are you using?

eugenbalintescu commented 8 years ago

Hello,

I have bumblebee 3.2.1-14 and bbswitch 0.8-15.

Thanks

bluca commented 8 years ago

Unfortunately I'm not familiar with Manjaro or with the patches it pushes on top of the kernel, bumblebee and bbswitch.

Try using the --debug flag both in bumblebeed (in the init script or systemd unit file) and when running optirun, and post the logs to a gist/pastebin (not inline please).

Also check if the nvidia kernel modules are still loaded: lsmod | grep nvidia

Finally, does it also happen with nouveau?

eugenbalintescu commented 8 years ago

[eugen@manjaro ~]$ lsmod | grep nvidia
nvidia_modeset 745472 0
nvidia 10145792 7 nvidia_modeset

I installed nouveau and, for some reason, my Steam games don't start with it. But, after I try to run one, I get:

[eugen@manjaro ~]$ cat /proc/acpi/bbswitch
0000:04:00.0 ON

and

[eugen@manjaro ~]$ lsmod | grep nouveau
nouveau 1449984 0
mxm_wmi 16384 1 nouveau
ttm 77824 1 nouveau
drm_kms_helper 106496 2 i915,nouveau
drm 286720 11 ttm,i915,drm_kms_helper,nouveau
i2c_algo_bit 16384 2 i915,nouveau
wmi 20480 3 acer_wmi,mxm_wmi,nouveau
video 36864 3 i915,acer_wmi,nouveau
button 16384 2 i915,nouveau

I don't understand how to use the "--debug" flag, sorry. Can you be more specific?

Thanks

bluca commented 8 years ago

Depending on your distribution's packaging choices, you'll have either an init script or a systemd service unit for the bumblebeed daemon. You need to find out which one is used and edit it to add the --debug flag to the command that starts bumblebeed; you'll then get additional debug output in the logs.
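(Editorial aside.) On a systemd distribution, one way to add the flag without editing the packaged unit file is a drop-in. The sketch below assumes the daemon is installed as /usr/bin/bumblebeed and that the packaged ExecStart line carries no other arguments you need to keep; "systemctl cat bumblebeed" shows what your system actually uses. The function name write_debug_dropin and the file name debug.conf are hypothetical.

```sh
#!/bin/sh
# Sketch only: write_debug_dropin and debug.conf are illustrative names.

# Write a systemd drop-in that starts bumblebeed with --debug.
#   $1: drop-in directory (normally /etc/systemd/system/bumblebeed.service.d)
#   $2: daemon binary (ASSUMED /usr/bin/bumblebeed if omitted)
write_debug_dropin() {
    dir=$1
    bin=${2:-/usr/bin/bumblebeed}
    mkdir -p "$dir" || return 1
    # The empty ExecStart= clears the packaged command before replacing it.
    printf '[Service]\nExecStart=\nExecStart=%s --debug\n' "$bin" > "$dir/debug.conf"
}

# As root:
#   write_debug_dropin /etc/systemd/system/bumblebeed.service.d
#   systemctl daemon-reload
#   systemctl restart bumblebeed
#   journalctl -b -u bumblebeed
```

A drop-in survives package upgrades, whereas edits to /usr/lib/systemd/system/bumblebeed.service are usually overwritten when the package is updated.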

eugenbalintescu commented 8 years ago

OK, so I edited /usr/lib/systemd/system/bumblebeed.service and added "--debug" like this[1]. I hope I got it right. Now please tell me what logs to check.

Sorry you have to take me through this step by step, but I am just a Linux desktop user.

eugenbalintescu commented 8 years ago

http://pastebin.com/7qEkk8ci This is how I edited bumblebeed.service

bluca commented 8 years ago

That is right. Now, if you restart it with sudo systemctl restart bumblebeed and then run optirun, you should find the daemon's log in your system log:

sudo journalctl -b -u bumblebeed

You can also run optirun with --debug; its logs will be useful too.

eugenbalintescu commented 8 years ago

Good morning,

I fell asleep last night, sorry. So I ran optirun --debug glxgears and this is what I got: http://pastebin.com/BWRyJatE

The output for sudo journalctl -b -u bumblebeed is here: http://pastebin.com/v45sdJVR

FadeMind commented 8 years ago

@bluca Hello, I am the maintainer of the bumblebee package in Manjaro. I merged ALL major fixes from the develop branch (including kmod unloading). On my setup (GT630M on ASUS) everything works fine. Changes: https://github.com/FadeMind/Bumblebee/commits/master

See topic: https://forum.manjaro.org/t/bumblebee-not-switching-back-to-igpu-after-quitting-optirun/1054/8 and ISSUE https://github.com/manjaro/packages-community/issues/175

regards

FadeMind

FadeMind commented 8 years ago

@eugenbalintescu I see an issue with acpid. Is the service running?

sudo systemctl -l status acpid

eugenbalintescu commented 8 years ago

● acpid.service - ACPI event daemon
   Loaded: loaded (/usr/lib/systemd/system/acpid.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:acpid(8)

eugenbalintescu commented 8 years ago

I started acpid, and reran optirun glxgears. Dmesg gives me this:

http://pastebin.com/PXScmpmE

FadeMind commented 8 years ago

Enable acpid:

sudo systemctl enable acpid

and reboot. Check

/var/log/Xorg.8.log

and check that the optirun and primusrun commands work.

eugenbalintescu commented 8 years ago

I already enabled it:

[eugen@manjaro ~]$ sudo systemctl -l status acpid
[sudo] password for eugen:
● acpid.service - ACPI event daemon
   Loaded: loaded (/usr/lib/systemd/system/acpid.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2016-07-09 09:44:21 EEST; 7min ago
     Docs: man:acpid(8)
 Main PID: 615 (acpid)
    Tasks: 1 (limit: 512)
   CGroup: /system.slice/acpid.service
           └─615 /usr/bin/acpid --foreground --netlink

Jul 09 09:48:16 manjaro root[3492]: ACPI action undefined: PNP0C0A:00
Jul 09 09:48:50 manjaro root[3714]: ACPI action undefined: PNP0C0A:00
Jul 09 09:48:51 manjaro root[3721]: ACPI action undefined: PNP0C0A:00
Jul 09 09:49:53 manjaro root[4069]: ACPI action undefined: PNP0C0A:00
Jul 09 09:49:53 manjaro root[4071]: ACPI action undefined: PNP0C0A:00
Jul 09 09:49:57 manjaro root[4097]: ACPI action undefined: PNP0C0A:00
Jul 09 09:50:13 manjaro root[4189]: ACPI action undefined: PNP0C0A:00
Jul 09 09:50:23 manjaro root[4246]: ACPI action undefined: PNP0C0A:00
Jul 09 09:51:02 manjaro root[4477]: ACPI action undefined: PNP0C0A:00
Jul 09 09:51:04 manjaro root[4501]: ACPI action undefined: PNP0C0A:00

This is /var/log/Xorg.8.log: http://pastebin.com/gvkVTSM8

FadeMind commented 8 years ago

(...)
[    55.119] (--) PCI:*(0:4:0:0) 10de:1341:1025:0886 rev 162, Mem @ 0xc3000000/16777216, 0xb0000000/268435456, 0xc0000000/33554432, I/O @ 0x00003000/128
[    55.119] (II) Open ACPI successful (/var/run/acpid.socket)
[    55.119] (II) LoadModule: "glx"
[    55.119] (II) Loading /usr/lib/nvidia/xorg/modules/extensions/libglx.so
(..)

THIS should fix your issues. Reboot and test how bumblebee works.

eugenbalintescu commented 8 years ago

I already did that. I still get:

[eugen@manjaro ~]$ optirun glxgears
3688 frames in 5.0 seconds = 737.511 FPS
3734 frames in 5.0 seconds = 746.746 FPS
3688 frames in 5.0 seconds = 737.461 FPS
3691 frames in 5.0 seconds = 738.142 FPS
3690 frames in 5.0 seconds = 737.930 FPS
[VGL] ERROR: in readback--
[VGL] 254: Window has been deleted by window manager
[eugen@manjaro ~]$ cat /proc/acpi/bbswitch
0000:04:00.0 ON

Nvidia card stays on after I quit glxgears.

FadeMind commented 8 years ago

FYI: the card is turned OFF after the terminal window is closed, not as soon as the command exits (exit 0).

eugenbalintescu commented 8 years ago

You're right, thanks.

bluca commented 8 years ago

@FadeMind that's great to know, we've backported those changes in Debian and Ubuntu too. Thanks for your work!

@eugenbalintescu is everything working ok now after fixing acpid?

eugenbalintescu commented 8 years ago

I am sorry but no. It didn't change anything. I tried to explain earlier.

Please look here: http://pastebin.com/vhdfSUMm I start optirun glxgears right after reboot and it works. I close it and start again, it doesn't work anymore. And acpid.service is started.

FadeMind commented 8 years ago

Did you close the terminal window to turn off the discrete GPU? Paste

lsmod

and

dmesg 

latest lines (starting from where the nvidia GPU is turned on).

eugenbalintescu commented 8 years ago

I use yakuake but I used the quit button. Still the same.

lsmod: http://pastebin.com/EY82Rs4k
dmesg: http://pastebin.com/XaMp1mNB

FadeMind commented 8 years ago

So, this is a segfault and memory leak due to an Xorg + bbswitch + kernel issue...

[ 9960.349879] INFO: task Xorg:22535 blocked for more than 120 seconds.
[ 9960.349883]       Tainted: P           O    4.4.14-3-MANJARO #1
[ 9960.349884] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9960.349885] Xorg            D ffff880254257af8     0 22535    484 0x00400004
[ 9960.349889]  ffff880254257af8 00ffffffa1074701 ffff880255262940 ffff88007c5ee040
[ 9960.349892]  ffff880254258000 ffff880254257cb0 ffff880254257ca8 0000000000000000
[ 9960.349894]  ffff88007c5ee040 ffff880254257b10 ffffffff815a2cac 7fffffffffffffff
[ 9960.349896] Call Trace:
[ 9960.349901]  [<ffffffff815a2cac>] schedule+0x3c/0x90
[ 9960.349904]  [<ffffffff815a56f6>] schedule_timeout+0x1d6/0x260
[ 9960.349964]  [<ffffffffa0dc4d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[ 9960.350014]  [<ffffffffa0dc4d92>] ? os_acquire_spinlock+0x12/0x20 [nvidia]
[ 9960.350017]  [<ffffffff815a3821>] wait_for_common+0xc1/0x180
[ 9960.350020]  [<ffffffff810a1110>] ? wake_up_q+0x70/0x70
[ 9960.350022]  [<ffffffff815a38fd>] wait_for_completion+0x1d/0x20
[ 9960.350024]  [<ffffffff8108d002>] flush_workqueue+0x132/0x5e0
[ 9960.350101]  [<ffffffffa129588b>] ? _nv014870rm+0x1b/0x40 [nvidia]
[ 9960.350150]  [<ffffffffa0dc4bde>] os_flush_work_queue+0x4e/0x60 [nvidia]
[ 9960.350229]  [<ffffffffa12e7c77>] rm_disable_adapter+0x77/0x130 [nvidia]
[ 9960.350231]  [<ffffffff810bf400>] ? up+0x10/0x50
[ 9960.350278]  [<ffffffffa0dc8023>] ? nv_uvm_notify_stop_device+0x63/0x80 [nvidia]
[ 9960.350323]  [<ffffffffa0db978d>] ? nv_close_device+0xed/0x130 [nvidia]
[ 9960.350369]  [<ffffffffa0dbb9b4>] ? nvidia_close+0xd4/0x2c0 [nvidia]
[ 9960.350416]  [<ffffffffa0db939c>] ? nvidia_frontend_close+0x2c/0x50 [nvidia]
[ 9960.350418]  [<ffffffff811e52dc>] ? __fput+0x9c/0x1f0
[ 9960.350420]  [<ffffffff811e549e>] ? ____fput+0xe/0x10
[ 9960.350422]  [<ffffffff81093b74>] ? task_work_run+0x84/0xa0
[ 9960.350424]  [<ffffffff8100369a>] ? exit_to_usermode_loop+0xba/0xc0
[ 9960.350426]  [<ffffffff81003bde>] ? syscall_return_slowpath+0x4e/0x60
[ 9960.350429]  [<ffffffff815a68c8>] ? int_ret_from_sys_call+0x25/0x8f

BUT it is weird.

@bluca CC

FadeMind commented 8 years ago

OK, now boot a known-GOOD working kernel and paste the full dmesg and other useful logs (Xorg.8.log, lsmod, primusrun, optirun, etc.) for comparison. Thanks.

eugenbalintescu commented 8 years ago

I booted kernel 4.1.

optirun: http://pastebin.com/LS0sfi2n
primusrun: http://pastebin.com/Pp3MnzSv
dmesg: http://pastebin.com/xmR6ibdd
Xorg.8.log: http://pastebin.com/U7uEgtWC
lsmod: http://pastebin.com/7qbaMync

eugenbalintescu commented 8 years ago

I found this on the arch forum: https://bbs.archlinux.org/viewtopic.php?id=211359 Could this be a similar problem?

bluca commented 8 years ago

No, we use libkmod now; there's no direct reference to the individual kernel modules in the code.

The issue looks like a kernel oops when trying to unload the modules. Given that the only variable is the kernel version, I'm quite confident it has nothing to do with bumblebee. Since this problem does not happen on Debian/Ubuntu with the same kernel versions, my guess would be a problem between bbswitch and that kernel version.

Again I'm not familiar with Manjaro. Are the kernel modules (both nvidia and bbswitch) built by DKMS?

Lekensteyn commented 8 years ago

Please try to isolate the issue.

(Eugen reported this problem at https://bugzilla.kernel.org/show_bug.cgi?id=121691, but since that was reported with the proprietary driver there is nothing to do there.)

eugenbalintescu commented 8 years ago

PMMethod=none doesn't change anything.

I installed nouveau and did what you asked. Here's the output of dmesg: http://pastebin.com/9vNDZVnr

Lekensteyn commented 8 years ago

Are you sure? Setting PMMethod=none in /etc/bumblebee/bumblebee.conf doesn't do anything?

nouveau seems blacklisted, can you load it explicitly with sudo modprobe nouveau and repeat the tests?

eugenbalintescu commented 8 years ago

Yes, I am sure. I checked again with the same results: setting PMMethod=none doesn't change anything. I start a program with primusrun, it works. I close it and start it again, it doesn't work anymore until I reboot.

I also did the nouveau test again, this time with sudo modprobe nouveau. This is dmesg: http://pastebin.com/zTKg6mL3

Lekensteyn commented 8 years ago

@eugenbalintescu The 4.4 kernel included with Manjaro is a bit old and its included nouveau module does not support your GPU. Can you try a newer kernel (with nouveau, without Bumblebee or bbswitch)?

And when you try PMMethod=none, can you include the dmesg?

eugenbalintescu commented 8 years ago

I did the tests again. I used kernel 4.6 and set PMMethod=none. Nothing changed, here's dmesg: http://pastebin.com/Cv4eCW0f

I also installed nouveau on kernel 4.6 and ran lspci several times. Here's dmesg: http://pastebin.com/LhXe9GwW

Lekensteyn commented 8 years ago

@eugenbalintescu Please validate your results yourself first: if you check your second dmesg | grep nouveau, you will see that nouveau is not loaded. It is likely blacklisted and you have to run sudo modprobe nouveau to load it first.

In your first test with PMMethod=none, I see no scary oopses/warnings as before. What exactly is not working here?

eugenbalintescu commented 8 years ago

Sorry, I am nothing but a Linux desktop user. These things are not very familiar to me and I forgot about modprobe nouveau.

I did the test again, here's dmesg: http://pastebin.com/MGBjH3wT I hope I got it right this time.

Regarding the PMMethod=none test, I have the same problem: I start a game with primusrun, it works. I close it after a few minutes and start it again, it doesn't work anymore unless I reboot the laptop.

Lekensteyn commented 8 years ago

Ahh, your kernel is too old for your hardware, I think you need 4.6... Anyway, let's skip that test for now.

Everything looks fine according to the dmesg you provided four days ago. Another thing you can try (if you just care about using primusrun) is to set KeepUnusedXServer=true in /etc/bumblebee/bumblebee.conf
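(Editorial aside.) Because bumblebee.conf is divided into sections, it can help to confirm which section a setting actually ended up in. To my understanding the stock file keeps KeepUnusedXServer under [bumblebeed] and a separate PMMethod under each [driver-...] section; treat that layout as an assumption and check your own file. conf_get is a hypothetical helper written for this sketch, and it expects "Key=value" lines without spaces around the equals sign.

```sh
#!/bin/sh
# Sketch only: conf_get is an illustrative helper, not a bumblebee tool.

# Print the value of a key inside one section of an INI-style file.
#   $1: file   $2: section name without brackets   $3: key
conf_get() {
    awk -v section="[$2]" -v key="$3" '
        # A line starting with "[" opens a new section.
        /^\[/ { in_section = ($0 == section); next }
        # Inside the wanted section, match lines that begin with "key=".
        in_section && index($0, key "=") == 1 {
            print substr($0, length(key) + 2)
            exit
        }
    ' "$1"
}

conf=/etc/bumblebee/bumblebee.conf
if [ -r "$conf" ]; then
    echo "KeepUnusedXServer=$(conf_get "$conf" bumblebeed KeepUnusedXServer)"
    echo "nvidia PMMethod=$(conf_get "$conf" driver-nvidia PMMethod)"
    echo "nouveau PMMethod=$(conf_get "$conf" driver-nouveau PMMethod)"
fi
```

An empty value in the output means the key was not found in that section, which is worth ruling out before concluding that a setting "doesn't change anything". The daemon has to be restarted for changes to take effect.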

eugenbalintescu commented 8 years ago

What do you mean? I did the last tests with kernel 4.6. Is that too old? I don't care about primusrun specifically; I care about using all my hardware with Linux, and bumblebee is the way to do that for now. Someone else on the Manjaro forum has exactly the same problem as I do, and this is what he had to say about KeepUnusedXServer=true: https://forum.manjaro.org/t/bumblebee-not-switching-back-to-igpu-after-quitting-optirun/1054/19?u=eugen I agree with him. I was hoping for a better fix. In the meantime, I have to stick with kernel 4.1. :(

Lekensteyn commented 8 years ago

Oops, I was mixing up log files. Upon further inspection, you need at least 4.7 for GM108 chipset support in nouveau. If it is not too much to ask, could you try 4.7? (The final release should be out tomorrow, but you can already try an -rc kernel.)

KeepUnusedXServer is bad for battery and increases the temperature so it is indeed not great, but might be preferable in some cases where you have such issues.

eugenbalintescu commented 8 years ago

Sorry for the late reply. I did the test again with kernel 4.7. Here's dmesg: http://pastebin.com/3fasFxqE

kittkott commented 8 years ago

Tested with kernel 4.7 on Gentoo: Linux 4.7.0-gentoo #7 SMP PREEMPT x86_64 Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz GenuineIntel GNU/Linux

Never OFF:

cat /proc/acpi/bbswitch
0000:01:00.0 ON

[ 9.188658] bbswitch: version 0.8
[ 9.188665] bbswitch: Found integrated VGA device 0000:00:02.0: SB.PCI0.VID_
[ 9.188669] bbswitch: Found discrete VGA device 0000:01:00.0: SB.PCI0.PEG.VID
[ 9.189003] bbswitch: detected an Optimus _DSM function
[ 9.189011] bbswitch: device 0000:01:00.0 is in use by driver 'nouveau', refusing OFF
[ 9.189014] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on

I need a fix so that the discrete VGA can be turned off.

Lekensteyn commented 8 years ago

@kittkott your issue is different, also pay attention to: device 0000:01:00.0 is in use by driver 'nouveau', refusing OFF

You can open a new report if you want.

kittkott commented 8 years ago

@Lekensteyn When I built 'nouveau' as a module so that I could unload it, bbswitch reports: device 0000:01:00.0 is in use by driver 'nvidiafb', refusing OFF. nvidiafb is also compiled as a module.

kittkott commented 8 years ago

lsmod | grep nouveau
nouveau 1411114 0
ttm 68922 1 nouveau

When 'nouveau' is built as a module, I can't use the discrete VGA:

optirun glxspheres64
[ 272.052935] [ERROR]Cannot access secondary GPU - error: XORG /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
[ 272.052980] [ERROR]Aborting because fallback start is disabled.

dmesg | grep bbswitch
[ 8.674759] bbswitch: version 0.8
[ 8.674767] bbswitch: Found integrated VGA device 0000:00:02.0: SB.PCI0.VID_
[ 8.674772] bbswitch: Found discrete VGA device 0000:01:00.0: SB.PCI0.PEG.VID
[ 8.675192] bbswitch: detected an Optimus _DSM function
[ 8.675203] bbswitch: device 0000:01:00.0 is in use by driver 'nvidiafb', refusing OFF
[ 8.675206] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[ 8.686372] bbswitch: disabling discrete graphics

cat /proc/acpi/bbswitch
0000:01:00.0 ON

Lekensteyn commented 8 years ago

@kittkott Please RTFM your distro's (Gentoo) documentation, you have misconfigured something. Please do not hijack bugs, create a new one.

eugenbalintescu commented 7 years ago

Hello again. It's been 4 months since I reported my problem here and there is still no change. I am stuck with kernel 4.1, and none of the newer ones seem to work. Is there any progress on fixing this? Thanks

Lekensteyn commented 7 years ago

@eugenbalintescu Didn't really have the time yet to check issues here. From your previous descriptions it was not clear which component (bbswitch, nouveau, nvidia, kernel core) is problematic. Can you give a summary of the components that you are using? In particular:

paulbendixen commented 7 years ago

@eugenbalintescu This sounds a bit like what is happening with my Kubuntu build. My workaround is to run:

rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
tee /proc/acpi/bbswitch <<< OFF

Both as root. I still haven't found a viable solution, but it does allow me to turn off the GPU and start using primusrun again.
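(Editorial aside.) The same workaround can be written so that it only tries to remove the NVIDIA modules that are actually loaded. This is a sketch under assumptions, not a tested fix for this bug: nvidia_unload_order is a made-up helper name, and the module list is simply the one quoted in the comment above.

```sh
#!/bin/sh
# Sketch only: nvidia_unload_order is an illustrative helper name.

# Read `lsmod` output on standard input and print the NVIDIA modules that are
# loaded, dependents first, so each can be removed before the module it uses.
nvidia_unload_order() {
    loaded=$(awk '{ print $1 }')
    for m in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        # -x matches whole lines, so "nvidia" does not match "nvidia_modeset".
        if printf '%s\n' "$loaded" | grep -qx "$m"; then
            printf '%s\n' "$m"
        fi
    done
}

# As root, the workaround then becomes:
#   for m in $(lsmod | nvidia_unload_order); do rmmod "$m" || break; done
#   echo OFF > /proc/acpi/bbswitch
#   cat /proc/acpi/bbswitch
```

If rmmod reports that a module is still in use, something (for example a leftover X server on the secondary display) still holds the device, and bbswitch will refuse to switch the card off until that process is gone.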