Bumblebee-Project / Bumblebee

Bumblebee daemon and client rewritten in C
http://www.bumblebee-project.org/
GNU General Public License v3.0

Bumblebee only successful one time on kernel 4.0.0-rc1 #638

Closed pyamsoft closed 9 years ago

pyamsoft commented 9 years ago

I am running a 64-bit Arch Linux system on the newly released kernel 4.0.0-rc1 with the proprietary nvidia driver. I have not enabled any testing repositories. I built both the bbswitch and nvidia kernel modules from the AUR using their respective "dkms" builds, and I am using the bumblebee binary from the official repositories. Both dkms modules compiled fine and even run successfully the first time a command is invoked:

optirun glxgears

After stopping the command with Ctrl+C, attempting to run the same command again displays an error log:

[  404.919521] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[  404.920061] [DEBUG]optirun version 3.2.1 starting...
[  404.920094] [DEBUG]Active configuration:
[  404.920103] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[  404.920112] [DEBUG] X display: :8
[  404.920121] [DEBUG] LD_LIBRARY_PATH: /usr/lib/nvidia:/usr/lib32/nvidia
[  404.920130] [DEBUG] Socket path: /var/run/bumblebee.socket
[  404.920138] [DEBUG] Accel/display bridge: primus
[  404.920147] [DEBUG] VGL Compression: proxy
[  404.920155] [DEBUG] VGLrun extra options: 
[  404.920164] [DEBUG] Primus LD Path: /usr/lib/primus:/usr/lib32/primus
[  405.425518] [INFO]Response: No - error: Could not load GPU driver

[  405.425578] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver

[  405.425595] [DEBUG]Socket closed.
[  405.425643] [ERROR]Aborting because fallback start is disabled.
[  405.425674] [DEBUG]Killing all remaining processes.

Bumblebee fails and leaves the dedicated nvidia graphics on as a result.

Bumblebee worked perfectly fine on kernel 3.19.1. No changes were made to any nvidia or bbswitch files between kernel upgrades, aside from a small patch that was needed to get the nvidia module to build, which can be found here:

https://devtalk.nvidia.com/default/topic/813458/linux/linux-4-0-rc1-346-35-build-error-_cr4-functions-fix/

I do not know if the patch suggested from the link above has affected anything having to do with the nvidia driver, but without the change, the nvidia module would not build on the new kernel 4.0.

I am able to restart bumblebeed.service using systemctl, and this successfully turns off the dedicated nvidia graphics. However, running the optirun command from above again fails with the same GPU error. I cannot say whether this is an issue with the nvidia code or with Bumblebee itself, but I figured I would report it to make the devs aware.

ArchangeGabriel commented 9 years ago

When you say only successful one time, do you mean it works on the first attempt but not afterwards? Could you provide a dmesg log from boot (up to an unsuccessful attempt followed by a bumblebeed restart)?

pyamsoft commented 9 years ago

Yes that is precisely what I mean.

Would you prefer a pastebin log or should I simply include dmesg output here in the issue?

pyamsoft commented 9 years ago

Dmesg log for 3.19 (successful) http://pastebin.com/Kt5ubZka

Dmesg log for 4.0 (unsuccessful) http://pastebin.com/EVLcdvfk

These lines at the end of the 4.0 log seem interesting. They were not present in 3.19:

[   94.726709] bbswitch: enabling discrete graphics
[   95.163623] NVRM: Can't find an IRQ for your NVIDIA card!
[   95.163629] NVRM: Please check your BIOS settings.
[   95.163631] NVRM: [Plug & Play OS] should be set to NO
[   95.163633] NVRM: [Assign IRQ to VGA] should be set to YES 
[   95.163641] nvidia: probe of 0000:01:00.0 failed with error -1

Does this seem like an nvidia driver issue then? Or is bumblebee not properly unloading modules?
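
For anyone comparing logs the same way, the relevant lines can be pulled out of a long dmesg dump with a simple filter. This is only a minimal sketch: the function name `gpu_log_lines` is made up for illustration, and the patterns are just the prefixes visible in the excerpt above (`bbswitch`, `NVRM`, `nvidia:`), so other drivers or versions may log differently.

```shell
# Print only the kernel log lines that mention bbswitch or the NVIDIA
# driver. Reads log text from stdin and writes matching lines to stdout.
# The name gpu_log_lines is hypothetical; the patterns are taken from the
# prefixes seen in the dmesg excerpt in this thread.
gpu_log_lines() {
    grep -E 'bbswitch|NVRM|nvidia:'
}

# Example usage (dmesg may require root on some systems):
#   dmesg | gpu_log_lines
```

Running it against both the 3.19 and the 4.0 logs should make the extra NVRM lines stand out.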

ArchangeGabriel commented 9 years ago

I think this is indeed an nvidia driver issue, but it might be bbswitch not handling the new kernel well.

Silarn commented 9 years ago

Also experiencing this. Initially it wouldn't even compile successfully on Fedora, but after applying a patch noted on the nvidia forum for 3.20/4.0, it does compile. However, I'm pretty sure I was experiencing a similar issue on a 3.19 kernel shortly before the next kernel development started - but it only started after moving from Fedora 21 to the current branched Fedora 22.

In the process of patching the nvidia driver for 4.0.0, while the first install was successful I had not properly referenced a file in the patch for the build system (didn't have it under /kernel/), so I fixed the patch and rebuilt. However, in this case it fails complaining that a module is already loaded. (It was nvsomething, let me force a rebuild to see what happens.)

Building NVIDIA video drivers: Creating directory NVIDIA-Linux-x86_64-346.47
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 346.47...........................................................................................................................................................................................................................................................
patching file kernel/nv-drm.c
Hunk #1 succeeded at 22 with fuzz 2 (offset 4 lines).
patching file kernel/nv-drm.c
Hunk #1 succeeded at 136 with fuzz 1 (offset 7 lines).
patching file kernel/nv-pat.c

WARNING: You do not appear to have an NVIDIA GPU supported by the 346.47 NVIDIA Linux graphics driver installed in this system.  For
         further details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in the README available on the Linux driver
         download page at www.nvidia.com.

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the
       wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target
       kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining
       ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux
       graphics driver release.

       Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file
       '/var/log/nvidia-installer.log' for more information.

ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on
       fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

tail of nvidia-installer.log:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 1428.836644]  [<ffffffffa08eb2c3>] nvidia_init_module+0x2c3/0x798 [nvidia]
[ 1428.836659]  [<ffffffffa08eb7ad>] ? nv_drm_init+0x15/0x15 [nvidia]
[ 1428.836674]  [<ffffffffa08eb834>] nvidia_frontend_init_module+0x87/0x853 [nvidia]
[ 1428.836676]  [<ffffffff81002128>] do_one_initcall+0xb8/0x200
[ 1428.836679]  [<ffffffff811e3bc2>] ? __vunmap+0xa2/0x100
[ 1428.836681]  [<ffffffff81201cb9>] ? kmem_cache_alloc_trace+0x1b9/0x240
[ 1428.836683]  [<ffffffff8177edc8>] ? do_init_module+0x28/0x1cb
[ 1428.836684]  [<ffffffff8177ee00>] do_init_module+0x60/0x1cb
[ 1428.836687]  [<ffffffff81121cfe>] load_module+0x203e/0x2690
[ 1428.836688]  [<ffffffff8111d780>] ? store_uevent+0x70/0x70
[ 1428.836691]  [<ffffffff81226480>] ? kernel_read+0x50/0x80
[ 1428.836693]  [<ffffffff8112254e>] SyS_finit_module+0xbe/0xf0
[ 1428.836695]  [<ffffffff81785f29>] system_call_fastpath+0x12/0x17
[ 1428.836696] ---[ end trace f4d506b64a327024 ]---
[ 1428.836697] NVRM: failed to register procfs!
[ 1428.836717] NVRM: Can't find an IRQ for your NVIDIA card!
[ 1428.836718] NVRM: Please check your BIOS settings.
[ 1428.836718] NVRM: [Plug & Play OS] should be set to NO
[ 1428.836719] NVRM: [Assign IRQ to VGA] should be set to YES 
[ 1428.836721] nvidia: probe of 0000:01:00.0 failed with error -1
[ 1428.836748] Error: Driver 'nvlink' is already registered, aborting...
[ 1428.837079] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1428.837081] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 1428.837082] [drm] Module unloaded
[ 1428.837183] NVRM: NVIDIA init module failed!

Is this line normal?

[ 1428.836748] Error: Driver 'nvlink' is already registered, aborting...

pyamsoft commented 9 years ago

These lines in particular suggest that something may have changed in the way the kernel handles loading and unloading of the nvidia module. The patches applied to build the NVidia driver change a mere 3 or 4 lines and the Bumblebee code has not changed, so the only source of major change is the new kernel.

[ 1428.836717] NVRM: Can't find an IRQ for your NVIDIA card!
[ 1428.836718] NVRM: Please check your BIOS settings.
[ 1428.836718] NVRM: [Plug & Play OS] should be set to NO
[ 1428.836719] NVRM: [Assign IRQ to VGA] should be set to YES 
[ 1428.836721] nvidia: probe of 0000:01:00.0 failed with error -1

On my system I do not get these lines when running kernel 3.19. In fact, on kernel 3.19 Bumblebee runs perfectly fine (using both the primus and vgl backends). That still leaves the question of whether fixing this issue is up to the NVidia developers or the Bumblebee developers, and at this point it is too early to decide.

The 346.47 driver recently became available in ArchLinux, but the issue is still present. We may have to wait for kernel 4 to "officially" release before NVidia will be able to properly support it. At that time, should this issue persist, perhaps we can then decide if it is an issue with Bumblebee itself or not.

ArchangeGabriel commented 9 years ago

I think the issue is on the nvidia side, because of changes in the kernel. Does this also happen on systems without Optimus that have only an nvidia card?

Silarn commented 9 years ago

I don't think it rears its head unless you try to disable and reenable the GPU. So for single GPU systems, there may not be a problem as the card is always on (unless you try to hot-update the GPU driver). For optimus systems where it is switched on and off, however, it doesn't seem to be able to 'reacquire' the GPU after it has been disabled the first time.

ArchangeGabriel commented 9 years ago

OK, I see. Let’s wait for 4.0 release and a new NVIDIA driver release.

Impulse2000 commented 9 years ago

Sorry for my English, I am Russian. I installed Linux Mint 17.1 Rebecca x86_64 on an ASUS X55VD with NVIDIA driver 346.47 from the official site.

System: Host: impulse-X55VD Kernel: 4.0.0-rc3-custom x86_64 (64 bit gcc: 4.8.2)
        Desktop: MATE 1.8.1 (Gtk 3.14.9-0ubuntu1~14.04~ricotz0) Distro: Linux Mint 17.1 Rebecca
Graphics: Card-1: Intel 2nd Generation Core Processor Family Integrated Graphics Controller bus-ID: 00:02.0
          Card-2: NVIDIA GF119M [GeForce 610M] bus-ID: 01:00.0
          Display Server: X.Org 1.15.1 drivers: fbdev,intel,nouveau (unloaded: nvidia,vesa) Resolution: 1366x768@76.0hz
          GLX Renderer: Gallium 0.4 on llvmpipe (LLVM 3.6, 128 bits) GLX Version: 3.0 Mesa 10.6.0-devel Direct Rendering: Yes

The problem exists here too.

pyamsoft commented 9 years ago

Some updates on the state of this bug:

With the release of kernel 4.0 rc5, this issue seems to have been resolved. I am now able to launch glxgears more than one time using the dedicated NVidia card on my system via the following command:

optirun glxgears

Primus as a backend also seems to work multiple times when using

optirun -b primus glxgears 

or

primusrun glxgears

Can anybody else confirm that Bumblebee works multiple times when using kernel 4.0 rc5 or higher?
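
To make the "works more than once" check repeatable, a command can be run several times in a row, stopping at the first failure. The helper below is only a sketch: `repeat_check` is a made-up name, and the suggestion to use `optirun glxinfo` rather than glxgears is an assumption on my part (glxinfo exits on its own, whereas glxgears keeps running until interrupted).

```shell
# Run a command a given number of times and report the first failing run.
# Usage: repeat_check COUNT COMMAND [ARGS...]
# repeat_check is a hypothetical helper name, not part of Bumblebee.
repeat_check() {
    count="$1"
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Discard the command's own output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            echo "run $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs succeeded"
}

# Example usage (assumes glxinfo is installed):
#   repeat_check 3 optirun glxinfo
```

On an affected kernel the second run would be expected to fail, matching the behaviour described at the top of this issue.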

Impulse2000 commented 9 years ago

I downgraded to:

System: Host: impulse-X55VD Kernel: 3.18.9-031809-generic x86_64 (64 bit gcc: 4.6.3)
        Desktop: MATE 1.8.1 (Gtk 2.24.24) Distro: Linux Mint 17.1 Rebecca
Graphics: Card-1: Intel 2nd Generation Core Processor Family Integrated Graphics Controller bus-ID: 00:02.0
          Card-2: NVIDIA GF119M [GeForce 610M] bus-ID: 01:00.0
          Display Server: X.Org 1.15.1 drivers: intel (unloaded: fbdev,vesa) FAILED: nouveau Resolution: 1366x768@60.0hz
          GLX Renderer: Mesa DRI Intel Sandybridge Mobile GLX Version: 3.0 Mesa 10.6.0-devel Direct Rendering: Yes

Everything works with primus/VirtualGL/CUDA.

pyamsoft commented 9 years ago

Yes, I understand that you may be successful using an earlier kernel version. A downgrade can solve the problem, but it is not the target solution in this case. The bug in question only affects kernel 4. As such, I would assume that potential solutions to the problem would involve the usage of kernel 4 family instead of an earlier kernel 3. Let's please try to keep potential solutions limited at least to working with kernel 4. Sorry for any confusion.

Silarn commented 9 years ago

@pyamsoft, yes, the more recent kernel versions have solved the reactivation issue here also. However, my dGPU light has been indicating that the card is always on, and this has not been fixed. I haven't run tests to see whether the power draw is really as if both the integrated and discrete cards were on, and this may be a separate issue if so.

pyamsoft commented 9 years ago

I have not encountered this issue here. I too have a light which shows the activity of my dedicated card, but it is able to turn off properly, so I do not know if this issue is directly related. Is the 'light issue' present on earlier kernels as well, or just the 4 series?

Lekensteyn commented 9 years ago

Using just bbswitch without the nvidia kernel module still works for me (4.0-rc6). The power consumption rises when the card is turned on via bbswitch, and drops when it is disabled again. The light is also toggled properly.

Can you reproduce it without loading the nvidia driver?

# disable
sudo tee /proc/acpi/bbswitch <<<OFF
# enable
sudo tee /proc/acpi/bbswitch <<<ON
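
To check which state the card actually ends up in after each write, the same file can be read back. The function below is a minimal sketch, assuming the file holds a single line such as "0000:01:00.0 ON" (PCI address followed by the state). The name `bbswitch_state` and its optional path argument are hypothetical; the argument exists only so the function can be tried against a plain text file.

```shell
# Print the power state (ON or OFF) reported by bbswitch.
# The optional first argument overrides the state file path; this override
# is a hypothetical convenience so the function can be tried without the
# bbswitch module loaded.
bbswitch_state() {
    state_file="${1:-/proc/acpi/bbswitch}"
    if [ ! -r "$state_file" ]; then
        # A missing file usually means the bbswitch module is not loaded.
        echo "bbswitch state file not readable: $state_file" >&2
        return 1
    fi
    # Assumed format: "<pci address> <state>"; print the last field of line 1.
    awk '{ print $NF; exit }' "$state_file"
}

# Example usage:
#   bbswitch_state
```

If the function reports that the file is not readable, the module itself is probably not loaded, which is worth ruling out before looking at the kernel.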

ArchangeGabriel commented 9 years ago

@Silarn Could you open a new issue against bbswitch if it’s not working for you?

I’m closing this issue as it has been reported as resolved as of kernel 4.0rc5.

Silarn commented 9 years ago

Turns out the bbswitch install must have broken at some point: when I went to trigger it manually, the bbswitch file was missing. Reinstalling the package and rebooting solved my problem. Thanks.