Bumblebee-Project / Bumblebee

Bumblebee daemon and client rewritten in C
http://www.bumblebee-project.org/
GNU General Public License v3.0

GT650M: Failed to initialize NVIDIA GPU #172

Closed jjmcdn closed 12 years ago

jjmcdn commented 12 years ago
baseboard-manufacturer: CLEVO CO.
baseboard-product-name: W110ER                          
baseboard-version     : N/A                             
system-manufacturer   : CLEVO CO.                       
system-product-name   : W110ER                          
system-version        : N/A                             
bios-vendor           : American Megatrends Inc.
bios-version          : 4.6.5
bios-release-date     : 04/26/2012

The "Failed to initialize NVIDIA GPU" error appears in syslog from bumblebeed / Xorg.8 whenever I try to use optirun on anything.  It doesn't appear to be a problem with acpi options on my command line:

% cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-3.2.0-24-generic root=UUID=9b0ea8db-ffb8-451e-a38b-485142cd15dc ro quiet splash vt.handoff=7

I am not seeing an error of the form "Error inserting nvidia_current..." but this command:

lspci -d 10de: -vvnn

Definitely produces the expected error:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:0fd1] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: nvidia

Not really sure what to do next, I don't know which scripts I'm looking for to enable my card described in the troubleshooting section. I'll mail the logs gathered by bumblebee-bugreport to the mailing list now.
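For readers hitting the same symptom: a revision of `ff` together with `Unknown header type 7f` usually means the PCI configuration space reads back as all ones, which is typical of a card that is powered down. The following is only a minimal sketch of how that check could be scripted; the function name `gpu_looks_off` is hypothetical and not part of Bumblebee.

```shell
# Hypothetical helper: reads `lspci -vvnn` output on stdin and succeeds
# if any device shows the "all ones" signature (rev ff or header type 7f).
gpu_looks_off() {
    grep -Eq '\(rev ff\)|Unknown header type 7f'
}

# Usage on a real system (10de is the NVIDIA vendor ID used in the command above):
#   lspci -d 10de: -vvnn | gpu_looks_off && echo "card appears to be powered off"
```

The check only looks at the text of the report, so it cannot tell why the card is off, only that it did not answer on the bus.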

amonakov commented 12 years ago

No, what the hotfix-3.0.1 branch does is switch only optirun.c to a blocking socket, which fixes the issue at hand and does not restrict users to one optirun instance. It's ready to be used by all users.

Switching to blocking sockets everywhere can be done at any later point.

ArchangeGabriel commented 12 years ago

Should now be fixed in 3.0.1.

The Arch package is currently being built; the Ubuntu ones will probably take a little more time.

jensanjo commented 12 years ago

I reinstalled bumblebee from git, and confirm that the issue is fixed for me. Great work!

ArchangeGabriel commented 12 years ago

Nice to hear, then just waiting for the other joey to confirm.

yimm commented 12 years ago

This error (https://github.com/Bumblebee-Project/Bumblebee/issues/172#issuecomment-6430826) is fixed too, with Bumblebee 3.0.1 + nvidia 304.22 beta. Thanks to the team.

throgh commented 12 years ago

Hello everyone!

I've just found this report because the problem also occurs on my new laptop (which also uses the GeForce GT650M). So a big "THANKS" for this information. I also have a question: how can I install bumblebee from Git? Sorry for the basic question; I'm new to this and still finding my way around Ubuntu Linux 12.04 x64.

And one last question: is there also a stable package to install? The last time I looked, the version number was 3.0.1, but the error wasn't gone. Thanks!

ArchangeGabriel commented 12 years ago

3.0.1 and git are currently the same. Could you open a new issue and provide all useful information?

throgh commented 12 years ago

Okay, I'm going to post more information in a new ticket tomorrow. The basic error message is the same one other members reported when trying to start "optirun". I've installed Ubuntu 12.04 again and will give this another try.

Big thanks for this great toolset and to the team behind "bumblebee"!

jjmcdn commented 12 years ago

Woo! Took a little bit of doing to unwind some of the changes I'd made to my config in the name of debugging, but I can confirm bumblebee is now working properly for me as well on my machine with 3.0.1 and the 304.22 NVidia drivers on my 3.5.0-6 kernel from the xorg-edgers PPA.

Life's good, thanks!

studentz commented 12 years ago

I moved to xorg-edgers, and bumblebee works nicely. I only changed my configuration in the xorg.conf.nvidia file (Option "UseDisplayDevice" "none"). Thanks for the time you put into the project.

Witos commented 11 years ago

I've got gentoo-3.3.8, a GT 650M, bumblebee 3.0.1 from portage, and Option "UseDisplayDevice" set. I still get the above error. Any clues?

optirun -vv glxspheres
[ 553.841462] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[ 553.841637] [INFO]Configured driver: nvidia
[ 553.955200] [DEBUG]optirun version 3.0.1 starting...
[ 553.955213] [DEBUG]Active configuration:
[ 553.955229] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[ 553.955235] [DEBUG] X display: :8
[ 553.955246] [DEBUG] LD_LIBRARY_PATH: /usr/lib64/opengl/nvidia/lib:/usr/lib32/opengl/nvidia/lib:/usr/lib/opengl/nvidia/lib
[ 553.955253] [DEBUG] Socket path: /var/run/bumblebee.socket
[ 553.955259] [DEBUG] VGL Compression: proxy
[ 554.070160] [INFO]Response: No - error: Could not enable discrete graphics card
[ 554.070184] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card
[ 554.070197] [DEBUG]Socket closed.
[ 554.070211] [ERROR]Aborting because fallback start is disabled.
[ 554.070219] [DEBUG]Killing all remaining processes.

systemlog:

Sep 19 16:40:13 witos-linux kernel: bbswitch: enabling discrete graphics
Sep 19 16:40:13 witos-linux kernel: nvidia 0000:01:00.0: power state changed by ACPI to D0
Sep 19 16:40:13 witos-linux kernel: nvidia 0000:01:00.0: Refused to change power state, currently in D3
Sep 19 16:40:13 witos-linux kernel: nvidia 0000:01:00.0: power state changed by ACPI to D0
Sep 19 16:40:13 witos-linux bumblebeed[14981]: Could not enable discrete graphics card
Sep 19 16:41:31 witos-linux kernel: NVRM: RmInitAdapter failed! (0x23:0x2f:675)
Sep 19 16:41:31 witos-linux kernel: NVRM: rm_init_adapter(0) failed
Sep 19 16:44:54 witos-linux kernel: bbswitch: enabling discrete graphics
Sep 19 16:44:54 witos-linux kernel: nvidia 0000:01:00.0: power state changed by ACPI to D0
Sep 19 16:44:54 witos-linux kernel: nvidia 0000:01:00.0: Refused to change power state, currently in D3
Sep 19 16:44:54 witos-linux kernel: nvidia 0000:01:00.0: power state changed by ACPI to D0
Sep 19 16:44:54 witos-linux bumblebeed[14981]: Could not enable discrete graphics card

amonakov commented 11 years ago

What if you use it like this:

sudo tee /proc/acpi/bbswitch <<<ON; sudo nvidia-xconfig -query-gpu-info; optirun glxspheres

Witos commented 11 years ago

ON

NVIDIA: could not open the device file /dev/nvidia0 (Input/output error).

WARNING: Unable to use the nvidia-cfg library to query NVIDIA hardware.

ERROR: Unable to query GPU information

[ 1494.620166] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card

[ 1494.620201] [ERROR]Aborting because fallback start is disabled.

My nvidia-drivers - 304.43
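The one-liner suggested above can also be run step by step, which makes it easier to see which stage fails. This is a minimal sketch, assuming bbswitch reports its state as a single line of the form `0000:01:00.0 OFF`; the helper name `bbswitch_state` is hypothetical.

```shell
# Hypothetical helper: extracts ON or OFF from a bbswitch status line
# such as "0000:01:00.0 OFF" (bus ID first, power state second).
bbswitch_state() {
    awk '{ print $2 }'
}

# Step-by-step version of the one-liner, to be run by hand:
#   echo ON | sudo tee /proc/acpi/bbswitch     # ask bbswitch to power the card on
#   bbswitch_state < /proc/acpi/bbswitch       # should now print ON
#   sudo nvidia-xconfig -query-gpu-info        # first real access to the card
#   dmesg | tail -n 20                         # kernel messages from that access
```

If the state still reads OFF after the first step, the failure is in the ACPI power switch rather than in the NVIDIA driver.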

amonakov commented 11 years ago

Do you have CONFIG_NO_HZ and CONFIG_RCU_FAST_NO_HZ enabled in kernel config? See the latest post of this thread: http://www.nvnews.net/vbulletin/showthread.php?t=191780
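A quick way to answer that question is to grep the kernel config for the two options. The sketch below assumes the config is available as a plain text file; the path in the usage comment is the usual Gentoo location and may differ on other systems, and the function name is hypothetical.

```shell
# Hypothetical helper: prints every option from the argument list that is
# NOT set to "y" in the given kernel config file.
missing_kernel_options() {
    config_file=$1
    shift
    for opt in "$@"; do
        # -x matches the whole line, so "# CONFIG_FOO is not set" does not count
        grep -qx "${opt}=y" "$config_file" || echo "$opt"
    done
}

# Usage (path is an assumption):
#   missing_kernel_options /usr/src/linux/.config CONFIG_NO_HZ CONFIG_RCU_FAST_NO_HZ
```

No output means both options are built in; any name printed is an option that still needs to be enabled.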

Witos commented 11 years ago

I didn't have the second one, but after enabling it, compiling and installing the kernel, reinstalling nvidia-drivers, bumblebee and bbswitch, and rebooting: no progress :(

godlike64 commented 11 years ago

Witos, try to narrow down the problem by starting at the lowest level possible. Make sure neither bbswitch, bumblebee, nor the nvidia module is loaded upon reboot (for nvidia, you might have to rm /lib/udev/nvidia-udev.sh or something like that, an ugly udev rule that might be doing more harm than good here). After a clean reboot, modprobe the nvidia module and try to run one of the most basic programs that access the card (nvidia-xconfig, nvidia-smi, or one of the simple CUDA programs from the SDK; the first two come bundled with the drivers, so they might be your best chance). If, after running that, you get "GPU has fallen off the bus", then you're in for the same problem as me (or a related one).

I'm the one who came up with those two kernel options, after endless hours of debugging last night. Since I knew the card had worked under Ubuntu on this same laptop (Thinkpad W530 with a Quadro K1000M), I booted a live ISO of 12.04 amd64 (to make sure it worked on 64 bits) and tested it. Of course it worked, so the next step was to download ubuntu-sources on my Gentoo installation, use the config from Ubuntu's live ISO, compile it and test it (ubuntu-sources because there may be some patches that are in neither gentoo-sources nor vanilla-sources; the config because, if it wasn't the patches, it might be a .config option).

Booting the Ubuntu kernel inside my Gentoo box worked, so by now I had made sure that the problem was neither the hardware nor the OS, but the kernel. What was left was to diff the two .config files side by side and start testing. I got lucky, since it worked by the time I got to the tickless part.

You might want to play around enabling more things under the RCU subsystem (I remember setting one of the values there from 32 to 64, can't recall what's it named but it's the only numerical value you can touch there), enabling the rest, and I THINK that I also enabled everything IOMMU related (the settings are spread out in two or three places).
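The config comparison described above can be narrowed to just the options that differ. This is a minimal sketch, not the exact procedure used here; the function name and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: prints the CONFIG_ lines that are set in the first
# config file but missing (or set to a different value) in the second.
config_only_in_first() {
    sorted_a=$(mktemp) && sorted_b=$(mktemp) || return 1
    grep '^CONFIG_' "$1" | sort > "$sorted_a"
    grep '^CONFIG_' "$2" | sort > "$sorted_b"
    # comm -23 keeps lines unique to the first (sorted) input
    comm -23 "$sorted_a" "$sorted_b"
    rm -f "$sorted_a" "$sorted_b"
}

# Usage (file names are assumptions):
#   config_only_in_first ubuntu-working.config gentoo-failing.config
```

Lines of the form `# CONFIG_FOO is not set` are ignored on purpose, so the output is the list of candidates to enable one group at a time.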

Anyway, sorry for the long post. If you need further help, I'm at #bumblebee on Freenode. Good luck!

godlike64 commented 11 years ago

I just did some more testing, I can confirm that the previous two settings plus enabling CONFIG_CALGARY_IOMMU and CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT makes the card work. If IOMMU config options are missing, I don't get "has fallen off the bus" but nevertheless the card "breaks".

Witos commented 11 years ago

Thank you guys, I'm really grateful for your help. Re-emerging xorg-server did the trick. It works now; optirun starts glxspheres with a big boost. The Dell 7720 17R with a GT650M on board is supported by bumblebee!

Witos commented 11 years ago

Well, it stopped working again after a reboot, although I changed nothing. Strange... Before the successful run I had booted Windows, checked the NVIDIA settings, and rebooted into Linux; maybe that was why it ran. @godlike64 could you please send me your .config so I can diff it with my own (snajper[at]o2.pl)?

godlike64 commented 11 years ago

Sure, here it is: http://bpaste.net/show/46582/

Note that yesterday, after my first post, amonakov and I discovered that the IOMMU settings in the kernel were relevant too.

Witos commented 11 years ago

Thanks. I had those IOMMU options enabled already. It's hard to compare the .config files, since my kernel was made by genkernel with some additions from gentoo-wiki. I should make a new kernel by hand and then check the differences, but I don't have much time for that, unfortunately. The strange thing is that I re-emerged xorg-server and bbswitch again and it worked for one boot; after the next reboot it stopped working, and re-emerging didn't help this time. Maybe there is a race condition during boot, or maybe the xorg-server files are overwritten by some process... If I find out something I'll let you know.

godlike64 commented 11 years ago

Witos, have you made sure that nvidia module does NOT get loaded upon reboot?

Witos commented 11 years ago

I tried that, I removed it from /etc/conf.d/modules, removed bumblebee from rc-update, but it gets loaded anyway and I don't know how yet.

Lekensteyn commented 11 years ago

Pass the modprobe.blacklist=nvidia option or create an /etc/modprobe.d/(whatever).conf file containing blacklist nvidia.
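The second option can be scripted as follows. This is a minimal sketch; the helper name and the file name `blacklist-nvidia.conf` are hypothetical, and any name ending in .conf under /etc/modprobe.d/ should do.

```shell
# Hypothetical helper: writes a modprobe blacklist entry for the given
# module into the given file, replacing any previous content of that file.
write_blacklist() {
    module=$1
    conf_file=$2
    printf 'blacklist %s\n' "$module" > "$conf_file"
}

# Usage (as root; the file name is an assumption):
#   write_blacklist nvidia /etc/modprobe.d/blacklist-nvidia.conf
```

Note that a blacklist entry only stops the module from being loaded automatically through its aliases; an explicit `modprobe nvidia` still loads it.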

godlike64 commented 11 years ago

Witos, on Gentoo an udev rule is installed in /lib/udev/nvidia-udev.sh (I can't recall the name right now, but it's definitely under /lib/udev). In my case that was messing things up. It is safe to remove it (or move it to another directory if you wish to keep it). The file is reinstalled on every emerge of nvidia-drivers. Try to remove that file and reboot.

Witos commented 11 years ago

Hi again. This is what I did: I got brand new 3.4.9 gentoo-sources and made a new kernel with defconfig. I mimicked @godlike64's config in terms of Graphics Drivers (i.e. I didn't have VGA_SWITCHEROO) and turned on the configs that @godlike64 suggested. I removed all x11 packages, bumblebee, and the drivers, and reinstalled them. After that, optirun kept working every time across 3 reboots. Thanks guys!

godlike64 commented 11 years ago

Glad to hear that! Could you upload your final kernel config?

Witos commented 11 years ago

Sure: http://bpaste.net/show/46880/ , dell 17R 7720, bbswitch 0.4.2, nvidia-drivers 304.48, bumblebee 3.0.1

babau commented 11 years ago

Same issue here. I'm on Gentoo with kernel 3.6.2, and all the modules that have been suggested in this thread are enabled.

nvidia drivers 304.51

but

optirun -vv glxsphere
[ 408.623778] [DEBUG]Reading file: /etc/bumblebee/bumblebee.conf
[ 408.624114] [INFO]Configured driver: nvidia
[ 408.804918] [DEBUG]optirun version 3.0.1 starting...
[ 408.804957] [DEBUG]Active configuration:
[ 408.804965] [DEBUG] bumblebeed config file: /etc/bumblebee/bumblebee.conf
[ 408.804971] [DEBUG] X display: :8
[ 408.804977] [DEBUG] LD_LIBRARY_PATH: /usr/lib64/opengl/nvidia/lib:/usr/lib32/opengl/nvidia/lib:/usr/lib/opengl/nvidia/lib
[ 408.804995] [DEBUG] Socket path: /var/run/bumblebee.socket
[ 408.805001] [DEBUG] VGL Compression: jpeg
[ 408.916935] [INFO]Response: No - error: Could not enable discrete graphics card
[ 408.916973] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card
[ 408.916986] [DEBUG]Socket closed.
[ 408.917014] [ERROR]Aborting because fallback start is disabled.
[ 408.917024] [DEBUG]Killing all remaining processes.

kern.log

[ 25.815658] bbswitch: enabling discrete graphics
[ 26.050878] pci 0000:01:00.0: power state changed by ACPI to D0
[ 26.050909] thinkpad_acpi: EC reports that Thermal Table has changed
[ 26.062638] pci 0000:01:00.0: Refused to change power state, currently in D3
[ 26.122819] pci 0000:01:00.0: power state changed by ACPI to D0
[ 26.136003] pci 0000:01:00.0: Refused to change power state, currently in D3
[ 28.238425] bbswitch: enabling discrete graphics
[ 28.238446] pci 0000:01:00.0: power state changed by ACPI to D0
[ 28.248664] pci 0000:01:00.0: Refused to change power state, currently in D3
[ 28.248687] pci 0000:01:00.0: power state changed by ACPI to D0
[ 28.261995] pci 0000:01:00.0: Refused to change power state, currently in D3
[ 408.899528] bbswitch: enabling discrete graphics
[ 408.899546] pci 0000:01:00.0: power state changed by ACPI to D0
[ 408.911132] pci 0000:01:00.0: Refused to change power state, currently in D3

godlike64 commented 11 years ago

Have you ensured that the nvidia driver does not get loaded on startup (see my first post in this thread)? If you have, try rebooting cleanly (making sure the NVIDIA card is off and does not get turned on by things like the nvidia driver module autoloading), and immediately after reboot (if you can disable X, that helps as an additional measure to make sure nothing tries to access the card) run nvidia-xconfig -query-gpu-info. In my case, when it was failing, that command took around 3 seconds to run before exiting with an error, and if you then immediately look at dmesg, you will see the true error. Those ACPI errors you posted tend to happen when something tries to access the card while, for example, it is turned off or something else has corrupted its state.

If you still see "has fallen off the bus" in dmesg even though you have enabled the options I mentioned before, there could be some other option influencing this that I have not noticed. I can give you the Ubuntu kernel config I used; installing ubuntu-sources with that config should at least get your card properly up and running.

babau commented 11 years ago

I did blacklist the nvidia driver, but with the same result.

If this helps, I have a Lenovo W530 with a K2000M on board.

godlike64 commented 11 years ago

Could you do a clean reboot, ensure nvidia driver was not loaded, and run nvidia-xconfig -query-gpu-info? If it fails, right after that check the last lines of dmesg and paste them here. I believe the problem lies with the very first access to the card.
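The "ensure the nvidia driver was not loaded" step can be checked against /proc/modules before the first access to the card. The following is only a sketch; the helper name `module_loaded` is hypothetical.

```shell
# Hypothetical helper: reads /proc/modules-style input on stdin and
# succeeds only if the named module appears in the first column.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Suggested order after a clean reboot, to be run by hand:
#   module_loaded nvidia < /proc/modules && echo "nvidia is already loaded"
#   nvidia-xconfig -query-gpu-info     # first access to the card
#   dmesg | tail -n 20                 # look here right after a failure
```

Matching on the first column avoids false positives from modules that merely list nvidia as a dependency.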

babau commented 11 years ago

The nvidia module is not loaded.

nvidia-xconfig -query-gpu-info
NVIDIA: could not open the device file /dev/nvidiactl (No such file or directory).

ERROR: Unable to query GPU information

But nothing shows up in my dmesg:

[ 10.791310] EXT4-fs (sda2): re-mounted. Opts: discard,commit=0
[ 10.987301] EXT4-fs (dm-2): re-mounted. Opts: commit=0
[ 10.988997] EXT4-fs (dm-0): re-mounted. Opts: commit=0
[ 10.990371] EXT4-fs (dm-1): re-mounted. Opts: commit=0
[ 11.209627] Bluetooth: HIDP (Human Interface Emulation) ver 1.2

godlike64 commented 11 years ago

That's weird... running that command should load the module for you. Are you sure you set video mode to Optimus in the BIOS?

babau commented 11 years ago

The BIOS is set to Optimus.

If this helps, here is my kernel config:

http://www.babau.me/config

Lekensteyn commented 11 years ago

@babau Can you try kernel 3.5? Someone reported issues in combination with kernel 3.6: Bumblebee-Project/bbswitch#35

babau commented 11 years ago

Sorry for the late reply. I downgraded the kernel to 3.5.7 and now everything is working perfectly.

Thanks for the support.