MrMEEE / bumblebee-Old-and-abbandoned

OUTDATED!!!!! - Replaced by "The Bumblebee Project" and "Ironhide"
http://www.martin-juhl.dk/2011/08/ironhide-reporting-for-duty/

FATAL: Error inserting nvidia_current #552

Closed auberondreaming closed 13 years ago

auberondreaming commented 13 years ago

Hi all, sorry if this is a repeat but I have not seen an issue about this.

Our system: Dell XPS15 L502x NVIDIA quadro 540m Ubuntu 11.04 64bit

After a fresh install of 11.04 I installed bumblebee from the PPA. nvidia-current is a dependency and gets installed. When I try to run optirun glxgears, I am told that the nvidia module could not be found. If I run lspci, the Intel onboard GPU has its full description, but the 540m just shows up as an nVidia device with no description. I think this may be the reason bumblebee isn't working for us. Has this issue come up or been resolved, and am I just missing something? Thanks!

off topic quick question: Can you use CUDA through bumblebee?

ArchangeGabriel commented 13 years ago

I remember some people were able to use CUDA, so it should be yes.

Could you give us the exact error message ?

Also, could you paste your /etc/X11/Xorg.8.log file to pastebin and link it here ?

Lekensteyn commented 13 years ago

It should be listed as supported: ftp://download.nvidia.com/XFree86/Linux-x86_64/280.04/README/supportedchips.html

Could you post your /var/log/kern.log in addition to your /var/log/Xorg.8.log and the output of:

lspci -v -s $(lspci | grep VGA | grep nVidia | cut -d' ' -f1)

auberondreaming commented 13 years ago

Update: I got it to work once (optirun glxgears worked). Then I rebooted and seemed to lose the nvidia driver. Now nvidia-current keeps getting deactivated. I reactivate it and reboot, but when I run sudo modprobe nvidia-current I get: FATAL: Error inserting nvidia_current (/lib/modules/2.6.38-8-generic/updates/dkms/nvidia-current.ko): No such device

When the nvidia-current is active but not being used, I am no longer able to run glxgears by itself on the intel onboard, it errors out with Xlib: extension "GLX" missing on display ":0.0". Error: couldn't get an RGB, Double-buffered visual

I don't see an Xorg.8.log in /etc/X11...?

half the kern.log is here, let me know if you need the other half: http://pastebin.com/PaueTfUc

01:00.0 VGA compatible controller: nVidia Corporation Device 0df4 (rev a1) (prog-if 00 [VGA controller])
Flags: fast devsel, IRQ 16
[virtual] Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 3000 [size=128]
[virtual] Expansion ROM at f1000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel modules: nvidia-current, nouveau, nvidiafb

auberondreaming commented 13 years ago

Ok, found the xorg.8.log after running bug-report script

http://pastebin.com/e6nmCi8k

Lekensteyn commented 13 years ago

Please blacklist the nouveau driver:

echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u
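
A quick way to confirm the blacklist took effect is to check both the config directory and the loaded-module list. This is only a sketch: it assumes the stock locations /etc/modprobe.d and /proc/modules, and is_blacklisted and is_loaded are hypothetical helper names, not part of bumblebee.

```shell
#!/bin/sh
# Sketch: confirm that a module is blacklisted and not currently loaded.
# Function names and the optional path arguments are hypothetical helpers.

# is_blacklisted MODULE [DIR]: succeeds if a "blacklist MODULE" line
# exists in any .conf file under DIR (default: /etc/modprobe.d).
is_blacklisted() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$dir"/*.conf
}

# is_loaded MODULE [FILE]: succeeds if MODULE is listed in FILE
# (default: /proc/modules, the same data lsmod prints).
is_loaded() {
    module=$1
    file=${2:-/proc/modules}
    grep -qs "^${module} " "$file"
}
```

For example, `is_blacklisted nouveau && ! is_loaded nouveau && echo "nouveau is out of the way"` prints the message only when both checks pass. Note that loaded modules are listed with underscores (nvidia_current, not nvidia-current).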

auberondreaming commented 13 years ago

nouveau driver blacklisted now. Unfortunately still have FATAL: Error inserting nvidia_current(...) though.

auberondreaming commented 13 years ago

I reinstalled the nvidia driver and got it to be loaded once. Then bumblebee worked. I just can't seem to get nvidia to stick though, as on reboot I now cannot see it. lsmod |grep nvidia returns empty. When it shows up, optirun works. I try and modprobe and still get that FATAL error.

TheFan commented 13 years ago

Hey guys,

that's exactly the same error I have. My system is the same.

Unfortunately I have no idea how I could start fixing this problem. What could I do?

After a 'rmmod nouveau' (I read this in some other forums) and a restart, my system crashed and I can't start the X server. I'm going to reinstall Ubuntu, that's no problem.

auberondreaming commented 13 years ago

There is something wrong in bumblebee's startup/init scripts, I think. If I uninstall bumblebee, leave nvidia-current installed and reboot, the nvidia module loads successfully. I can then reinstall bumblebee and it works fine. Until I reboot, then this problem returns. I uninstall, reboot, install again and it works. Repeat. Any ideas on what I could change in the bumblebee scripts to avoid this?

Lekensteyn commented 13 years ago

I observed:

[ nvidia card on ] - reboot - [ OK ]
[ nvidia card off ] - reboot - turn on - [ disaster ]

I've not enough data to confirm that, but it seems to apply. For now, if optirun is working (and the card is on), please comment out the ACPI calls as those are not performed well. Me and ArchangeGabriel are doing research on the ACPI methods now (reading a spec of 731 pages)

patrici0 commented 13 years ago

@Lekensteyn: I've been working on the same issue for a couple of nights now and I couldn't reproduce your observation.

What I did observe is the following (XPS L502x i7 sandy bridge/GF540M - ubuntu 11.04). When I uninstall bumblebee+virtualgl and reboot, my nvidia module becomes loadable again. Right then I reinstall bumblebee+virtualgl (no reboot) and it works. Once I reboot, it stops working until I go over this again.

I'm using nvidia-current from this repo. Here's a summary of the few things I did to get here after the clean install.

1- blacklist nouveau (and manually rmmod it)
2- add this repo
3- install nvidia-current (from this repo)
4- install bumblebee+virtualgl and configure it
5- test optirun, test *-enablecard / *-disablecard
Until here everything works.
6- reboot
7- FATAL message of death (Error inserting nvidia-current...) regardless of the state the card was in before reboot (enabled or disabled)

Cheers.

kern.log

Aug  2 21:22:01 tuxic kernel: [  857.432209] acpi_call: Calling \_SB.PCI0.PEG0.PEGP._ON
Aug  2 21:22:01 tuxic kernel: [  857.880070] acpi_call: Call successful: {0x00, 0x10, 0x00, 0xa1, 0x00, 0x00, 0x03, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x68, 0x03, 0x00, 0x08, 0x00, 0x00, 0x00, 0x05, 0x78, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0xb4, 0x02, 0x00, 0xa0, 0x8d, 0x2c, 0x01, 0x00, 0x00, 0x00, 0x00, 0x02, 0x4d, 0x05, 0x00, 0x00, 0x00, 0x01, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
Aug  2 21:22:04 tuxic kernel: , 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0
Aug  2 21:22:04 tuxic kernel: [  861.023930] acpi_call: Calling \_SB.PCI0.PEG0.PEGP._ON
Aug  2 21:22:05 tuxic kernel: [  861.468220] acpi_call: Call successful: {0x00, 0x10, 0x00, 0xa1, 0x00, 0x00, 0x03, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x68, 0x03, 0x00, 0x08, 0x00, 0x00, 0x00, 0x05, 0x78, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0xb4, 0x02, 0x00, 0xa0, 0x8d, 0x2c, 0x01, 0x00, 0x00, 0x00, 0x00, 0x02, 0x4d, 0x05, 0x00, 0x00, 0x00, 0x01, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
Aug  2 21:22:05 tuxic kernel: , 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0
Aug  2 21:22:05 tuxic kernel: [  861.833212] nvidia 0000:01:00.0: power state changed by ACPI to D0
Aug  2 21:22:05 tuxic kernel: [  861.833224] nvidia 0000:01:00.0: power state changed by ACPI to D0
Aug  2 21:22:05 tuxic kernel: [  861.833234] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
Aug  2 21:22:05 tuxic kernel: [  861.833250] nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Aug  2 21:22:05 tuxic kernel: [  861.833263] nvidia 0000:01:00.0: setting latency timer to 64
Aug  2 21:22:05 tuxic kernel: [  861.833276] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=none
Aug  2 21:22:05 tuxic kernel: [  861.833351] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:0df4) installed
Aug  2 21:22:05 tuxic kernel: [  861.833353] NVRM: in this system is not supported by the 280.04 NVIDIA Linux
Aug  2 21:22:05 tuxic kernel: [  861.833356] NVRM: graphics driver release.  Please see 'Appendix A -
Aug  2 21:22:05 tuxic kernel: [  861.833358] NVRM: Supported NVIDIA GPU Products' in this release's README,
Aug  2 21:22:05 tuxic kernel: [  861.833360] NVRM: available on the Linux graphics driver download page at
Aug  2 21:22:05 tuxic kernel: [  861.833363] NVRM: www.nvidia.com.
Aug  2 21:22:05 tuxic kernel: [  861.833376] nvidia 0000:01:00.0: PCI INT A disabled
Aug  2 21:22:05 tuxic kernel: [  861.833390] nvidia: probe of 0000:01:00.0 failed with error -1
Aug  2 21:22:05 tuxic kernel: [  861.833432] NVRM: The NVIDIA probe routine failed for 1 device(s).
Aug  2 21:22:05 tuxic kernel: [  861.833436] NVRM: None of the NVIDIA graphics adapters were initialized!
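
To compare the rejected device against the driver's supported-chips list, the PCI ID can be pulled out of the NVRM lines. A minimal sketch, assuming the message format shown in the log above; nvrm_pci_ids is a hypothetical helper name and it reads the log on stdin.

```shell
#!/bin/sh
# Sketch: print the unique PCI IDs mentioned in NVRM
# "The NVIDIA GPU ... (PCI ID: vvvv:dddd)" kernel log lines.
# nvrm_pci_ids is a hypothetical helper; it reads the log from stdin.
nvrm_pci_ids() {
    sed -n 's/.*NVRM: The NVIDIA GPU .*(PCI ID: \([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)).*/\1/p' |
        sort -u
}
```

Usage would be along the lines of `nvrm_pci_ids < /var/log/kern.log`; the ID it prints can then be looked up in the supported-chips page linked earlier in the thread.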

Xorg.8.log

[   340.631] (II) Loading /usr/lib/nvidia-current/xorg/nvidia_drv.so
[   340.631] (II) Loading /usr/lib/xorg/modules/libwfb.so
[   340.631] (II) Loading /usr/lib/xorg/modules/libfb.so
[   340.631] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[   340.631] (==) NVIDIA(0): RGB weight 888
[   340.631] (==) NVIDIA(0): Default visual is TrueColor
[   340.631] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[   340.631] (**) NVIDIA(0): Option "ConnectedMonitor" "CRT-0"
[   340.631] (**) NVIDIA(0): ConnectedMonitor string: "CRT-0"
[   344.474] (WW) NVIDIA(GPU-0): Unable to read EDID for display device CRT-0
[   344.475] (II) NVIDIA(0): NVIDIA GPU GeForce GT 540M (GF108) at PCI:1:0:0 (GPU-0)
[   344.475] (--) NVIDIA(0): Memory: 2097152 kBytes
[   344.475] (--) NVIDIA(0): VideoBIOS: 70.08.44.00.11
[   344.475] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[   344.475] (--) NVIDIA(0): Interlaced video modes are supported on this GPU
[   344.475] (--) NVIDIA(0): Connected display device(s) on GeForce GT 540M at PCI:1:0:0
[   344.475] (--) NVIDIA(0):     CRT-0
[   344.475] (--) NVIDIA(0): CRT-0: 400.0 MHz maximum pixel clock
[   344.479] (**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID has been
[   344.479] (**) NVIDIA(0):     enabled on all display devices.
[   344.484] (II) NVIDIA(0): Assigned Display Device: CRT-0
[   344.484] (WW) NVIDIA(0): No valid modes for "1920x1200"; removing.
[   344.484] (WW) NVIDIA(0): No valid modes for "1600x1200"; removing.
[   344.484] (II) NVIDIA(0): Validated modes:
[   344.484] (II) NVIDIA(0):     "1920x1080"
[   344.484] (II) NVIDIA(0):     "1680x1050"
[   344.484] (II) NVIDIA(0):     "1440x900"
[   344.484] (II) NVIDIA(0):     "1280x1024"
[   344.484] (II) NVIDIA(0):     "1366x768"
[   344.484] (II) NVIDIA(0):     "1360x768"
[   344.484] (II) NVIDIA(0):     "1280x800"
[   344.484] (II) NVIDIA(0):     "1024x768"
[   344.484] (II) NVIDIA(0):     "800x600"
[   344.484] (II) NVIDIA(0):     "640x480"
[   344.484] (II) NVIDIA(0): Virtual screen size determined to be 1920 x 1080
[   344.487] (WW) NVIDIA(0): Unable to get display device CRT-0's EDID; cannot compute DPI
[   344.487] (WW) NVIDIA(0):     from CRT-0's EDID.
[   344.487] (==) NVIDIA(0): DPI set to (75, 75); computed from built-in default
[   344.487] (--) Depth 24 pixmap format is 32 bpp
[   344.487] (II) NVIDIA: Using 3072.00 MB of virtual memory for indirect memory
[   344.487] (II) NVIDIA:     access.
[   344.590] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[   344.590] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[   344.590] (II) NVIDIA(0):     configuration option may not be set correctly.  When the
[   344.590] (II) NVIDIA(0):     ACPI event daemon is available, the NVIDIA X driver will
[   344.590] (II) NVIDIA(0):     try to use it to receive ACPI event notifications.  For
[   344.590] (II) NVIDIA(0):     details, please see the "ConnectToAcpid" and
[   344.590] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X
[   344.590] (II) NVIDIA(0):     Config Options in the README.
[   344.598] (II) NVIDIA(0): Setting mode "1920x1080"
[   344.637] (II) Loading extension NV-GLX
[   344.695] (==) NVIDIA(0): Disabling shared memory pixmaps
[   344.695] (==) NVIDIA(0): Backing store disabled
[   344.695] (==) NVIDIA(0): Silken mouse enabled
[   344.695] (**) NVIDIA(0): DPMS enabled
[   344.695] (II) Loading extension NV-CONTROL
[   344.695] (II) Loading extension XINERAMA
[   344.695] (WW) NVIDIA(0): Option "IgnoreEDID" is not used
[   344.695] (II) Loading sub module "dri2"
[   344.695] (II) LoadModule: "dri2"
[   344.695] (II) Loading /usr/lib/xorg/modules/extensions/libdri2.so
[   344.695] (II) Module dri2: vendor="X.Org Foundation"
[   344.695]    compiled for 1.10.1, module version = 1.2.0
[   344.695]    ABI class: X.Org Server Extension, version 5.0
[   344.695] (II) NVIDIA(0): [DRI2] Setup complete

patrici0 commented 13 years ago

Ok, I have found my problem and a workaround.

The script /usr/bin/bumblebee-disablecard-on-powerup gets called during the boot process from the symlink /etc/pm/power.d/bumblebee-disablecard-on-powerup which looks like this:

#!/bin/sh

ENABLECARD=/usr/local/bin/bumblebee-enablecard
DISABLECARD=/usr/local/bin/bumblebee-disablecard

####
## This script disables nVidia card if no optirun is running.
####
if ! pidof -x /usr/bin/optirun /usr/bin/optirun32 /usr/bin/optirun64 >/dev/null; then
    $ENABLECARD
    $DISABLECARD
fi

I'm not sure why, but the execution of $ENABLECARD is messing with the loading of the nvidia module later on. I commented it out, and now the card still seems to get turned off during boot (which is what I want), but the drivers are loaded/unloaded at will by optirun when called upon.

The modified file looks very similar:

#!/bin/sh

ENABLECARD=/usr/local/bin/bumblebee-enablecard
DISABLECARD=/usr/local/bin/bumblebee-disablecard

####
## This script disables nVidia card if no optirun is running.
####
if ! pidof -x /usr/bin/optirun /usr/bin/optirun32 /usr/bin/optirun64 >/dev/null; then
    #$ENABLECARD
    $DISABLECARD
fi

Note the commented #$ENABLE
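
The same edit can be scripted instead of done by hand. The following is a sketch only: comment_out_enablecard is a hypothetical helper name, and it assumes the script contains a line consisting solely of $ENABLECARD (optionally indented), as in the listing above. It is safe to run twice, since an already commented line no longer matches.

```shell
#!/bin/sh
# Sketch: comment out the bare "$ENABLECARD" call in a script file.
# comment_out_enablecard is a hypothetical helper, not part of bumblebee.
comment_out_enablecard() {
    script=$1
    tmp=$(mktemp) || return 1
    # Match only lines that are exactly "$ENABLECARD" plus indentation,
    # so the ENABLECARD=... assignment is left alone.
    sed 's/^\([[:space:]]*\)\$ENABLECARD[[:space:]]*$/\1#$ENABLECARD/' "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # cat (not mv) keeps the file mode and owner
    status=$?
    rm -f "$tmp"
    return $status
}
```

It would be called as root on /usr/bin/bumblebee-disablecard-on-powerup, ideally after making a backup copy of that file.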

Lekensteyn commented 13 years ago

Thank you for your research, I'm going to check logs on that. The issue is that the ACPI methods may not be called in the right way.

I also found some weird behavior of which I am not sure if it's related or not: when rmmod nvidia, /dev/nvidia0 and /dev/nvidiactl still exist. On accessing (read / write) those files (file -s /dev/nvidia*), the nvidia kernel module becomes active again.
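
To spot those leftover nodes without touching them (and so without triggering the reload described above), the paths can be listed with a plain existence test. This is a sketch under assumptions: stale_nvidia_nodes is a hypothetical helper, and it treats the module as loaded if /proc/modules lists nvidia or nvidia_current.

```shell
#!/bin/sh
# Sketch: list nvidia device nodes that exist while the module is unloaded.
# stale_nvidia_nodes [DEVDIR] [MODULES_FILE] is a hypothetical helper.
stale_nvidia_nodes() {
    devdir=${1:-/dev}
    modules=${2:-/proc/modules}
    # Module loaded: the nodes are expected, so nothing is stale.
    if grep -Eqs '^nvidia(_current)? ' "$modules"; then
        return 0
    fi
    for node in "$devdir"/nvidia*; do
        # -e only stats the path; it does not open the device.
        [ -e "$node" ] && printf '%s\n' "$node"
    done
    return 0
}
```

Running `stale_nvidia_nodes` after rmmod nvidia would print /dev/nvidia0 and /dev/nvidiactl if they are still present, and nothing otherwise.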

auberondreaming commented 13 years ago

@patriciov: I can confirm your workaround works for me. I am now able to use bumblebee across reboots. Hurray! Now to see if my CUDA application will work.

Is there a forum that is not a bug tracker for this project? I have some questions I would like to ask that are not issues with this project.

ArchangeGabriel commented 13 years ago

Ask them on freenode

kirmonkey commented 13 years ago

I can confirm that the solution given by patrici0 works for me.

Dell XPS L502x Ubuntu 11.04 Nvidia 525M 2.6.38-10-generic

Frank16729 commented 13 years ago

works for me too!!!

https://github.com/MrMEEE/bumblebee/issues/411

thanks!!!

quarxdmz commented 13 years ago

works for me too:

Acer 4750G i5-2410 Xubuntu 11.04 Nvidia GT540M

rockorequin commented 13 years ago

Thanks, that fixed it for me too.

glaasje commented 13 years ago

Help! This didn't fix my problem! :(

Geek2France commented 13 years ago

I confirm, it's solved for me too :)

ElighCS commented 13 years ago

Thanks patrici0.

ArchangeGabriel commented 13 years ago

Please try the new version here : https://github.com/Bumblebee-Project/Bumblebee.

rockorequin commented 13 years ago

But the notes at that link say it does not have any power management at all, i.e. it doesn't turn the nvidia card off at all. Is that correct? Am I not better off sticking with what I have, since it does turn off the card?

quarxdmz commented 13 years ago

With nVidia turned on, the overall temperature of my laptop increased. I measured a 4 degree Centigrade increase over the last reading with nVidia turned off.

I wish power management would be back in the next release.

ArchangeGabriel commented 13 years ago

It is our first priority.

But please read the page ACPI-Removed in the wiki.

rockorequin commented 13 years ago

OK, so according to the ACPI-Removed wiki I am certainly better off staying with my current driver for now, because it works on my system (and saves a considerable amount of power with the nvidia card turned off, according to ACPI). The wiki would be improved if it referred to the specific bug reports that explain what gets broken and on which systems.

glaasje commented 13 years ago

It works for me now! :D I used the stable ppa to get it working! ;) (drinks are on me!!!!)

avilella commented 13 years ago

It is worth mentioning that most lock-ups after suspend/hibernate or a reboot into Windows can be avoided by using the _PS0 and _PS3 calls that many ACPI tables already have, instead of calling the _ON and _OFF methods directly. Calling the ON/OFF methods is fine if it's only to switch the card on/off during a Linux session, in sync with the start/finish of the bumblebee daemon, but to make sure there are no side effects, the PS0/PS3 methods should be hooked into the Linux session so that they are called appropriately when suspending, hibernating or rebooting into Windows.

ArchangeGabriel commented 13 years ago

_ON and _OFF are really not a good idea even inside Linux.

_PS0 and _PS3 are only half the way to a really secure method.

Did you read the ACPI specs ? Because we're really playing with things we don't understand for now.

avilella commented 13 years ago

Understood. Apologies.

ArchangeGabriel commented 13 years ago

By the way, if people want to try that anyway, we will soon add the necessary structure for it to bumblebee, and people wanting to play with this will just have to put their calls in the right place.

However, we won't do any debugging for this feature, and people reporting bugs while using it may just be ignored...

rockorequin commented 13 years ago

Would you be able to let us know when the necessary structure is in place? Power management is too important for me at the moment for me to try the new bumblebee, because my laptop battery lasts much longer with the nvidia card powered down.

ArchangeGabriel commented 13 years ago

Follow issue #47 and branch common-acpi-framework.

It will be released next Sunday.