Bumblebee-Project / bbswitch

Disable discrete graphics (currently nvidia only)
GNU General Public License v2.0
ipmi_msghandler prevents unloading the nvidia driver. #173

Closed: onsc closed this issue 5 years ago

onsc commented 6 years ago

Hi. My laptop has an NVIDIA 950M, so it uses Optimus technology. As the title says, ipmi_msghandler stops the nvidia module from unloading. I think many people suffer from this.

lsmod
nvidia              14045184  12
ipmi_msghandler        57344  1 nvidia
....
sudo rmmod nvidia
rmmod: ERROR: Module nvidia is in use
sudo rmmod ipmi_msghandler 
rmmod: ERROR: Module ipmi_msghandler is in use by: nvidia

I tried to blacklist ipmi_msghandler, but it didn't work. My blacklist.conf:

install nouveau /usr/bin/false
#install nvidia /usr/bin/false
#blacklist nvidia
#blacklist nouveau
#remove nvidia modprobe -r --ignore-remove nvidia-modeset nvidia-uvm nvidia
install ipmi_si /usr/bin/false
install ipmi_devintf /usr/bin/false
install ipmi_msghandler /usr/bin/false

I ran mkinitcpio -p linux after every change to blacklist.conf and then rebooted. The remove line didn't work, so I commented it out. I removed nvidia with pacman, and then no ipmi modules were loaded. But after installing the nvidia driver again, ipmi_msghandler loads. I also installed bumblebee, so all nvidia drivers are blacklisted. Output of modprobe -c:

blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm
blacklist nouveau
blacklist nouveau
install nouveau /usr/bin/false
install ipmi_si /usr/bin/false
install ipmi_devintf /usr/bin/false
install ipmi_msghandler /usr/bin/false
...
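Note that a blacklist line only keeps a module from being auto-loaded through its aliases; the module can still be loaded by name or pulled in as a dependency, which is presumably why the install ... /usr/bin/false lines are tried here as well. A minimal sketch for checking which directives in the effective configuration mention a given module. `directives_for` is a hypothetical helper name, and it assumes `modprobe -c` output arrives on stdin:

```shell
# directives_for MODULE: read `modprobe -c` output on stdin and print the
# blacklist/install/remove lines whose module name matches MODULE.
# Dashes and underscores are treated as the same character.
directives_for() {
    mod=$(printf '%s' "$1" | sed 's/-/_/g')
    awk -v mod="$mod" '
        $1 == "blacklist" || $1 == "install" || $1 == "remove" {
            name = $2
            gsub(/-/, "_", name)
            if (name == mod) print
        }'
}

# Usage on a live system:
#   modprobe -c | directives_for ipmi_msghandler
```

If nothing is printed for a module, no blacklist, install, or remove directive applies to it in the merged configuration.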

I use Arch Linux, kernel 4.17.3-1, nvidia 396.24-13.

I also tried nvidia-dkms; nothing changed.

I blacklisted nvidia with an install line, so no nvidia or ipmi modules were loaded. When I then tried to modprobe nvidia, it was denied because of blacklist.conf. Without restarting my laptop, I commented out the nvidia line in blacklist.conf. Then I tried bbswitch, and it works. If I force-remove the nvidia driver, bbswitch can switch the card OFF but cannot switch it back ON; the laptop freezes.
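For anyone reproducing the ON/OFF behaviour described here: bbswitch is driven through /proc/acpi/bbswitch. A minimal sketch for reading the current state. `bbswitch_state` is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a sample file:

```shell
# bbswitch_state [FILE]: print ON or OFF as reported by bbswitch.
# FILE defaults to /proc/acpi/bbswitch, whose content looks like
# "0000:01:00.0 ON" (PCI address, then power state).
bbswitch_state() {
    file=${1:-/proc/acpi/bbswitch}
    [ -r "$file" ] || { echo "cannot read $file (is bbswitch loaded?)" >&2; return 1; }
    awk '{ print $2 }' "$file"
}

# Usage (writing requires root):
#   bbswitch_state
#   echo OFF | sudo tee /proc/acpi/bbswitch
#   echo ON  | sudo tee /proc/acpi/bbswitch
```

As far as I know, bbswitch refuses the OFF request while a driver such as nvidia is still bound to the card, so the state is worth checking after each write.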

I tried to detect whether my laptop has IPMI. There is no BIOS entry and there are no /dev/ipmi* devices. I also checked with the freeipmi tools (ipmi-detect, ipmi-ping), dmesg, etc. All IPMI tools are now uninstalled and there is no systemd service related to IPMI.

Thank you. (Sorry about my English.)

randombk commented 6 years ago

I have the exact same issue and symptoms, though I'm unsure if IPMI is the issue (it looks like IPMI is a dependency of Nvidia, not the other way around so I don't think that is the reason why the module can't be unloaded). Switching to nouveau seems to be the only viable workaround for VFIO users.
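The lsmod output in the first post is consistent with this reading: the last column lists the modules that use the module named in the first column, so "ipmi_msghandler 57344 1 nvidia" means nvidia holds ipmi_msghandler, while nvidia's own use count of 12 names no holder module. A small sketch that pulls both values out of lsmod output on stdin. `module_users` is a hypothetical helper name:

```shell
# module_users MODULE: read lsmod output on stdin and print
# "<use count> <used-by list>" for MODULE, with "-" for an empty list.
module_users() {
    awk -v mod="$1" '$1 == mod { print $3, ($4 == "" ? "-" : $4) }'
}

# Usage on a live system:
#   lsmod | module_users nvidia
```

A non-zero count with "-" as the list suggests the references come from something other than another module, such as open device files.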

gsgxnet commented 6 years ago

Same here. I need the NVIDIA GPU for CUDA only, so normally there is no need for bbswitch etc. My working setup in the past was a /etc/modprobe.d/50-nvidia-az.conf file blacklisting all nvidia drivers:

blacklist nvidia-nvlink
blacklist nvidia-modeset
blacklist nvidia-uvm
blacklist nvidia-drm
blacklist nvidia

When the GPU is needed for CUDA, I just load the modules manually. The command used for that: nvidia-modprobe -c 0

Now, with the 410 drivers installed, this setup does not work any more. All drivers are loaded at boot, despite the blacklist. There is the same dependency on ipmi_msghandler: ipmi_msghandler 65536 2 ipmi_devintf,nvidia. So modprobe -r does not succeed for nvidia either.

Does anybody have a clue?

mysticaltech commented 6 years ago

Same problem here! Anyone?

mysticaltech commented 6 years ago

@gsgxnet Got it. List all processes using nvidia and kill them.

lsof | grep /dev/nvidia

Now kill all the processes you see using nvidia. They are chained, so killing just the three or so parent processes will work.

kill 1234

Then:

modprobe -f -r nvidia_drm
modprobe -f -r nvidia_modeset
modprobe -f -r nvidia

This will successfully unload nvidia, so the installation can proceed. If there are other errors, checking the installer log file is of course useful. In my case it somehow detected that X was running, so I also had to kill the process mentioned in the log and remove /tmp/.X1-lock, but I think this may be particular to my machine.
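The steps above can be sketched as a small helper that turns lsof output into a list of PIDs to review before killing anything. `nvidia_pids` is a hypothetical helper name; it assumes the usual lsof layout with the PID in the second column and the file name in the last one:

```shell
# nvidia_pids: read `lsof` output on stdin and print the unique PIDs (second
# column) of lines whose file name (last column) starts with /dev/nvidia.
nvidia_pids() {
    awk '$NF ~ /^\/dev\/nvidia/ { print $2 }' | sort -un
}

# Usage on a live system (run as root; inspect the list before killing anything):
#   lsof 2>/dev/null | nvidia_pids
#   # once those processes are gone, unload dependents before nvidia itself:
#   modprobe -r nvidia_drm nvidia_modeset nvidia
```

The -f flag is left out of the usage lines on the assumption that a plain removal is enough once nothing holds the device files open.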

onsc commented 6 years ago

@mysticaltech I haven't tried killing the processes before switching. Can you switch graphics now?

mysticaltech commented 6 years ago

Sadly man I don't know, I'm just using this technique to update the Nvidia driver, not actually using bumblebee.

randombk commented 6 years ago

Unfortunately, killing all processes won't help with more advanced use cases (VFIO is the one I care most about). Both X and Wayland hold references to the Nvidia driver and must be killed before unloading it, which effectively kills hotplugging functionality.

abditag2 commented 5 years ago

Any progress on this one? I have the same problem.

onsc commented 5 years ago

In the end, I managed to get it running.

First of all, I followed the instructions at this site: https://antergos.com/wiki/hardware/bumblebee-for-nvidia-optimus/

There were some instructions about a kernel parameter: https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-234494238

So I added this: acpi_osi=! acpi_osi="Windows 2009"

My laptop is an "MSI GL62 6QD",

using nvidia driver 430 and kernel 5.1.

I have done many things, but IMHO what was missing is the acpi_osi=! parameter.

I hope this helps some people.
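How the kernel parameter gets added depends on the bootloader, which is not stated above. A minimal sketch assuming GRUB, where /etc/default/grub is a shell-style file; "quiet" stands in for whatever options are already present, and single quotes keep the inner double quotes intact:

```shell
# /etc/default/grub (assumption: GRUB is the bootloader)
# "quiet" is a placeholder for the options already on this line.
GRUB_CMDLINE_LINUX_DEFAULT='quiet acpi_osi=! acpi_osi="Windows 2009"'

# Afterwards, regenerate the config and reboot (path may differ per distro):
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
#   cat /proc/cmdline    # after reboot, to confirm the parameters are present
```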

colapsnux commented 5 years ago

Try adding the following line to /etc/modprobe.d/nvidia.conf, between the last install nvidia.... line and the remove nvidia.... line:

install nvidia /bin/false

then

sudo update-initramfs -u

It should take effect after a reboot.

It's a little hack, but it works for me!

EricTheMagician commented 4 years ago

I found this issue on google. Posting here for posterity.

I had the nvidia-persistenced service running. I stopped it with sudo systemctl stop nvidia-persistenced.service
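A minimal sketch of the same step that first checks whether the unit is running. `stop_if_active` is a hypothetical helper name, and the SYSTEMCTL override is an assumption added only so the function can be dry-run with a stub:

```shell
# stop_if_active UNIT: stop a systemd unit only when it is currently active.
# SYSTEMCTL may name an alternative command (hypothetical knob for dry runs).
stop_if_active() {
    sc=${SYSTEMCTL:-systemctl}
    if "$sc" is-active --quiet "$1"; then
        "$sc" stop "$1"
    else
        echo "$1 is not active"
    fi
}

# Usage (as root):
#   stop_if_active nvidia-persistenced.service
```

Stopping the unit does not prevent it from starting again at the next boot.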