bayasdev / envycontrol

Easy GPU switching for Nvidia Optimus laptops under Linux
https://bayas.dev/envycontrol
MIT License

When I select only Integrated, at boot I will get a message "Missing NVIDIA kernel module, defaulting to Nouveau" #121

Closed: logicito closed this issue 5 months ago

logicito commented 1 year ago

**Describe the bug**
Brand-new Fedora 38 installation using the NVIDIA drivers from GNOME Software (RPM), configured for Integrated only (Intel iGPU). At boot I get the message: "Missing NVIDIA kernel module, defaulting to Nouveau"

**To Reproduce**
1. Install the NVIDIA driver from the Fedora RPM via GNOME Software.
2. Use EnvyControl to set it to Integrated only.
3. Boot.

**Expected behavior**
No error message at boot.

logicito commented 1 year ago

Update: I have tried multiple fresh installations, and the missing NVIDIA kernel module message always shows at boot. The only way to get rid of it is to switch from Integrated to Hybrid, the default mode. That is counterproductive, because it is the same as not having EnvyControl installed. If you need more information, please let me know. Thank you.

bayasdev commented 1 year ago

It's a known problem with the RPM Fusion Nvidia driver. I think we need to add/modify some Fedora/RHEL-specific boot params in addition to blacklisting the kernel modules via modprobe, but I haven't really looked into it.

In the meantime, it's safe to ignore it.
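
For anyone who wants to see what is currently on their kernel command line, here is a minimal sketch. The comment above does not say which boot params are meant; `rd.driver.blacklist=` (dracut) and `modprobe.blacklist=` (modprobe) are an assumption on my part, and `list_blacklist_params` is a hypothetical helper name, not part of EnvyControl.

```shell
#!/bin/sh
# Print the driver-blacklist parameters found on a kernel command line.
# Reads /proc/cmdline by default; pass a string to inspect something else.
# ASSUMPTION: these two parameter names are only a guess at what
# "Fedora/RHEL specific boot params" refers to.
list_blacklist_params() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Split on whitespace and keep only the blacklist-style parameters.
    for param in $cmdline; do
        case "$param" in
            rd.driver.blacklist=*|modprobe.blacklist=*)
                printf '%s\n' "$param"
                ;;
        esac
    done
}
```

Calling `list_blacklist_params` with no argument inspects the running system. Empty output only means neither parameter is set; it says nothing about modprobe.d blacklists.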

Boria138 commented 11 months ago

Disable the nvidia-fallback.service and this warning will no longer occur

Boria138 commented 11 months ago

https://github.com/rpmfusion/xorg-x11-drv-nvidia/blob/master/10-nvidia.rules
https://github.com/rpmfusion/xorg-x11-drv-nvidia/blob/master/nvidia-fallback.service

Boria138 commented 11 months ago

@logicito Please post the output of `sudo systemctl --type=service | grep "nvidia-fallback"`.

klmcwhirter commented 5 months ago

@Boria138 This is what I see:

sudo systemctl --type=service | grep "nvidia-fallback"
  nvidia-fallback.service                               loaded active exited  Fallback to nouveau as nvidia did not load

I disabled and stopped the service, but that did not seem to have any effect. After a reboot it is active again, and I still saw the message.

sudo systemctl status nvidia-fallback.service 
● nvidia-fallback.service - Fallback to nouveau as nvidia did not load
     Loaded: loaded (/usr/lib/systemd/system/nvidia-fallback.service; disabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (exited) since Fri 2024-03-01 17:53:10 PST; 1min 45s ago
    Process: 1309 ExecStart=/sbin/modprobe nouveau (code=exited, status=1/FAILURE)
    Process: 1316 ExecStartPost=/bin/plymouth message --text=NVIDIA kernel module missing. Falling back to nouveau (code=exited, status=0/SUCCESS)
   Main PID: 1309 (code=exited, status=1/FAILURE)
        CPU: 5ms

Mar 01 17:53:10 fedora systemd[1]: Starting nvidia-fallback.service - Fallback to nouveau as nvidia did not load...
Mar 01 17:53:10 fedora modprobe[1309]: modprobe: ERROR: libkmod/libkmod-module.c:895 kmod_module_insert_module() could not find module by name='off'
Mar 01 17:53:10 fedora modprobe[1309]: modprobe: ERROR: could not insert 'off': Unknown symbol in module, or unknown parameter (see dmesg)
Mar 01 17:53:10 fedora systemd[1]: Finished nvidia-fallback.service - Fallback to nouveau as nvidia did not load.

But note this:

$ lsmod | grep nouveau
$ lsmod | grep nvidia
nvidia_wmi_ec_backlight    12288  0
video                  77824  3 nvidia_wmi_ec_backlight,acer_wmi,i915
wmi                    45056  4 video,nvidia_wmi_ec_backlight,acer_wmi,wmi_bmof

$ lsmod | grep video
uvcvideo              176128  0
uvc                    12288  1 uvcvideo
videobuf2_vmalloc      20480  1 uvcvideo
videobuf2_memops       16384  1 videobuf2_vmalloc
videobuf2_v4l2         40960  1 uvcvideo
videobuf2_common       94208  4 videobuf2_vmalloc,videobuf2_v4l2,uvcvideo,videobuf2_memops
videodev              393216  5 v4l2_async,v4l2_fwnode,videobuf2_v4l2,ov13858,uvcvideo
mc                     90112  6 v4l2_async,videodev,videobuf2_v4l2,ov13858,uvcvideo,videobuf2_common
video                  77824  3 nvidia_wmi_ec_backlight,acer_wmi,i915
wmi                    45056  4 video,nvidia_wmi_ec_backlight,acer_wmi,wmi_bmof

Note: on my other HD, with the exact same system but without akmod-nvidia installed, I do not see that message. Must be built into the driver.

I just double-checked: the lsmod output above is identical on both systems.

Observations:

* no nouveau loaded

* using i915 driver in integrated mode with or without akmod-nvidia rpmfusion driver installed.

I do not believe your change is needed.

As a final test, on the system without akmod-nvidia installed, I executed envycontrol --reset and rebooted. This is now the lsmod output.

$ lsmod | grep nouveau
nouveau              3641344  0
drm_gpuvm              28672  1 nouveau
mxm_wmi                12288  1 nouveau
drm_exec               12288  1 nouveau
gpu_sched              65536  1 nouveau
drm_ttm_helper         12288  1 nouveau
i2c_algo_bit           20480  2 i915,nouveau
ttm                   110592  3 drm_ttm_helper,i915,nouveau
drm_display_helper    229376  2 i915,nouveau
video                  77824  4 nvidia_wmi_ec_backlight,acer_wmi,i915,nouveau
wmi                    45056  6 video,nvidia_wmi_ec_backlight,acer_wmi,wmi_bmof,mxm_wmi,nouveau

$ lsmod | grep nvidia
nvidia_wmi_ec_backlight    12288  0
video                  77824  4 nvidia_wmi_ec_backlight,acer_wmi,i915,nouveau
wmi                    45056  6 video,nvidia_wmi_ec_backlight,acer_wmi,wmi_bmof,mxm_wmi,nouveau

$ lsmod | grep video
uvcvideo              176128  0
uvc                    12288  1 uvcvideo
videobuf2_vmalloc      20480  1 uvcvideo
videobuf2_memops       16384  1 videobuf2_vmalloc
videobuf2_v4l2         40960  1 uvcvideo
videobuf2_common       94208  4 videobuf2_vmalloc,videobuf2_v4l2,uvcvideo,videobuf2_memops
videodev              393216  5 v4l2_async,v4l2_fwnode,videobuf2_v4l2,ov13858,uvcvideo
mc                     90112  6 v4l2_async,videodev,videobuf2_v4l2,ov13858,uvcvideo,videobuf2_common
video                  77824  4 nvidia_wmi_ec_backlight,acer_wmi,i915,nouveau
wmi                    45056  6 video,nvidia_wmi_ec_backlight,acer_wmi,wmi_bmof,mxm_wmi,nouveau
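
The module checks in this comment can be sketched as a small filter, assuming only the three driver names discussed in this thread matter. One thing worth noting is that `lsmod | grep nvidia` also matches `nvidia_wmi_ec_backlight`, which is a backlight helper from the mainline kernel rather than the NVIDIA driver, so the sketch matches exact names in the first column instead. `gpu_modules_loaded` is a hypothetical helper name.

```shell
#!/bin/sh
# Report which GPU kernel modules appear in `lsmod` output read from stdin.
# Only exact module names in the first column are matched, so helper
# modules such as nvidia_wmi_ec_backlight are not mistaken for the
# nvidia driver itself.
gpu_modules_loaded() {
    awk '$1 == "i915" || $1 == "nouveau" || $1 == "nvidia" { print $1 }' | sort
}
```

Usage would be `lsmod | gpu_modules_loaded`. In integrated mode, as described above, the expectation is that only `i915` is printed.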

Boria138 commented 5 months ago

> Observations:
>
> * no nouveau loaded
>
> * using i915 driver in integrated mode with or without akmod-nvidia rpmfusion driver installed.
>
> I do not believe your change is needed.

All the nvidia-fallback service does is try to load nouveau via modprobe and display a message in plymouth. The changes are only cosmetic and don't affect functionality, but I still think you should disable this service so people don't get confused. Probably the best option is to mention this rule in the README, because I think even if you mask the service, udev will still call it.

Boria138 commented 5 months ago

I just checked: it looks like if you mask the service, the udev rule doesn't work and plymouth doesn't write anything. @klmcwhirter, please check whether this works for you, and if it does I will update the PR.
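
A minimal sketch of the masking step, based only on the observation above and not verified on other systems. Unlike `systemctl disable`, which only removes the boot-time enablement links, `systemctl mask` links the unit to /dev/null so that nothing can start it. The function names and the dry-run argument are hypothetical conveniences, not part of EnvyControl or the RPM Fusion package.

```shell
#!/bin/sh
# Mask the fallback unit so that nothing can start it.
# Pass "echo" as the first argument for a dry run that only prints
# the commands instead of running them with sudo.
mask_nvidia_fallback() {
    runner="${1:-sudo}"
    "$runner" systemctl mask nvidia-fallback.service
    # Prints "masked" once the mask is in place.
    "$runner" systemctl is-enabled nvidia-fallback.service
}

# Undo the mask, for example before switching back to hybrid mode.
unmask_nvidia_fallback() {
    runner="${1:-sudo}"
    "$runner" systemctl unmask nvidia-fallback.service
}
```

Running `mask_nvidia_fallback echo` first shows exactly what would be executed.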

klmcwhirter commented 5 months ago

I have moved on to a different approach that seems to be working better for me. Please work with Victor.

Boria138 commented 5 months ago

What different approach?

bayasdev commented 5 months ago

@klmcwhirter is talking about this

https://github.com/klmcwhirter/nvidia-more-battery

klmcwhirter commented 5 months ago

My use case is very specific. I am a developer, NOT a gamer who needs a dedicated GPU. So my goal, since I had to buy a $700 laptop from a local brick-and-mortar store, has been to turn off the dGPU and recover some battery life. I do not want to have to install the nvidia driver. I did have it installed on the HD I was using to test with, but not on my production HD. Here is the POC I put together. The first link in the README spells out the details of the approach.

https://github.com/klmcwhirter/nvidia-more-battery/ - Get battery time back by making usage of nvidia GPU optional for systems with Optimus

Note I am working with Victor on the side to potentially include some of this in envycontrol upon switch to integrated mode. Also, note that with the rpmfusion nvidia driver installed you will still get the innocuous message about the fallback to nouveau.

There is no real panacea here: 3-1/2 hrs of battery instead of ~2 hrs. BUT it was nice to wake up this morning with 96% battery instead of 15%. Sleep / suspend battery time has been solved, it seems. A definite step in the right direction.

The other major feature is that the GPU can be "turned back on" without a reboot. Simply tell the kernel to rescan the PCIe bus with: `echo 1 | sudo tee /sys/bus/pci/rescan`

To turn it back off, simply reboot.

I definitely would like to hear from others whether this works on their system or not. I have created a separate issue, as promised, so that we can collect some feedback for Victor: https://github.com/bayasdev/envycontrol/issues/157
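
To check whether the dGPU is actually visible before and after the rescan, something like the following sketch could be used. It relies on NVIDIA's PCI vendor ID being `0x10de` and on the standard sysfs layout. `SYSFS_ROOT` is a hypothetical override added here so the function can be tried against a fake directory tree; it is not a kernel, EnvyControl, or nvidia-more-battery setting.

```shell
#!/bin/sh
# List PCI addresses of NVIDIA devices currently visible to the kernel.
# 0x10de is NVIDIA's PCI vendor ID.
# SYSFS_ROOT is a hypothetical override used only for testing.
list_nvidia_pci() {
    root="${SYSFS_ROOT:-/sys}"
    for vendor_file in "$root"/bus/pci/devices/*/vendor; do
        [ -r "$vendor_file" ] || continue
        if [ "$(cat "$vendor_file")" = "0x10de" ]; then
            # The directory name is the PCI address, e.g. 0000:01:00.0
            basename "$(dirname "$vendor_file")"
        fi
    done
}

# Ask the kernel to rescan the PCI bus (the command quoted above).
rescan_pci() {
    echo 1 | sudo tee /sys/bus/pci/rescan > /dev/null
}
```

With the GPU turned off, `list_nvidia_pci` would be expected to print nothing; after `rescan_pci` it should list the GPU's address again (and possibly its audio function).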

logicito commented 5 months ago

I thought that after almost a year this issue would have been resolved. I tested it today and got the exact same problem.

bayasdev commented 5 months ago

The missing-module message is a Fedora issue; it shouldn't try to load the modules again and again once we're blacklisting them.

klmcwhirter commented 5 months ago

@bayasdev, actually I am pretty certain it is an intentional design element by the rpmfusion folks, NOT Fedora.

The nvidia-fallback.service (that issues that warning) is started from inside of the kernel driver code somewhere. And it is right! During the transition to integrated mode we (envycontrol) just blacklisted all those modules.

So to me, the service is just stating the obvious; the surprise is the choice to (at least temporarily) fall back to nouveau.

As you mentioned, it is completely harmless and can safely be ignored. My analysis of loaded modules across a matrix of scenarios shows that. I am referencing this comment: https://github.com/bayasdev/envycontrol/issues/121#issuecomment-1974187399

Personally I think this knowledge should be captured in the FAQ and this issue closed. But that is just my opinion.

I'll let you work that out with @Boria138.

Boria138 commented 5 months ago

@bayasdev I guess this error can be closed, as it is not really an error at all