Frogging-Family / nvidia-all

Nvidia driver latest to 396 series AIO installer

PRIME is broken with 5.10 #20

Closed. phush0 closed this issue 3 years ago

phush0 commented 3 years ago

Commit 4d03e3cc59828c82ee89ea6e27a2f3cdf95aaadf breaks the Nvidia driver, and the card can no longer be disabled. Running dd if="/sys/bus/pci/devices/0000:01:00.0/config" bs=1 count=1 of=/dev/null fixes this.

SB-Jr commented 3 years ago

With the 5.10.7-111-tkg-pds kernel and the nvidia-all helper, the nvidia driver doesn't load on my Optimus laptop.

phush0 commented 3 years ago

Driver loads for me, just no power management

HougeLangley commented 3 years ago

Nvidia offload mode is working great on my laptop. [screenshot: 2021-01-18_20-09]

phush0 commented 3 years ago

nvidia-smi shows nothing, because running it keeps the card active all the time. Turn it off and then run: watch 'cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status'

If you have this line or a similar one in journalctl:

kernel: kernel read not supported for file pci0000:00/0000:00:01.0/0000:01:00.0/config (pid: 1094 comm: nv_queue)

then your card will always stay active. Mine is suspended and total system power is < 9 W (with an OLED display), unlike yours, where the card alone draws 13 W.
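The two checks described in this comment can be wrapped in a small script. This is only a minimal sketch: the function names are made up for illustration, and the PCI address 0000:01:00.0 is the one from this thread, so it may differ on other machines.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not part of nvidia-all.

# Print the runtime power-management state of a PCI device,
# for example "active" or "suspended".
# $1: sysfs device directory, e.g. /sys/bus/pci/devices/0000:01:00.0
gpu_runtime_status() {
    cat "$1/power/runtime_status"
}

# Succeed if the log text on stdin contains the warning quoted in this thread.
has_config_read_warning() {
    grep -q "kernel read not supported for file .*/config"
}

# Possible usage on a real system (address assumed from this thread):
#   gpu_runtime_status /sys/bus/pci/devices/0000:01:00.0
#   journalctl -k -b | has_config_read_warning && echo "warning present"
```

Unlike nvidia-smi, reading runtime_status should not wake the card, which is why it is the more reliable check here.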

HougeLangley commented 3 years ago

Oh, sorry about that. I was using lightwork on my NV card. When nothing is using the card, nvidia-smi looks like this: [screenshot: 2021-01-18_21-36]

HougeLangley commented 3 years ago

Thanks.

SB-Jr commented 3 years ago

Unlike for phush0, with the PDS kernel the driver doesn't load at all in my case, and nvidia-smi fails. The same tkg driver does work with the latest 5.10.07-Arch-1-1 kernel.

Kernel: 5.10.7-111-tkg-pds
GPU: GTX 1060
Driver: 460.32.03

SimpliFly03 commented 3 years ago

@SB-Jr What is the output of sudo dmesg | grep -i nvidia and of xrandr --listproviders when you are on the TKG kernel?
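As a rough sketch of how the xrandr --listproviders output can be read: if the NVIDIA X driver is loaded, one of the providers normally carries a name starting with NVIDIA. That naming is an assumption about the driver's usual behaviour, and the function names below are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for reading `xrandr --listproviders` output from stdin.

# Succeed if any provider name starts with NVIDIA (assumed naming).
has_nvidia_provider() {
    grep -qi "name:NVIDIA"
}

# Print the provider count from the "Providers: number : N" header line.
provider_count() {
    sed -n 's/^Providers: number : \([0-9][0-9]*\).*/\1/p'
}

# Possible usage on a real system:
#   xrandr --listproviders | has_nvidia_provider || echo "no NVIDIA provider"
```

A single provider named modesetting would suggest that only the integrated GPU is visible to X.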

SB-Jr commented 3 years ago

For xrandr --listproviders, I get:

Providers: number : 1
Provider 0: id: 0x49 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 7 associated providers: 0 name:modesetting

For sudo dmesg | grep -i nvidia I get:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-tkg-pds root=UUID=f185188d-121a-4e42-ac8f-f3c71744d3fe rw loglevel=3 quiet fastboot acpi_brightness=vendor pci=noaer resume=/dev/nvme0n1p2 optimus-manager.startup=nvidia
[    0.077750] Kernel command line: intel_pstate=passive BOOT_IMAGE=/boot/vmlinuz-linux-tkg-pds root=UUID=f185188d-121a-4e42-ac8f-f3c71744d3fe rw loglevel=3 quiet fastboot acpi_brightness=vendor pci=noaer resume=/dev/nvme0n1p2 optimus-manager.startup=nvidia
[    5.884657] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input19
[    5.885048] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input20
[    5.885171] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
[    5.885207] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22

SimpliFly03 commented 3 years ago

@SB-Jr Could you run pacman -Q | grep nvidia?

SB-Jr commented 3 years ago

I get this output

lib32-nvidia-utils-tkg 460.32.03-146
lib32-opencl-nvidia-tkg 460.32.03-146
nvidia-dkms-tkg 460.32.03-146
nvidia-egl-wayland-tkg 460.32.03-146
nvidia-lts 1:460.32.03-4
nvidia-prime 1.0-4
nvidia-settings-tkg 460.32.03-146
nvidia-utils-tkg 460.32.03-146
opencl-nvidia-tkg 460.32.03-146

phush0 commented 3 years ago

sudo mkinitcpio -P

SB-Jr commented 3 years ago

I still get this error when I run nvidia-smi. I don't see any errors in dmesg.

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

SimpliFly03 commented 3 years ago

@SB-Jr Uninstall nvidia-lts. If you have nvidia-dkms-tkg or nvidia-dkms, it is unnecessary. After that, reinstall nvidia-dkms-tkg and run sudo mkinitcpio -P if it didn't run automatically.
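The package check suggested here can be sketched as a filter over pacman -Q output. The function name is hypothetical, and the list of prebuilt module packages (nvidia, nvidia-lts) is an assumption based on the package names seen in this thread.

```shell
#!/bin/bash
# Hypothetical helper: reads `pacman -Q` output on stdin. If a DKMS driver
# package is installed, prints any prebuilt module packages (assumed names:
# nvidia, nvidia-lts) that are installed alongside it.
redundant_nvidia_modules() {
    awk '
        $1 ~ /^nvidia-dkms/                  { dkms = 1 }
        $1 == "nvidia" || $1 == "nvidia-lts" { prebuilt[++n] = $1 }
        END { if (dkms) for (i = 1; i <= n; i++) print prebuilt[i] }
    '
}

# Possible usage on a real system:
#   pacman -Q | redundant_nvidia_modules
```

It prints nothing when no DKMS package is present, since a prebuilt module is then the only source of the driver.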

SB-Jr commented 3 years ago

I did as @SimpliFly03 suggested, but I am still getting the same error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

These are the packages I have currently:

lib32-nvidia-utils-tkg 460.32.03-146
lib32-opencl-nvidia-tkg 460.32.03-146
nvidia-dkms-tkg 460.32.03-146
nvidia-egl-wayland-tkg 460.32.03-146
nvidia-prime 1.0-4
nvidia-settings-tkg 460.32.03-146
nvidia-utils-tkg 460.32.03-146
opencl-nvidia-tkg 460.32.03-146

I do also have optimus-manager as my laptop is an optimus laptop.

dmesg still doesn't show any errors.

phush0 commented 3 years ago

With optimus-manager, you have to be in nvidia mode or hybrid mode for nvidia-smi to work.

SB-Jr commented 3 years ago

OK, but the current optimus-manager configuration already works with my LTS (5.4) kernel and with the latest 5.10.7 kernel. I have also added the kernel parameter that makes optimus-manager start in Nvidia mode.

This is my dmesg output:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-tkg-pds root=UUID=f185188d-121a-4e42-ac8f-f3c71744d3fe rw loglevel=3 quiet fastboot acpi_brightness=vendor pci=noaer resume=/dev/nvme0n1p2 optimus-manager.startup=nvidia
[    0.077750] Kernel command line: intel_pstate=passive BOOT_IMAGE=/boot/vmlinuz-linux-tkg-pds root=UUID=f185188d-121a-4e42-ac8f-f3c71744d3fe rw loglevel=3 quiet fastboot acpi_brightness=vendor pci=noaer resume=/dev/nvme0n1p2 optimus-manager.startup=nvidia
[    5.884657] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input19
[    5.885048] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input20
[    5.885171] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
[    5.885207] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input22

If you check the first two lines, the kernel is booted with the optimus-manager startup mode set to nvidia.

Is there some other configuration that I need to do to make it work with the tkg kernel?
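The startup mode requested on the kernel command line can also be read programmatically instead of by eye. A minimal sketch follows; the function name is hypothetical. It only reports what was requested at boot and does not show whether the driver actually loaded.

```shell
#!/bin/bash
# Hypothetical helper: print the value of optimus-manager.startup from a
# kernel command line passed as $1; print nothing if the parameter is absent.
optimus_startup_mode() {
    printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^optimus-manager\.startup=//p' | tail -n 1
}

# Possible usage on a running system:
#   optimus_startup_mode "$(cat /proc/cmdline)"
```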

phush0 commented 3 years ago

Are you sure that you have installed the headers for the tkg kernel? It looks like the dkms build is failing.

SimpliFly03 commented 3 years ago

@SB-Jr As @phush0 said, this might be a header issue, though AFAIK the TKG script installs the headers automatically. Just in case, could you send the output of pacman -Q | grep headers?

SB-Jr commented 3 years ago

@SimpliFly03 I installed the tkg kernel from the chaotic-aur repo. I believe they list the proper headers in the dependency section of the PKGBUILD file. Anyway, here is the headers list:

linux-api-headers 5.8-1
linux-headers 5.10.7.arch1-1
opencl-headers 2:2020.12.18-1
vulkan-headers 1:1.2.166-1

I don't see any 'tkg' headers. Is this what @phush0 was referring to?

phush0 commented 3 years ago

ffnvcodec-headers 11.0.10.0-1
linux-api-headers 5.8-1
linux-latest-headers 5.10-1
linux510-headers 5.10.7-3
linux510-tkg-bmq-headers 5.10.8-112
linux59-tkg-bmq-headers 5.9.16-111
opencl-headers 2:2020.12.18-1

Those are mine. See how each kernel version and scheduler has its own headers package.

SB-Jr commented 3 years ago

@phush0, installing the headers separately fixed the issue. I guess the headers are not listed as a dependency of the respective kernels in Chaotic-AUR. Thanks a lot.
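One way to catch a missing headers package before DKMS fails is to check for the kernel's build tree. The sketch below assumes the Arch-style layout in which a headers package installs /usr/lib/modules/<release>/build; both that path and the function name are assumptions, not something confirmed in this thread.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the build tree that DKMS compiles against
# exists for the given kernel.
# $1: kernel release as printed by `uname -r`
# $2: modules root (optional; assumed default /usr/lib/modules)
has_kernel_build_tree() {
    local release="$1" root="${2:-/usr/lib/modules}"
    [ -e "$root/$release/build" ]
}

# Possible usage on a running system:
#   has_kernel_build_tree "$(uname -r)" || echo "headers missing for $(uname -r)"
```

Running dkms status afterwards shows whether the nvidia module was actually built for that kernel.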

HougeLangley commented 3 years ago

Maybe fixed. My power supply is connected.

[screenshot: nvidia-smi]

phush0 commented 3 years ago

Check whether journalctl shows a kernel warning about not being able to read the PCI power config file.

phush0 commented 3 years ago

According to the source, it is still the same, so no, it is not working.

HougeLangley commented 3 years ago

@phush0 I agree with you. For now we can only use tweaks or wait for a Linux kernel patch that fixes this. There is nothing else we can do.

HougeLangley commented 3 years ago

Does anyone have a patch to fix this bug?

phush0 commented 3 years ago

This is my solution for now:

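#!/bin/bash
# Workaround from this thread: read one byte of the GPU's PCI config file,
# then repeat the read each time the kernel logs the failed read.

CONFIG="/sys/bus/pci/devices/0000:01:00.0/config"
PATTERN="kernel read not supported for file pci0000:00/0000:00:01.0/0000:01:00.0/config"

dd if="$CONFIG" bs=1 count=1 of=/dev/null

# Follow the journal; grep prints each matching line before the read is redone.
journalctl -f | \
while read -r line ; do
    if echo "$line" | grep -F "$PATTERN"
    then
        dd if="$CONFIG" bs=1 count=1 of=/dev/null
    fi
done

phush0 commented 3 years ago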

Fixed in 460.39.
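To tell whether an installed driver is at or past that release, the version strings can be compared with GNU sort -V. This is a sketch with a hypothetical function name, and it assumes that releases after 460.39 keep the fix.

```shell
#!/bin/bash
# Hypothetical helper: succeed if driver version $1 is at least $2
# (default 460.39, the release named in this thread).
# Relies on version sorting from GNU coreutils `sort -V`.
driver_at_least() {
    local have="$1" want="${2:-460.39}"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Possible usage on a real system:
#   driver_at_least "$(modinfo -F version nvidia)" || echo "workaround still needed"
```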