Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

amdgpu-chk says 'amdgpu' driver doesn't exist, but it's in the kernel #111

Closed Drizzt321 closed 3 years ago

Drizzt321 commented 3 years ago

So I've got a fresh KDE Neon install (Ubuntu 20.04 base, with more up-to-date KDE Plasma packages) on a Ryzen 4800H laptop. I've installed these tools, but they don't seem to find the AMD GPU kernel driver that's loaded.

# amdgpu-chk
Using python 3.8.5
           Python version OK.
Using Linux Kernel 5.4.0-67-generic
           OS kernel OK.
AMD GPU driver is latency=0
           AMD's 'amdgpu' driver package is required.
Error in environment. Exiting...
# lsmod | grep amdgpu
amdgpu               4579328  0
amd_iommu_v2           20480  1 amdgpu
gpu_sched              32768  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
ttm                   106496  1 amdgpu
drm_kms_helper        184320  1 amdgpu
drm                   491520  4 gpu_sched,drm_kms_helper,amdgpu,ttm

I wonder if this is related to an issue I'm having with both the live/install thumb drive and the installed system. With a normal boot, the system comes up but SDDM (the display manager) never shows a login prompt. With the nomodeset kernel option, both the Live USB and the installed system boot to the SDDM login, as you'd expect.

Ricks-Lab commented 3 years ago

The version in the official Debian release stopped updating when I changed the name of the repository. Please try the latest by following install directions in the readme.

You will need to purge the current version first.

Drizzt321 commented 3 years ago

@Ricks-Lab Ah, thanks. That's painful, I'll give that a try.

Drizzt321 commented 3 years ago

OK, got the new version installed and the env set up.

# gpu-chk
Using python 3.8.5
           Python version OK.
Using Linux Kernel 5.4.0-67-generic
           OS kernel OK.
Using Linux distribution: KDE neon Plasma LTS Edition 5.18
           Distro has not been verified.
amdgpu/rocm version: UNKNOWN
           gpu-utils can still be used.
python3 venv is installed
           python3-venv OK.
rickslab-gpu-utils-env available
           rickslab-gpu-utils-env OK.
In rickslab-gpu-utils-env
           rickslab-gpu-utils-env is activated.

gpu-ls shows the GPU type as "Unsupported" (output below). Let me get the amdgpu.ppfeaturemask stuff set and try again.

# gpu-ls
Package addon [clinfo] executable not found.  Use sudo apt-get install clinfo to install
OS Command [clinfo] not found.  Use sudo apt-get install clinfo to install
Detected GPUs: AMD: 1
Can not access package read utility to verify AMD driver.
AMD: Wattman features not enabled: 0xffffbfff, See README file.
1 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: None
   Vendor: AMD
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1636', 'subsystem_device': '0x109f', 'subsystem_vendor': '0x1d05', 'vendor': '0x1002'}
   Decoded Device ID: Renoir
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c6)
   PCIe ID: 04:00.0
   Driver: amdgpu
   GPU Type: Unsupported
   HWmon: None
   Card Path: None
   System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0
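
For readers comparing the two masks reported in this thread, here is a minimal sketch of how a ppfeaturemask value could be pulled from a kernel command line and tested. It assumes that bit 14 (0x4000) is the overdrive bit in amdgpu's feature mask and that the value is given in hex; the function names are hypothetical, and the check gpu-utils actually performs may differ.

```python
import re

# Assumption: bit 14 (0x4000) is the overdrive bit of amdgpu.ppfeaturemask.
OVERDRIVE_BIT = 0x4000


def parse_ppfeaturemask(cmdline):
    """Return the hex amdgpu.ppfeaturemask value found in cmdline, or None."""
    match = re.search(r"amdgpu\.ppfeaturemask=(0[xX][0-9a-fA-F]+)", cmdline)
    if match is None:
        return None
    return int(match.group(1), 16)


def overdrive_enabled(mask):
    """True if the assumed overdrive bit is set in the mask."""
    return bool(mask & OVERDRIVE_BIT)


if __name__ == "__main__":
    # On a real system the string would come from /proc/cmdline.
    mask = parse_ppfeaturemask("ro amdgpu.ppfeaturemask=0xfffd7fff")
    print(hex(mask), overdrive_enabled(mask))
```

Under that assumption, 0xffffbfff has the bit cleared and 0xfffd7fff has it set, which lines up with the "not enabled" and "enabled" messages gpu-ls prints for those two values.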
Drizzt321 commented 3 years ago

@Ricks-Lab Nope, that didn't help. Still shows

# gpu-ls
Package addon [clinfo] executable not found.  Use sudo apt-get install clinfo to install
OS Command [clinfo] not found.  Use sudo apt-get install clinfo to install
Detected GPUs: AMD: 1
Can not access package read utility to verify AMD driver.
AMD: Wattman features enabled: 0xfffd7fff
1 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: None
   Vendor: AMD
   Readable: False
   Writable: False
   Compute: True
   Device ID: {'device': '0x1636', 'subsystem_device': '0x109f', 'subsystem_vendor': '0x1d05', 'vendor': '0x1002'}
   Decoded Device ID: Renoir
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c6)
   PCIe ID: 04:00.0
   Driver: amdgpu
   GPU Type: Unsupported
   HWmon: None
   Card Path: None
   System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0

Some additional info. The "UNCLAIMED" bit looks weird to me. The framebuffer TTY seems to be set up and working just fine, though.

# uname -a
Linux darklaptop 5.4.0-67-generic #75-Ubuntu SMP Fri Feb 19 18:03:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# cat /proc/cmdline
BOOT_IMAGE=/BOOT/ubuntu_b40fzj@/vmlinuz-5.4.0-67-generic root=ZFS=rpool/ROOT/ubuntu_b40fzj ro fbcon=scrollback:1024k amdgpu.ppfeaturemask=0xfffd7fff
# lshw -C display
  *-display UNCLAIMED
       description: VGA compatible controller
       product: Renoir
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:04:00.0
       version: c6
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi msix vga_controller bus_master cap_list
       configuration: latency=0
       resources: memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:e000(size=256) memory:fe600000-fe67ffff
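
The "UNCLAIMED" marker in lshw generally means no kernel driver is bound to the device, even if the module is loaded. A rough way to confirm that from sysfs is to look for the `driver` symlink under the PCI device directory. The helper name and the `sysfs_root` parameter below are hypothetical (the parameter exists only so the function can be pointed at a test tree); this is a sketch, not something gpu-utils is known to do.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None.

    pci_address is the full address, e.g. "0000:04:00.0".
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address, "driver")
    # The "driver" symlink only exists while a driver is bound to the device.
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    print(bound_driver("0000:04:00.0"))
```

If this returns None for the GPU while lsmod still lists amdgpu, the module is loaded but has not claimed the device.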
Ricks-Lab commented 3 years ago

Can you run in debug mode and post the debug log file? gpu-ls --debug

Drizzt321 commented 3 years ago

Had to install clinfo as well. Gist is https://gist.github.com/Drizzt321/55ae9a19928928c997890f0dfe86b16d

# gpu-ls --debug
Neon: Unverified
Detected GPUs: AMD: 1
Can not access package read utility to verify AMD driver.
AMD: Wattman features enabled: 0xfffd7fff
1 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: None
   Vendor: AMD
   Readable: False
   Writable: False
   Compute: False
   Device ID: {'device': '0x1636', 'subsystem_device': '0x109f', 'subsystem_vendor': '0x1d05', 'vendor': '0x1002'}
   Decoded Device ID: Renoir
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c6)
   PCIe ID: 04:00.0
   Driver: amdgpu
   GPU Type: Unsupported
   HWmon: None
   Card Path: None
   System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0
Ricks-Lab commented 3 years ago

The use of a virtual environment is only required for developers. It allows others to recreate the same development environment that I use. For normal use, it is not needed.

Drizzt321 commented 3 years ago

What virtual environment? I'm not running any sort of virtualization that I know of.

Ricks-Lab commented 3 years ago

From your gpu-chk output, it looks like you set up a venv. That should not cause a problem; it just isn't needed unless you are going to do some development on the project.

The issue in your output that I don't understand is:

   GPU Type: Unsupported
   HWmon: None
   Card Path: None

Hopefully the debug log can provide some insight.

Drizzt321 commented 3 years ago

Under installation from https://github.com/Ricks-Lab/gpu-utils/blob/master/docs/USER_GUIDE.md

Initialize your rickslab-gpu-utils-env if it is your first time to use it. From the project root directory, execute:

python3.6 -m venv rickslab-gpu-utils-env
source rickslab-gpu-utils-env/bin/activate
pip install --no-cache-dir -r requirements-venv.txt
Drizzt321 commented 3 years ago

And @Ricks-Lab, the debug log I gave as a gist above, https://gist.github.com/Drizzt321/55ae9a19928928c997890f0dfe86b16d

Unless there's a debug log created elsewhere than in the current dir.

Ricks-Lab commented 3 years ago

That’s the one. It looks like the card and hwmon directories don’t exist. Perhaps the kernel driver doesn’t support Renoir yet. I would suggest installing the driver from the AMD website, but the AMD driver package doesn’t yet work with 20.04.2.
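
A quick way to see whether those directories exist is to walk `/sys/class/drm` and look for a card whose `device` link resolves to the GPU's PCI address. The sketch below is illustrative only: the function name and the `sysfs_root` parameter are hypothetical, and it is not the lookup gpu-utils itself uses.

```python
import glob
import os


def find_card_paths(pci_address, sysfs_root="/sys"):
    """Return (card_path, hwmon_path) for a PCI address; None where missing."""
    pattern = os.path.join(sysfs_root, "class", "drm", "card[0-9]*")
    for card in sorted(glob.glob(pattern)):
        device = os.path.join(card, "device")
        # The device link of a DRM card resolves to its PCI device directory.
        if os.path.basename(os.path.realpath(device)) != pci_address:
            continue
        hwmons = sorted(glob.glob(os.path.join(device, "hwmon", "hwmon*")))
        return device, (hwmons[0] if hwmons else None)
    return None, None


if __name__ == "__main__":
    print(find_card_paths("0000:04:00.0"))
```

A result of `(None, None)` would correspond to the `Card Path: None` and `HWmon: None` lines in the gpu-ls output above.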

Drizzt321 commented 3 years ago

I would have expected it, at least in somewhat recent versions, to have Renoir support, as per https://www.phoronix.com/scan.php?page=news_item&px=Renoir-DMCUB-AMDGPU-Patches. So it seems that newer kernels should have better support. I'll have to look into upgrading the kernel and see what kind of effect that has.

Ricks-Lab commented 3 years ago

@Drizzt321 See this issue posted by someone who is also using a Renoir GPU but has no problem with the driver files: https://github.com/Ricks-Lab/gpu-utils/issues/114

Drizzt321 commented 3 years ago

@Ricks-Lab So... I found that kernel 5.4 doesn't have amdgpu drivers that properly support Renoir. I got the right package to update to 5.8, and now everything is working great: SDDM starts up, and so does gpu-ls. So in this case, for me, it was entirely a driver issue. Very annoying that the KDE Neon distro default was still 5.4, rather than 5.8 like the rest of the Ubuntu 20.04 based distros.

# gpu-chk 
Using python 3.8.5
           Python version OK. 
Using Linux Kernel 5.8.0-45-generic
           OS kernel OK. 
Using Linux distribution: KDE neon Plasma LTS Edition 5.18
           Distro has not been verified. 
amdgpu/rocm version: UNKNOWN
           gpu-utils can still be used. 
python3 venv is installed
           python3-venv OK. 
rickslab-gpu-utils-env is NOT available
           rickslab-gpu-utils-env can be configured per User Guide. 
Virtual Environment not configured. Only required by developers.
Not in rickslab-gpu-utils-env (Only needed if you want to duplicate dev env)
          rickslab-gpu-utils-env can be activated per User Guide. 

# gpu-ls
Package addon [clinfo] executable not found.  Use sudo apt-get install clinfo to install
OS Command [clinfo] not found.  Use sudo apt-get install clinfo to install
Detected GPUs: AMD: 1
Can not access package read utility to verify AMD driver.
AMD: Wattman features not enabled: 0xffffbfff, See README file.
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: voltages, disabling for this GPU: 0
1 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
   Vendor: AMD
   Readable: True
   Writable: False
   Compute: True
   GPU UID: None
   Device ID: {'device': '0x1636', 'subsystem_device': '0x109f', 'subsystem_vendor': '0x1d05', 'vendor': '0x1002'}
   Decoded Device ID: Renoir
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c6)
   Display Card Model: Renoir
   PCIe ID: 04:00.0
      Link Speed: 16.0 GT/s PCIe
      Link Width: 16
   ##################################################
   Driver: amdgpu
   vBIOS Version: 113-RENOIR-026
   Compute Platform: None
   GPU Type: APU
   HWmon: /sys/class/drm/card0/device/hwmon/hwmon4
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0
   ##################################################
   Current Power (W): None
   Power Cap (W): None
   ##################################################
   Current GPU Loading (%): 0
   Current Memory Loading (%): None
   Current GTT Memory Usage (%): 6.736
      Current GTT Memory Used (GB): 0.202
      Total GTT Memory (GB): 3.000
   Current VRAM Usage (%): 78.983
      Current VRAM Used (GB): 0.395
      Total VRAM (GB): 0.500
   Current  Temps (C): {'edge': 38.0}
   Critical Temps (C): {'edge': 0.0}
   Current Voltages (V): None
   Current Clk Frequencies (MHz): {'sclk': 400.0}
   Current SCLK P-State: [1, '400Mhz']
   Current MCLK P-State: [0, '1600Mhz']
   Power Profile Mode: None
   Power DPM Force Performance Level: auto
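
For anyone scripting the same diagnosis, a small sketch of the kernel comparison follows. The (5, 8) threshold is only the version that worked in this thread, not a documented minimum for Renoir, and the function names are hypothetical.

```python
import platform
import re


def kernel_version(release):
    """Parse (major, minor) from a release string such as '5.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(5, 8)):
    """True if the release is at or above the assumed minimum version."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    print(platform.release(), kernel_at_least(platform.release()))
```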