Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

Tool doesn't see GPUs when PCI address has multiple domains #125

Closed: DanaGoyette closed this issue 2 years ago

DanaGoyette commented 2 years ago

I have a Radeon Pro WX 4100 in an ARM64 machine (Honeycomb), where the slots are registered as separate PCIe domains.

The Ubuntu Impish package, as well as the one from your Debian repo, can't find my GPU.

gpu-ls --debug
Ubuntu: Validated
Error [{}]: lspci failed to find GPUs
Detected GPUs: 
No GPUs detected, exiting...

The debug output is this:

DEBUG:gpu-utils:env.set_args:Install type: debian
DEBUG:gpu-utils:env.set_args:Command line arguments:
  Namespace(about=False, short=False, table=False, pstates=False, ppm=False, clinfo=False, no_fan=False, debug=True)
DEBUG:gpu-utils:env.set_args:Local TZ: PDT
DEBUG:gpu-utils:env.set_args:pciid path set to: /usr/share/misc/pci.ids
DEBUG:gpu-utils:env.set_args:Icon path set to: /usr/share/rickslab-gpu-utils/icons
DEBUG:gpu-utils:gpu-ls.main:########## gpu-ls 3.6.1
DEBUG:gpu-utils:env.check_env:Using python: 3.9.7
DEBUG:gpu-utils:env.check_env:Using Linux Kernel: 5.15.28-cex7
DEBUG:gpu-utils:env.check_env:Using Linux Distro: Ubuntu
DEBUG:gpu-utils:env.check_env:Linux Distro Description: Ubuntu 21.10
DEBUG:gpu-utils:env.check_env:Distro: Ubuntu, Ubuntu 21.10
DEBUG:gpu-utils:env.check_env:lspci path: /usr/bin/lspci
DEBUG:gpu-utils:env.check_env:clinfo path: /usr/bin/clinfo
DEBUG:gpu-utils:env.check_env:Ubuntu package query tool: /usr/bin/dpkg
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.42.0, 5.15.28-cex7, LLVM 12.0.1)]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.1 Mesa 21.2.6]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [21.2.6]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.1]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [16]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [256 256 256]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [256]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [3435973836]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:OpenCL map: {None: {'prf_wg_multiple': '64', 'max_wg_size': '256', 'prf_wg_size': None, 'max_wi_sizes': '256 256 256', 'max_wi_dim': '3', 'max_mem_allocation': '3435973836', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '16', 'device_name': 'AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.42.0, 5.15.28-cex7, LLVM 12.0.1)', 'opencl_version': 'OpenCL C 1.1', 'driver_version': '21.2.6', 'device_version': 'OpenCL 1.1 Mesa 21.2.6'}}
DEBUG:gpu-utils:env.read_amdfeaturemask:Raw Featuremask string: [0xfff7bfff]
DEBUG:gpu-utils:env.read_amdfeaturemask:AMD featuremask: 0xfff7bfff
DEBUG:gpu-utils:GPUmodule.get_gpu_pci_list:Found GPU pci: 0004:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]

The PCI bus addresses look like this:

0002:01:00.0 SATA controller [0106]: Samsung Electronics Co Ltd XP941 PCIe SSD [144d:a800] (rev 01)
0004:01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100] [1002:67e3]
0004:01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]

The pattern for PCI bus IDs seems to match only the bb:dd.f (bus:device.function) form, not the domain-qualified xxxx:bb:dd.f form.
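
A minimal sketch of a pattern that accepts both forms, assuming the usual lspci notation: an optional domain of at least 4 hex digits, a 2-hex-digit bus, a 2-hex-digit device and a function from 0 to 7. `PCI_ADDR_RE` and `parse_pci_address` are hypothetical names; this is not the actual PCI_ADD pattern in gpu-utils.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: optional domain ("0004"), then bus, device and function.
PCI_ADDR_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4,}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])\b"
)


def parse_pci_address(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (domain, bus, device, function) from the start of an lspci line.

    A missing domain is reported as "0000", the domain lspci leaves out
    by default on machines that only have domain 0.
    """
    match = PCI_ADDR_RE.match(line)
    if match is None:
        return None
    domain = match.group("domain") or "0000"
    return domain, match.group("bus"), match.group("device"), match.group("function")


print(parse_pci_address("0004:01:00.0 VGA compatible controller [0300]: ..."))
# ('0004', '01', '00', '0')
print(parse_pci_address("01:00.0 VGA compatible controller [0300]: ..."))
# ('0000', '01', '00', '0')
```

Treating the domain as an optional prefix keeps short addresses from single-domain machines working while also accepting the 0004:... form from this board.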

If I naively edit the PCI_ADD pattern to add the domain section, I get this instead:

Ubuntu: Validated
Detected GPUs: AMD: 1
amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xfff7bfff, See README file.
1 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: None
   Vendor: AMD
   Readable: False
   Writable: False
   Compute: False
   Device ID: {'device': '', 'subsystem_device': '', 'subsystem_vendor': '', 'vendor': ''}
   Decoded Device ID: UNDETERMINED
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
   PCIe ID: 0004:01:00.0
   Driver: amdgpu
   GPU Type: Unsupported
   HWmon: None
   Card Path: None
   System Card Path: None

Debug output:

DEBUG:gpu-utils:GPUmodule.get_gpu_pci_list:Found GPU pci: 0004:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Found 1 GPUs
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item 0c3a16a2054d4dc2aeb4b5dfe535a8cd to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 0004:01:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['0004:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', '\tSubsystem: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', '\tKernel driver in use: amdgpu', '\tKernel modules: amdgpu', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:gpu_name: [Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0004:01/0004:01:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path not set for: 0004:01:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU[0c3a16a2054d4dc2aeb4b5dfe535a8cd] type set to Unsupported
DEBUG:gpu-utils:GPUmodule.set_gpu_list:/sys/device file search found no match to pcie_id: 0004:01:00.0
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '0004:01:00.0', 'model': 'Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', 'vendor': <vendor.AMD: 3>, 'driver': 'amdgpu', 'card_path': '', 'sys_card_path': '', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': False, 'compute_platform': None}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to []
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:HW file does not exist: vendor
DEBUG:gpu-utils:GPUmodule.wattman_status:AMD featuremask: 0xfff7bfff

My /sys/devices/ looks like this:

drwxr-xr-x 6 root root 0 Mar 20 18:22 /sys/devices/pci0002:01
drwxr-xr-x 3 root root 0 Mar 20 18:22 /sys/devices/pci0002:01/NXP0016:00
drwxr-xr-x 2 root root 0 Mar 20 19:07 /sys/devices/pci0002:01/power
drwxr-xr-x 5 root root 0 Mar 20 18:22 /sys/devices/pci0002:01/0002:01:00.0
drwxr-xr-x 3 root root 0 Mar 20 18:22 /sys/devices/pci0002:01/pci_bus
drwxr-xr-x 7 root root 0 Mar 20 18:22 /sys/devices/pci0004:01
drwxr-xr-x 4 root root 0 Mar 20 18:22 /sys/devices/pci0004:01/0004:01:00.1
drwxr-xr-x 2 root root 0 Mar 20 19:07 /sys/devices/pci0004:01/power
drwxr-xr-x 3 root root 0 Mar 20 18:22 /sys/devices/pci0004:01/pci_bus
drwxr-xr-x 13 root root 0 Mar 20 18:22 /sys/devices/pci0004:01/0004:01:00.0
drwxr-xr-x 3 root root 0 Mar 20 18:22 /sys/devices/pci0004:01/NXP0016:02
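
The listing shows the GPU's device directory under a domain-specific root, pci0004:01, so any lookup that expects a fixed root name would not find it. A lookup that does not depend on the root name can be sketched as follows. It assumes the standard sysfs layout, in which the kernel exposes every PCI device as a symlink /sys/bus/pci/devices/<domain:bus:dev.fn> that points into /sys/devices. This is an illustrative alternative, not the search gpu-utils performs, and `find_sys_card_path` and its `sysfs_root` parameter are hypothetical.

```python
import os
from typing import Optional


def find_sys_card_path(pcie_id: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Resolve the real device directory for a domain-qualified PCI ID.

    /sys/bus/pci/devices/<pcie_id> is a symlink into /sys/devices, e.g.
    0004:01:00.0 -> ../../../devices/pci0004:01/0004:01:00.0
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pcie_id)
    if not os.path.exists(link):
        return None
    # realpath follows the symlink no matter which pciDDDD:BB root it lands in.
    return os.path.realpath(link)


if __name__ == "__main__":
    print(find_sys_card_path("0004:01:00.0"))
```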

Ricks-Lab commented 2 years ago

Thanks for raising a bug report. I think you've provided enough details to figure it out. I will let you know when I have pushed an update.

Ricks-Lab commented 2 years ago

I pushed an update that should fix it, but I could not test it. Let me know your observations.

DanaGoyette commented 2 years ago

Thanks. After rebooting to set ppfeaturemask, gpu-ls now says this:

Since I haven't really used these tools before, I don't know what to expect, but it makes sense that there's no WattMan: the same is true on Windows, where you can't really tune anything.

Detected GPUs: AMD: 1
amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xfff7bfff, See README file.
1 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: None
   Vendor: AMD
   Readable: False
   Writable: False
   Compute: False
   Device ID: {'device': '0x67e3', 'subsystem_device': '0x0b0d', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
   Decoded Device ID: Baffin [Radeon Pro WX 4100]
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
   PCIe ID: 0004:01:00.0
   Driver: amdgpu
   GPU Type: Unsupported
   HWmon: None
   Card Path: None
   System Card Path: /sys/devices/pci0004:01/0004:01:00.0
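
The "Wattman features not enabled: 0xfff7bfff" line can be checked by hand. In the kernel header amd_shared.h, PP_OVERDRIVE_MASK (the overdrive feature bit) is 0x4000, and 0xfff7bfff, which is also the amdgpu driver's default ppfeaturemask, has that bit cleared. A minimal sketch, assuming this overdrive bit is what gpu-utils tests (not confirmed from its source); the helper names are hypothetical, and `int(..., 0)` accepts both the hex form shown in the debug log and a plain decimal value.

```python
# Bit value from the kernel's amd_shared.h (enum PP_FEATURE_MASK).
PP_OVERDRIVE_MASK = 0x4000


def parse_featuremask(raw: str) -> int:
    """Parse the text of /sys/module/amdgpu/parameters/ppfeaturemask."""
    # Base 0 accepts "0xfff7bfff" as well as a decimal string.
    return int(raw.strip(), 0)


def overdrive_enabled(featuremask: int) -> bool:
    """True if the featuremask has the overdrive bit set."""
    return bool(featuremask & PP_OVERDRIVE_MASK)


mask = parse_featuremask("0xfff7bfff\n")
print(hex(mask), overdrive_enabled(mask))  # 0xfff7bfff False
print(overdrive_enabled(0xffffffff))       # True
```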

Full debug log:

DEBUG:gpu-utils:env.set_args:Install type: debian
DEBUG:gpu-utils:env.set_args:Command line arguments:
  Namespace(about=False, short=False, table=False, pstates=False, ppm=False, clinfo=False, no_fan=False, debug=True)
DEBUG:gpu-utils:env.set_args:Local TZ: PDT
DEBUG:gpu-utils:env.set_args:pciid path set to: /usr/share/misc/pci.ids
DEBUG:gpu-utils:env.set_args:Icon path set to: /usr/share/rickslab-gpu-utils/icons
DEBUG:gpu-utils:gpu-ls.main:########## gpu-ls 3.6.2
DEBUG:gpu-utils:env.check_env:Using python: 3.9.7
DEBUG:gpu-utils:env.check_env:Using Linux Kernel: 5.15.28-cex7
DEBUG:gpu-utils:env.check_env:Using Linux Distro: Ubuntu
DEBUG:gpu-utils:env.check_env:Linux Distro Description: Ubuntu 21.10
DEBUG:gpu-utils:env.check_env:Distro: Ubuntu, Ubuntu 21.10
DEBUG:gpu-utils:env.check_env:lspci path: /usr/bin/lspci
DEBUG:gpu-utils:env.check_env:clinfo path: /usr/bin/clinfo
DEBUG:gpu-utils:env.check_env:Ubuntu package query tool: /usr/bin/dpkg
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.42.0, 5.15.28-cex7, LLVM 12.0.1)]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.1 Mesa 21.2.6]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [21.2.6]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.1]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [16]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [256 256 256]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [256]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [3435973836]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:OpenCL map: {None: {'prf_wg_multiple': '64', 'max_wg_size': '256', 'prf_wg_size': None, 'max_wi_sizes': '256 256 256', 'max_wi_dim': '3', 'max_mem_allocation': '3435973836', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '16', 'device_name': 'AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.42.0, 5.15.28-cex7, LLVM 12.0.1)', 'opencl_version': 'OpenCL C 1.1', 'driver_version': '21.2.6', 'device_version': 'OpenCL 1.1 Mesa 21.2.6'}}
DEBUG:gpu-utils:env.read_amdfeaturemask:Raw Featuremask string: [0xfff7bfff]
DEBUG:gpu-utils:env.read_amdfeaturemask:AMD featuremask: 0xfff7bfff
DEBUG:gpu-utils:GPUmodule.get_gpu_pci_list:Found GPU pci: 0004:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Found 1 GPUs
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item e8020b1d36c540ccb5aa3eeedb97fe8e to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 0004:01:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['0004:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', '\tSubsystem: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', '\tKernel driver in use: amdgpu', '\tKernel modules: amdgpu', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:gpu_name: [Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0004:01/0004:01:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path not set for: 0004:01:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU[e8020b1d36c540ccb5aa3eeedb97fe8e] type set to Unsupported
DEBUG:gpu-utils:GPUmodule.set_gpu_list:/sys/device file search found match to pcie_id 0004:01:00.0:
['/sys/devices/pci0004:01/0004:01:00.0']
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '0004:01:00.0', 'model': 'Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', 'vendor': <vendor.AMD: 3>, 'driver': 'amdgpu', 'card_path': '', 'sys_card_path': '/sys/devices/pci0004:01/0004:01:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': False, 'compute_platform': None}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/devices/pci0004:01/0004:01:00.0]
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [['0x1002', '0x67e3', '0x1002', '0x0b0d']], type: [<class 'list'>]
DEBUG:gpu-utils:GPUmodule.wattman_status:AMD featuremask: 0xfff7bfff

Ricks-Lab commented 2 years ago

Maybe there is also a difference in the way your system defines or uses what I am calling the card_path, which typically contains a link to the system device file. Can you check the contents of "/sys/class/drm/"? What gpu-ls reports for the "system card path" would also be useful.
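
The card_path relationship described here can be sketched as follows, assuming the standard DRM sysfs layout in which /sys/class/drm/cardN/device is a symlink to the PCI device directory. Connector entries such as card0-DP-1 and render nodes such as renderD128 are skipped. `map_cards_to_sys_paths` and its `drm_root` parameter are hypothetical, not gpu-utils code.

```python
import os
import re
from typing import Dict

# Matches primary card nodes ("card0") but not connectors ("card0-DP-1").
CARD_RE = re.compile(r"^card\d+$")


def map_cards_to_sys_paths(drm_root: str = "/sys/class/drm") -> Dict[str, str]:
    """Map each card path (cardN/device) to the system device path it resolves to."""
    mapping: Dict[str, str] = {}
    if not os.path.isdir(drm_root):
        return mapping
    for name in sorted(os.listdir(drm_root)):
        if not CARD_RE.match(name):
            continue
        card_path = os.path.join(drm_root, name, "device")
        if os.path.exists(card_path):
            # realpath follows the symlinks down to /sys/devices/pciDDDD:BB/...
            mapping[card_path] = os.path.realpath(card_path)
    return mapping


if __name__ == "__main__":
    for card_path, sys_path in map_cards_to_sys_paths().items():
        print(card_path, "->", sys_path, "pcie_id:", os.path.basename(sys_path))
```

The basename of the resolved path is the domain-qualified PCI ID, so it can be compared directly with the ID lspci reports.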

AMD GPUs before Fiji are not well supported by the Linux drivers, so the available capability for Baffin may be limited. But I am motivated to figure out how to deal with the card path and device path for this type of installation.

DanaGoyette commented 2 years ago

/sys/class/drm:

lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-1 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-1
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-2 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-2
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-3 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-3
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-4 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-4
lrwxrwxrwx  1 root root    0 Mar 21 17:16 renderD128 -> ../../devices/pci0004:01/0004:01:00.0/drm/renderD128
-r--r--r--  1 root root 4096 Mar 21 17:16 version

/sys/class/hwmon:

lrwxrwxrwx  1 root root 0 Mar 21 17:16 hwmon0 -> ../../devices/virtual/thermal/thermal_zone0/hwmon0
lrwxrwxrwx  1 root root 0 Mar 21 17:16 hwmon1 -> ../../devices/pci0004:01/0004:01:00.0/hwmon/hwmon1

/sys/class/hwmon/hwmon1/:

lrwxrwxrwx 1 root root    0 Mar 21 17:16 device -> ../../../0004:01:00.0
-rw-r--r-- 1 root root 4096 Mar 21 17:45 fan1_enable
-r--r--r-- 1 root root 4096 Mar 21 17:16 fan1_input
-r--r--r-- 1 root root 4096 Mar 21 17:16 fan1_max
-r--r--r-- 1 root root 4096 Mar 21 17:16 fan1_min
-rw-r--r-- 1 root root 4096 Mar 21 17:45 fan1_target
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq1_input
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq1_label
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq2_input
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq2_label
-r--r--r-- 1 root root 4096 Mar 21 17:16 in0_input
-r--r--r-- 1 root root 4096 Mar 21 17:16 in0_label
-r--r--r-- 1 root root 4096 Mar 21 17:16 name
drwxr-xr-x 2 root root    0 Mar 21 17:45 power
-r--r--r-- 1 root root 4096 Mar 21 17:16 power1_average
-rw-r--r-- 1 root root 4096 Mar 21 17:16 power1_cap
-r--r--r-- 1 root root 4096 Mar 21 17:45 power1_cap_default
-r--r--r-- 1 root root 4096 Mar 21 17:45 power1_cap_max
-r--r--r-- 1 root root 4096 Mar 21 17:45 power1_cap_min
-r--r--r-- 1 root root 4096 Mar 21 17:16 power1_label
-rw-r--r-- 1 root root 4096 Mar 21 17:45 pwm1
-rw-r--r-- 1 root root 4096 Mar 21 17:45 pwm1_enable
-r--r--r-- 1 root root 4096 Mar 21 17:45 pwm1_max
-r--r--r-- 1 root root 4096 Mar 21 17:45 pwm1_min
lrwxrwxrwx 1 root root    0 Mar 21 17:16 subsystem -> ../../../../../class/hwmon
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_crit
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_crit_hyst
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_input
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_label
-rw-r--r-- 1 root root 4096 Mar 21 17:16 uevent

Speaking of PCIe domains, the other place I've seen them is on multi-socket boards, but those are a different kind of expensive.

Ricks-Lab commented 2 years ago

Are PCI domains unique to multi-socket boards? This is the first case of them I have seen in this project. It would be cool to have a multi-socket system up and running, but with 64-core single-socket systems available, I had not considered the cost of a dual-socket one.

I just pushed a quick update. It adds the capability to handle the PCI domain when setting the card path. Let me know if it works. Once we get this working, it would be best if I refactored this section of code.
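
One way to make the card-path match independent of whether an ID carries a domain is to normalize every PCI ID to the full DDDD:BB:DD.F form before comparing. lspci omits the domain by default on machines that only have domain 0 (`lspci -D` always prints it), while sysfs directory names always include it. A minimal sketch; `normalize_pcie_id` and `same_device` are hypothetical helpers, not the code in the update.

```python
def normalize_pcie_id(pcie_id: str, default_domain: str = "0000") -> str:
    """Return a lowercase, domain-qualified PCI ID such as '0004:01:00.0'."""
    pcie_id = pcie_id.strip().lower()
    # A domain-qualified ID has three colon-separated fields, a bare one has two.
    if pcie_id.count(":") == 1:
        pcie_id = f"{default_domain}:{pcie_id}"
    return pcie_id


def same_device(lspci_id: str, sys_path: str) -> bool:
    """Compare an lspci ID with the last component of a sysfs device path."""
    sys_id = sys_path.rstrip("/").rsplit("/", 1)[-1]
    return normalize_pcie_id(lspci_id) == normalize_pcie_id(sys_id)


print(same_device("0004:01:00.0", "/sys/devices/pci0004:01/0004:01:00.0"))  # True
print(same_device("01:00.0", "/sys/devices/pci0000:00/0000:00:03.1/0000:01:00.0"))  # True
print(same_device("01:00.0", "/sys/devices/pci0004:01/0004:01:00.0"))  # False
```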

DanaGoyette commented 2 years ago

In my ARM board's case, it's not really multi-socket; it just has the PCIe root hidden in firmware because of quirks.

Thanks for the additional fix; now it sees plenty of info. I'll paste the output, but not the (now larger) debug log. Note that I'm currently booted with amdgpu.bapm=0 as an attempt to work around odd hangs.

Ubuntu: Validated
Detected GPUs: AMD: 1
amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xfff7bfff, See README file.
1 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
   Vendor: AMD
   Readable: True
   Writable: False
   Compute: False
   GPU UID: None
   Device ID: {'device': '0x67e3', 'subsystem_device': '0x0b0d', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
   Decoded Device ID: Baffin [Radeon Pro WX 4100]
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
   Display Card Model:  Baffin Pro WX 4100
   PCIe ID: 0004:01:00.0
      Link Speed: 8.0 GT/s PCIe
      Link Width: 8
   ##################################################
   Driver: amdgpu
   vBIOS Version: 113-D0150600-103
   Compute Platform: None
   GPU Type: Modern
   HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0004:01/0004:01:00.0
   ##################################################
   Current Power (W): 6.146
   Power Cap (W): 35.000
      Power Cap Range (W): [0, 35]
   Fan Enable: 0
   Fan PWM Mode: [2, 'Dynamic']
   Fan Target Speed (rpm): 2035
   Current Fan Speed (rpm): 2035
   Current Fan PWM (%): 19
      Fan Speed Range (rpm): [1600, 6000]
      Fan PWM Range (%): [0, 100]
   ##################################################
   Current GPU Loading (%): 0
   Current Memory Loading (%): 1
   Current GTT Memory Usage (%): 0.603
      Current GTT Memory Used (GB): 0.024
      Total GTT Memory (GB): 4.000
   Current VRAM Usage (%): 0.895
      Current VRAM Used (GB): 0.036
      Total VRAM (GB): 4.000
   Current  Temps (C): {'edge': 25.0}
   Critical Temps (C): {'edge': 99.0}
   Current Voltages (V): {'vddgfx': 718}
   Current Clk Frequencies (MHz): {'mclk': 300.0, 'sclk': 214.0}
   Current SCLK P-State: [0, '214Mhz']
   Current MCLK P-State: [0, '300Mhz']
   Power Profile Mode: 1-3D_FULL_SCREEN
   Power DPM Force Performance Level: auto

Ricks-Lab commented 2 years ago

Can you check whether the file pp_od_clk_voltage exists in the card path directory? I just want to verify whether there are other issues with writing to the card. This is the driver file that is written to for under/overclocking the GPU. On older cards, I expect writing is not supported and the file doesn't exist.
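
That check can be sketched as follows, assuming the card path reported above (/sys/class/drm/card0/device). It only tests whether pp_od_clk_voltage exists and whether the current process has write permission to it. Whether a write is actually accepted still depends on the driver and the overdrive featuremask bit. `od_clk_voltage_status` is a hypothetical name.

```python
import os
from typing import Dict


def od_clk_voltage_status(card_path: str = "/sys/class/drm/card0/device") -> Dict[str, bool]:
    """Report whether pp_od_clk_voltage exists and is writable by this process."""
    od_file = os.path.join(card_path, "pp_od_clk_voltage")
    exists = os.path.isfile(od_file)
    return {
        "exists": exists,
        # os.access checks the real uid; sysfs usually makes this file root-writable only.
        "writable": exists and os.access(od_file, os.W_OK),
    }


if __name__ == "__main__":
    print(od_clk_voltage_status())
```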

Ricks-Lab commented 2 years ago

3.6.3 released with this fix.