Closed Ricks-Lab closed 4 years ago
I will work on updating the User Guide. I have a question about the listing of p-states from my Navi 10 card (below). It is similar to what is currently in the Guide for your Vega 20 card, but I only recently noticed it. With amdgpu-ls --pstates, there are two sets of frequencies for sclk and mclk curve endpoints, and I don't clearly understand what the two sets represent. In amdgpu-monitor, the highest SCLK p-state I see with the card under load is '2', which seems to correspond to the '2' in the first set of --pstates and the '2' in the SCLK mask of amdgpu-pac. Everything is fine there. Yet the highest MCLK p-state of '3' that I see in amdgpu-monitor, which also shows in the MCLK mask of amdgpu-pac, does not correspond with anything in amdgpu-ls --pstates. How should these various p-states for Type 2 cards be explained in the User Guide?
$ ./amdgpu-ls --pstates
Detected GPUs: INTEL: 1, AMD: 1
AMD: amdgpu version: 20.10-1048554
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 1 rw, 0 r-only, 0 w-only
Card Number: 1
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev ca)
Card Path: /sys/class/drm/card1/device
GPU Frequency/Voltage Control Type: 2
SCLK: MCLK:
0: 300Mhz 0: 100Mhz
1: 1040Mhz 1: 500Mhz
2: 1780Mhz 2: 625Mhz
SCLK: MCLK:
0: 800Mhz -
1: 1780Mhz - 1: 875MHz -
VDDC_CURVE:
0: ['800MHz', '707mV']
1: ['1290MHz', '750mV']
2: ['1780MHz', '959mV']
BTW, I have been able to overclock and underclock the endpoints and undervolt the curve.
I have modified the format of amdgpu-ls --pstates to be more clear.
For type 2 cards, there are no pstates defined in the pp_od_clk_voltage file, so I show the pstates from the pp_dpm_[sm]clk files. Type 2 cards do not define the curve with pstates, but instead use AVFS on a curve defined by the 3 Freq/Voltage curve points.
To overclock, I assume you would not need to change the curve, but just define an operating point at a higher frequency than the stock highest. This may be limited by the OD_Range points.
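As a rough illustration of the p-state listing discussed above, here is a minimal sketch that parses text in the "index: frequency" form shown in the --pstates output. The function name is hypothetical, and the trailing asterisk on the active state in the sample is an assumption about the pp_dpm_[sm]clk file format, not something shown in this thread.

```python
import re

# Matches lines such as "2: 1780Mhz" or "2: 1780Mhz *".
# Assumption: a trailing asterisk marks the currently active state.
PSTATE_RE = re.compile(r'^\s*(\d+):\s*(\d+)\s*[Mm][Hh]z\s*(\*)?\s*$')


def parse_dpm_table(text):
    """Return ({pstate: MHz}, active_pstate_or_None) from pp_dpm_* style text."""
    states = {}
    active = None
    for line in text.splitlines():
        match = PSTATE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognized lines
        index, mhz = int(match.group(1)), int(match.group(2))
        states[index] = mhz
        if match.group(3):
            active = index
    return states, active


# Frequencies taken from the SCLK listing above; the asterisk is illustrative.
sample_sclk = "0: 300Mhz\n1: 1040Mhz\n2: 1780Mhz *\n"
print(parse_dpm_table(sample_sclk))
```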
Good, got it. When might it be useful to change the curve? (I edited a typo in my previous comment from "I have been about to overclock..." to "I have been able to overclock...".) Yes, I have overclocked by raising the sclk or mclk OD curve endpoints and have undervolted by lowering the mV of the 3rd vddc curve point. I haven't tried altering that curve in any other way.
I have always been working to manage power, so I don't have much experience overclocking, though I have tried it with older cards in some benchmarking I was doing. The curve is what defines how AVFS works. The GPU is meant to operate on that curve. Perhaps the curve doesn't accurately represent operating points beyond its end points, so redefining an end point might make sense. It may be a good idea to plot the curve in Excel and see how any modified curve would compare. Another use could be instability in an aged card. Maybe you can get more life out of it by shifting the whole curve by a voltage offset.
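To make the comparison idea concrete, here is a small sketch that shifts the three VDDC_CURVE points quoted earlier by a voltage offset and estimates the voltage at an intermediate frequency. The piecewise-linear interpolation and the function names are assumptions for illustration only; the actual AVFS curve shape may differ.

```python
# Stock VDDC_CURVE points quoted earlier in this thread: (MHz, mV).
STOCK_CURVE = [(800, 707), (1290, 750), (1780, 959)]


def shift_curve(curve, offset_mv):
    """Return a copy of the curve with every voltage shifted by offset_mv."""
    return [(mhz, mv + offset_mv) for mhz, mv in curve]


def voltage_at(curve, mhz):
    """Linearly interpolate mV at a frequency, clamping outside the end points.

    Assumption: straight lines between points; this is only an estimate.
    """
    pts = sorted(curve)
    if mhz <= pts[0][0]:
        return float(pts[0][1])
    if mhz >= pts[-1][0]:
        return float(pts[-1][1])
    for (f0, v0), (f1, v1) in zip(pts, pts[1:]):
        if f0 <= mhz <= f1:
            return v0 + (v1 - v0) * (mhz - f0) / (f1 - f0)
    raise ValueError('frequency not covered by curve')


undervolted = shift_curve(STOCK_CURVE, -25)
for freq in (800, 1045, 1290, 1780):
    print(freq, voltage_at(STOCK_CURVE, freq), voltage_at(undervolted, freq))
```

Printing both columns side by side gives the same comparison a spreadsheet plot would, without leaving the terminal.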
@csecht Please pull the latest from master. I made some code optimizations (pre-compile regex and optimize some string searches). I also made a minor change to pac interface.
@csecht I have merged your pull request. Looks good!
A couple of minor observations:
Have you been able to test the latest on master on your systems? On my systems, it is more responsive with the optimizations.
Okay, I’ll update the plot and pac Type1 examples tomorrow. Yes, I did notice that the monitor GUI launches very quickly. Nice. I’ll test the other modules tomorrow.
On Jun 3, 2020, at 6:52 PM, Rick wrote:
@csecht I have merged your pull request. Looks good!
A couple of minor observations:
- Applicable version should be 3.2.x
- The plot example is from old version. There are minor format changes in the latest.
- The pac example for Type 1 cards is not the latest.
Have you been able to test the latest on master on your systems? On my systems, it is more responsive with the optimizations.
I implemented another optimization by using an Enum object in the definition of sensors instead of using names, which should be slightly faster. It was a major change, so a thorough review of amdgpu-ls parameters would be a good idea.
It all looks good. Nice and responsive too.
@csecht
I merged your pull request. Looks good!
I plan to make the release tomorrow.
I got this error this morning with PAC whenever I try to change any parameter:
$ ./amdgpu-pac --execute
Detected GPUs: INTEL: 1, AMD: 1
AMD: amdgpu version: 20.10-1048554
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 1 rw, 0 r-only, 0 w-only
# Write Delta mode.
Traceback (most recent call last):
File "./amdgpu-pac", line 838, in save_card
old_pwm = int(v.get_params_value('fan_pwm')) if v.get_params_value('fan_pwm').isnumeric() else None
AttributeError: 'float' object has no attribute 'isnumeric'
At that point it just hangs and I don't get a prompt to enter my sudo password. Yesterday I installed then uninstalled rocm and reinstalled AMDGPU for OpenCL. Not sure whether it's a change with amdgpu-utils or something amiss in the driver package. The RX 5600 XT card is crunching fine at its default values, however.
EDIT: I was able to successfully run a startup PAC BASH script, as a service, to change the sclk endpoint for that card, so the device files can be edited.
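The traceback above comes from calling a string method on a value that is sometimes a float. One way to tolerate mixed types is sketched below; the helper name is hypothetical and this is not necessarily how amdgpu-pac handles it.

```python
def to_int_or_none(value):
    """Return value as an int when it looks numeric, otherwise None.

    Accepts int, float, or numeric strings such as '28' or '28.0'.
    """
    if value is None or isinstance(value, bool):
        return None
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        # Non-numeric strings, unsupported types, NaN and infinity land here.
        return None


# A float no longer raises AttributeError, and a string still works.
print(to_int_or_none(28.0), to_int_or_none('28'), to_int_or_none('n/a'))
```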
@csecht I think I fixed the problem. It looks like my original fix for this is what caused the writing of 0 to the fan. Let me know if it works now.
I just pushed a more robust approach.
Yes, that fixed it.
Still not happy with the robustness of the solution, so I will delay official release for a week. I did enhance critical temp reading and display value for all sensors in amdgpu-ls and implemented a more generic read of voltages which will work if multiple voltage sensors are available.
I think I have a more robust solution for dealing with variable types of numeric values in pac and monitor. While working on this, I implemented Enum for GPU Types and Vendors. This makes it so I no longer use numeric type indicators and use enumerated names instead. Perhaps the Users Guide needs to be updated with these new type names:
GPU_Type = GpuEnum('type', 'Undefined PStatesNE PStates CurvePts')
Probably only the last two are relevant to the user.
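For readers unfamiliar with enumerated types, here is a minimal sketch using the standard library's functional Enum API with the same member names. It assumes GpuEnum behaves like a plain Enum subclass, which this thread does not confirm.

```python
from enum import Enum

# Stand-in for GpuEnum, assumed here to behave like the stdlib Enum.
GPU_Type = Enum('type', 'Undefined PStatesNE PStates CurvePts')

# A card starts as Undefined and is reclassified once its tables are read.
card_type = GPU_Type.Undefined
card_type = GPU_Type.CurvePts

# Members are compared by identity and printed by name, not by number.
is_editable = card_type in (GPU_Type.PStates, GPU_Type.CurvePts)
print(card_type.name, is_editable)
```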
I edited the User Guide accordingly and issued Pull Request. "Type 0" was replaced with Type Undefined, etc. Type PStatesNE was not introduced in the guide.
Actually, Type0 was used for an older GPU that had non-editable p-states, but the re-write in 3.x seems to have eliminated that classification. I have one old card. Maybe I will work with that to re-implement the classification of PStatesNE type.
Would Undefined be used as an else condition for unforeseen cards that don't match any known state?
What about the HD series of cards that some users might still have?
Yes, Undefined is the default type. It gets set to PStates or CurvePts when the pstates are read from the pp_od_clk_voltage file.
It looks like the code I had to set the Type for HD series is missing after the rewrite. I need to put an old card back in and work it out again with the new code base.
I have made some user guide modifications, so be sure to pull the latest if you are going to make some edits.
I have implemented a few more Enum objects and made a major change to how sensors are read. It should be much more efficient now. I think that was the last major change for release 3.2. I will release this weekend, so let me know if you see any issues.
It looks like I gave away the R9 290x card I had, so I installed an older HD 7870 GPU. It had only a few parameters available, but I am not sure if this is due to not having amdgpu installed. I am using Ubuntu 20.04, and there is no amdgpu install package for it yet. Anyway, here is what I get with amdgpu-ls:
Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
GPU UID:
Device ID: {'vendor': '0x1002', 'device': '0x6818', 'subsystem_vendor': '0x1462', 'subsystem_device': '0x2740'}
Decoded Device ID: Pitcairn XT [Radeon HD 7870 GHz Edition]
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition]
Display Card Model: Pitcairn XT [Radeon HD 7870 GHz Edition]
PCIe ID: 08:00.0
Link Speed: 8 GT/s
Link Width: 16
##################################################
Driver: radeon, amdgpu
Compute Platform: None
GPU Frequency/Voltage Control Type: Legacy
HWmon: /sys/class/drm/card0/device/hwmon/hwmon4
Card Path: /sys/class/drm/card0/device
##################################################
Fan PWM Mode: [2, 'Dynamic']
Current Fan PWM (%): 28
Fan PWM Range (%): [0, 100]
##################################################
Current Temps (C): {'unnamed': 28.0}
Critical Temps (C): {'unnamed': 120.0}
Power DPM Force Performance Level: auto
Also, I am now reading the device id details and decoding from pciid file for the non-readable onboard GPUs. Here is what I get for my server system:
Card Number: 0
Vendor: ASPEED
Readable: False
Writable: False
Compute: False
Device ID: {'vendor': '0x1a03', 'device': '0x2000', 'subsystem_vendor': '0x1458', 'subsystem_device': '0x1000'}
Decoded Device ID: ASPEED Graphics Family
Card Model: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
PCIe ID: c4:00.0
Driver: ast
HWmon: None
Card Path: /sys/class/drm/card0/device
I ran into a problem where the missing values in Legacy cards cause problems with monitor and plot, so I exclude them when getting a list of readable GPUs.
So, for the User Guide, what is the difference between PStatesNE and Legacy cards?
A minor point in formatting output from amdgpu-ls:
Current Temps (C): {'mem': 88.0, 'edge': 63.0, 'junction': 69.0}
Critical Temps (C): {'mem': 99.0, 'junction': 99.0, 'edge': 118.0}
For Current Temps, the order of 'edge' and 'junction' ought to be switched, to match the order in Critical Temps (or vice versa).
I am concerned that the observations for HD 7870 are very different from what I observed for R9 290x. Not sure if it is a real difference, or an artifact of not having amdgpu driver package installed on my 20.04 system. Let's hold off documenting Legacy and PStatesNE until I get more clarity.
Implemented sorting of dictionaries for print in the latest on master.
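A minimal sketch of sorting by key before printing, using the two dictionaries quoted above. The helper name is hypothetical, and relying on dict ordering assumes Python 3.7 or later (or CPython 3.6, where insertion order is an implementation detail).

```python
current = {'mem': 88.0, 'edge': 63.0, 'junction': 69.0}
critical = {'mem': 99.0, 'junction': 99.0, 'edge': 118.0}


def sorted_by_key(values):
    """Return a new dict whose keys are in alphabetical order."""
    return dict(sorted(values.items()))


# Both lines now list the sensors in the same order.
print('Current Temps (C):', sorted_by_key(current))
print('Critical Temps (C):', sorted_by_key(critical))
```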
I just remembered I had a Radeon HD 4650, so I installed it in my machine with Ubuntu 18.04, kernel 5.3.0, and amdgpu version 20.10-1048554, then ran amdgpu-ls from the most recent Master, and got this:
Traceback (most recent call last):
File "./amdgpu-ls", line 147, in <module>
main()
File "./amdgpu-ls", line 94, in main
gpu_list.set_gpu_list(clinfo_flag=True)
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 1379, in set_gpu_list
hw_file_srch = glob.glob(os.path.join(card_path, env.GUT_CONST.hwmon_sub) + '?')
File "/usr/lib/python3.6/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
I get a similar error with all other amdgpu-utils commands, except amdgpu-chk.
Here is information from lspci:
$ lspci -k -nn -s 01:00.0
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RV730 PRO [Radeon HD 4650] [1002:9498]
Subsystem: PC Partner Limited / Sapphire Technology RV730 PRO [Radeon HD 4650] [174b:9498]
Kernel modules: radeon
It looks like card_path is not set. I made some changes to deal with it by setting Type to a new type, Unsupported. Could you provide debug output so that I can make sure the solution is robust?
$ ./amdgpu-ls --debug debug_gpu-utils_20200610-192949.log
Is that log from the latest on master? I added a few more log statements in the latest.
Sorry. Here is the terminal stdout:
$ ./amdgpu-ls --debug
Ubuntu: Validated
Traceback (most recent call last):
File "./amdgpu-ls", line 147, in <module>
main()
File "./amdgpu-ls", line 94, in main
gpu_list.set_gpu_list(clinfo_flag=True)
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 1418, in set_gpu_list
'compute_platform': opencl_device_version})
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 607, in populate_prm_from_dict
source_value.replace('{}card'.format(env.GUT_CONST.card_root), '').replace('/device', ''))
AttributeError: 'NoneType' object has no attribute 'replace'
and here is the debug file: debug_gpu-utils_20200610-193528.log
I think I have covered the other places where card_path is referenced. Let me know when you get a chance to try it out.
Got it. Here is the terminal output:
$ ./amdgpu-ls --debug
Ubuntu: Validated
Traceback (most recent call last):
File "./amdgpu-ls", line 147, in <module>
main()
File "./amdgpu-ls", line 94, in main
gpu_list.set_gpu_list(clinfo_flag=True)
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 1424, in set_gpu_list
rdata = self[gpu_uuid].read_gpu_sensor('id', vendor=GpuItem.GPU_Vendor.AMD, sensor_type='DEVICE')
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 954, in read_gpu_sensor
file_path = os.path.join(sensor_path, sensor_file)
File "/usr/lib/python3.6/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
and the debug log: debug_gpu-utils_20200610-195255.log
Looks like readable flag was still True for unsupported GPUs. I fixed that.
Hmmmm. The terminal:
$ ./amdgpu-ls --debug
Ubuntu: Validated
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
msg = self.format(record)
File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
return fmt.format(record)
File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
record.message = record.getMessage()
File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "./amdgpu-ls", line 147, in <module>
main()
File "./amdgpu-ls", line 94, in main
gpu_list.set_gpu_list(clinfo_flag=True)
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 1384, in set_gpu_list
logger.debug('GPU[{}] type set to Unsupported', gpu_uuid)
Message: 'GPU[{}] type set to Unsupported'
Arguments: ('583a7958fb3742a492abed0a9f430573',)
Traceback (most recent call last):
File "./amdgpu-ls", line 147, in <module>
main()
File "./amdgpu-ls", line 94, in main
gpu_list.set_gpu_list(clinfo_flag=True)
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 1427, in set_gpu_list
rdata = self[gpu_uuid].read_gpu_sensor('id', vendor=GpuItem.GPU_Vendor.AMD, sensor_type='DEVICE')
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 954, in read_gpu_sensor
file_path = os.path.join(sensor_path, sensor_file)
File "/usr/lib/python3.6/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
and debug: debug_gpu-utils_20200610-203338.log
Oops... Used wrong string format in logger. Fixed and pushed.
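The "Logging error" above arises because the logging module applies %-style formatting to the arguments it is given, so a '{}' placeholder leaves the argument unconsumed. Here is a sketch of two forms that avoid it; the logger name is made up, and this is not necessarily the exact fix that was pushed.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('gpu-utils-example')  # hypothetical logger name
gpu_uuid = '583a7958fb3742a492abed0a9f430573'

# The pattern from the traceback: '{}' is not a %-style placeholder.
BROKEN_FORMAT = 'GPU[{}] type set to Unsupported'
# %-style placeholder, formatted lazily by the logging module.
FIXED_FORMAT = 'GPU[%s] type set to Unsupported'

# Option 1: let logging do the substitution.
logger.debug(FIXED_FORMAT, gpu_uuid)

# Option 2: format the string yourself and pass no extra arguments.
logger.debug(BROKEN_FORMAT.format(gpu_uuid))
```

Option 1 is usually preferred because the string is only built when the debug level is enabled.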
not quite...
$ ./amdgpu-ls --debug
Ubuntu: Validated
Traceback (most recent call last):
File "./amdgpu-ls", line 147, in <module>
main()
File "./amdgpu-ls", line 94, in main
gpu_list.set_gpu_list(clinfo_flag=True)
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 1428, in set_gpu_list
rdata = self[gpu_uuid].read_gpu_sensor('id', vendor=GpuItem.GPU_Vendor.AMD, sensor_type='DEVICE')
File "/home/craig/amdgpu-utils-master/GPUmodules/GPUmodule.py", line 954, in read_gpu_sensor
file_path = os.path.join(sensor_path, sensor_file)
File "/usr/lib/python3.6/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
It looks like the readable flag is still True. Not sure why, so I have added more logger statements.
The debug says it's looking in the path /sys/devices/, but the only thing there is the CPU. Shouldn't it look in /sys/class/drm/ where the GPUs are? The HD 4650 is in the first PCI slot, so ...
$ ls /sys/class/drm/card0/device
ari_enabled current_link_width enable irq msi_bus resource subsystem
boot_vga d3cold_allowed firmware_node label msi_irqs resource0 subsystem_device
broken_parity_status device graphics local_cpulist numa_node resource2 subsystem_vendor
class dma_mask_bits i2c-0 local_cpus power resource2_wc uevent
config driver i2c-1 max_link_speed remove resource4 vendor
consistent_dma_mask_bits driver_override i2c-2 max_link_width rescan revision
current_link_speed drm index modalias reset rom
In the /sys/devices directory:
$ ls /sys/devices/
breakpoint cstate_pkg isa msr pnp0 system uncore_cbox_0 uprobe
cpu i915 kprobe pci0000:00 power tracepoint uncore_cbox_1 virtual
cstate_core intel_pt LNXSYSTM:00 platform software uncore_arb uncore_imc
The way to associate the correct card path is by looking for the full system device path with the pcie id in the pathname. This version of the pathname is derived from the typical card_path name using resolve.
So I check full system path of each potential card path for a match to the pcie_id. If a match is found, then that card path is associated with the pcie_id. For this card, no match is found.
From the log file:
This card has pcie_id of: 01:00.0
[01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV730 PRO [Radeon HD 4650]
There are 2 potential card paths: 0 & 1
/sys/class/drm/card1/device = /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/0000:03:00.0/0000:04:00.0
/sys/class/drm/card0/device = /sys/devices/pci0000:00/0000:00:02.0
Neither matches the pcie_id of 01:00.0
so, GPU type set to Unsupported
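The matching step described above can be sketched as follows. The function names are hypothetical, and comparing only the last path component against the pcie_id is an assumption made here to avoid accidental matches on intermediate bridges; the actual code may compare differently.

```python
import glob
import os


def resolve_card_paths(card_root='/sys/class/drm'):
    """Map each card device path to its fully resolved system path."""
    pattern = os.path.join(card_root, 'card[0-9]*', 'device')
    return {path: os.path.realpath(path) for path in glob.glob(pattern)}


def match_card_path(pcie_id, resolved_paths):
    """Return the card path whose system path ends in pcie_id, else None."""
    for card_path, sys_path in resolved_paths.items():
        last_component = os.path.basename(sys_path.rstrip('/'))
        if last_component.endswith(pcie_id):
            return card_path
    return None


# The two resolved paths reported in the log excerpt above.
from_log = {
    '/sys/class/drm/card1/device':
        '/sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/0000:03:00.0/0000:04:00.0',
    '/sys/class/drm/card0/device':
        '/sys/devices/pci0000:00/0000:00:02.0',
}
print(match_card_path('04:00.0', from_log))  # the Navi 10 card
print(match_card_path('01:00.0', from_log))  # the HD 4650: no match
```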
Even if we find that there is a valid card path, still need to fix the issue where an unsupported card is interpreted as readable. Let's get this one fixed first, then work on a potential issue of matching a pcie_id to a card path.
To make the card path details more clear, I have added the system card path to the output of amdgpu-ls. I have also implemented the amdgpu-ls --short option to give a brief report of basic GPU properties.
I have discovered an inconsistency in the way I was accessing the list of GPUs. Maybe this was the source of unreadable cards being read. But the real problem was that I was only checking the readability flag in GpuList.read_gpu_sensor_data and not in GpuItem.read_gpu_sensor_data.
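The idea of guarding the read at the item level, not only at the list level, can be sketched like this. The class below is a hypothetical stand-in and does not reproduce the real GpuItem interface.

```python
import os


class GpuItemSketch:
    """Hypothetical stand-in for a per-GPU object; not the real GpuItem."""

    def __init__(self, card_path, readable):
        self.card_path = card_path
        self.readable = readable

    def read_sensor(self, sensor_file):
        """Return the sensor file contents, or None if the GPU is unreadable."""
        # Guard here so every caller is protected, even when it bypasses
        # a list-level check.
        if not self.readable or self.card_path is None:
            return None
        try:
            with open(os.path.join(self.card_path, sensor_file)) as handle:
                return handle.read().strip()
        except OSError:
            return None


unsupported = GpuItemSketch(card_path=None, readable=False)
print(unsupported.read_sensor('id'))  # no TypeError from os.path.join
```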
Yes! it's working now to deal with the unsupported card:
$ ./amdgpu-ls --short
Detected GPUs: INTEL: 1, AMD: 2
AMD: amdgpu version: 20.10-1048554
AMD: Wattman features enabled: 0xfffd7fff
3 total GPUs, 1 rw, 0 r-only, 0 w-only
Card Number: 0
Vendor: INTEL
Readable: False
Writable: False
Compute: False
Device ID: {'device': '0x3e91', 'subsystem_device': '0x8694', 'subsystem_vendor': '0x1043', 'vendor': '0x8086'}
PCIe ID: 00:02.0
HWmon: None
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:02.0
Card Number:
Vendor: AMD
Readable: False
Writable: False
Compute: False
Device ID: {'device': '', 'subsystem_device': '', 'subsystem_vendor': '', 'vendor': ''}
PCIe ID: 01:00.0
HWmon: None
Card Path: None
System Card Path: None
Card Number: 1
Vendor: AMD
Readable: True
Writable: True
Compute: True
Device ID: {'device': '0x731f', 'subsystem_device': '0xe411', 'subsystem_vendor': '0x1da2', 'vendor': '0x1002'}
Display Card Model: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
PCIe ID: 04:00.0
HWmon: /sys/class/drm/card1/device/hwmon/hwmon3
Card Path: /sys/class/drm/card1/device
System Card Path: /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/0000:03:00.0/0000:04:00.0
Now, about that card path...
While amdgpu-ls lists the unsupported card's name, it doesn't list the card path or pci-ids.
The undefined card, however, does have a path and its vendor and device pci-ids are listed with $ lspci -k -nn (as previously commented). Example:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RV730 PRO [Radeon HD 4650] [1002:9498]
Subsystem: PC Partner Limited / Sapphire Technology RV730 PRO [Radeon HD 4650] [174b:9498]
Kernel modules: radeon
A grep for those pci-ids shows that card's path is /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/ and pci-id data is in there. Examples:
$ cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/uevent
PCI_CLASS=30000
PCI_ID=1002:9498
PCI_SUBSYS_ID=174B:9498
PCI_SLOT_NAME=0000:01:00.0
MODALIAS=pci:v00001002d00009498sv0000174Bsd00009498bc03sc00i00
And...
$ cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/subsystem_vendor
0x174b
$ cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/subsystem_device
0x9498
$ ls /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/
ari_enabled current_link_speed enable max_link_width rescan resource4 uevent
boot_vga current_link_width firmware_node modalias reset revision vendor
broken_parity_status d3cold_allowed irq msi_bus resource rom
class device local_cpulist numa_node resource0 subsystem
config dma_mask_bits local_cpus power resource0_wc subsystem_device
consistent_dma_mask_bits driver_override max_link_speed remove resource2 subsystem_vendor
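Since the uevent file shown above carries the vendor, device, and subsystem IDs in one place, a small parser is enough to recover them. This is a sketch with hypothetical function names; the sample text is copied from the uevent output above.

```python
SAMPLE_UEVENT = """PCI_CLASS=30000
PCI_ID=1002:9498
PCI_SUBSYS_ID=174B:9498
PCI_SLOT_NAME=0000:01:00.0
MODALIAS=pci:v00001002d00009498sv0000174Bsd00009498bc03sc00i00
"""


def parse_uevent(text):
    """Turn KEY=VALUE lines into a dict, ignoring lines without '='."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition('=')
        if sep:
            result[key.strip()] = value.strip()
    return result


def pci_ids(uevent):
    """Build a Device ID dict in the 0x-prefixed style printed by amdgpu-ls."""
    vendor, device = uevent['PCI_ID'].lower().split(':')
    sub_vendor, sub_device = uevent['PCI_SUBSYS_ID'].lower().split(':')
    return {'vendor': '0x' + vendor, 'device': '0x' + device,
            'subsystem_vendor': '0x' + sub_vendor,
            'subsystem_device': '0x' + sub_device}


print(pci_ids(parse_uevent(SAMPLE_UEVENT)))
```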
A minor point of formatting amdgpu-ls --help stdout:
$ ./amdgpu-ls --help
usage: amdgpu-ls [-h] [--about] [--short] [--table] [--pstates] [--ppm]
[--clinfo] [--no_fan] [-d]
optional arguments:
-h, --help show this help message and exit
--about README
--short Short listing basic GPU details
--table Output table of basic GPU details
--pstates Output pstate tables instead of GPU details
--ppm Output power/performance mode tables instead of GPU details
--clinfo Include openCL with card details
--no_fan do not include fan setting options
-d, --debug Debug output
To match the terminal output, the help text for the --table option should instead read:
--table Current status of readable GPUs
I am going to need to think about how to deal with cards that don't have a normal card_path. I am currently only examining the system path of card paths that exist. I will work on it over the weekend.
Hope you don't mind, but I have made significant changes across all modules to deal with the issue causing confusion in the way I access GPUs in a GPU List. The code is now much more intuitive. I have only tried on one of my systems, but it is getting late here. I will push to master. Let me know if you find any issues. It also includes the help format change.
I ran through all the commands and everything is working. Nice.
The amdgpu-ls output is clear regarding how many GPUs are detected and which can be modified:
$ ./amdgpu-ls
Detected GPUs: INTEL: 1, AMD: 2
AMD: amdgpu version: 20.10-1048554
AMD: Wattman features enabled: 0xfffd7fff
3 total GPUs, 1 rw, 0 r-only, 0 w-only
Card Number: 0
Vendor: INTEL
Readable: False
Writable: False
Compute: False
Device ID: {'device': '0x3e91', 'subsystem_device': '0x8694', 'subsystem_vendor': '0x1043', 'vendor': '0x8086'}
Decoded Device ID: 8th Gen Core Processor Gaussian Mixture Model
Card Model: Intel Corporation 8th Gen Core Processor Gaussian Mixture Model
PCIe ID: 00:02.0
Driver: i915
HWmon: None
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:02.0
Card Number:
Vendor: AMD
Readable: False
Writable: False
Compute: False
Device ID: {'device': '', 'subsystem_device': '', 'subsystem_vendor': '', 'vendor': ''}
Decoded Device ID: UNDETERMINED
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] RV730 PRO [Radeon HD 4650]
PCIe ID: 01:00.0
Driver: radeon
HWmon: None
Card Path: None
System Card Path: None
Card Number: 1
Vendor: AMD
Readable: True
>and so on...
Still researching how to get the /sys/devices path for a specific pcie ID. My first attempt is this code:
sys_pci_dirs = glob.iglob('/sys/devices/pci*:*/**/*:{}'.format(pcie_id), recursive=True)
But it maxes out cpu for a long time and hasn't returned anything useful yet. Still need to do some research.
Maybe the use of a naked '*' is too greedy. Would a more explicit regex work?
sys_pci_dirs = glob.iglob('/sys/devices/pci\d*:\d*/\d*:{}'.format(pcie_id), recursive=True)
or this, which is more general, but uses a dot to give '*' something to work on, while '?' removes the greediness of '*':
sys_pci_dirs = glob.iglob('/sys/devices/pci.*:.*?/.*?:{}'.format(pcie_id), recursive=True)
I tested these regexes on https://pythex.org/ and both seem to work for matching up to the pcie-id.
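One caveat on the patterns above: Python's glob module uses shell-style wildcards (`*`, `?`, `[...]`, plus `**` when recursive=True), not regular expressions, so tokens such as `\d` or `.*?` are not interpreted as they are on a regex tester. A cheaper alternative to a recursive search is sketched below. It assumes the kernel exposes each PCI device as a symlink under /sys/bus/pci/devices named domain:bus:device.function, and the function name is hypothetical.

```python
import glob
import os


def find_sys_device_path(pcie_id, pci_bus_dir='/sys/bus/pci/devices'):
    """Return the resolved /sys/devices path for a pcie_id, or None.

    Only one flat directory is scanned, so no recursive glob is needed.
    The leading '*' covers the PCI domain, e.g. '0000'.
    """
    pattern = os.path.join(pci_bus_dir, '*:' + glob.escape(pcie_id))
    matches = sorted(glob.glob(pattern))
    return os.path.realpath(matches[0]) if matches else None


# On a system like the one above, this would be expected to resolve to a
# path ending in .../0000:01:00.0 if the symlink exists.
print(find_sys_device_path('01:00.0'))
```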
I have prepared v3.2.0 Release Candidate 1 on master. I have tested on my 3 systems. Looks good so far. Please provide your experience here as verification/feedback before release planned for this coming weekend. Thanks!