I have been using the deprecated rocm-smi for a while now to monitor the status of my GPUs. I have a FirePro S10000 (Tahiti), which works with amdgpu but either does not provide some of the hardware interfaces expected from newer GPUs or provides them only on a different path (for example voltages, clocks, power draw/cap and gpu_busy_percent). As a result, the now-deprecated rocm-smi showed a warning about being unable to read gpu_busy_percent, but otherwise it worked.
The new rocm-smi version sadly fails outright in this situation and errors out during initialization:
> /opt/rocm/bin/rocm-smi
rsmi_init() failed
Exception caught: rsmi_init.
ERROR:root:ROCm SMI returned 8 (the expected value is 0)
I have already narrowed this initialization problem down to an attempt to read /sys/class/hwmon/hwmon2/in0_label, which does not exist in the hardware monitors of the Tahiti GPUs. This leads the program to look up "" in kVoltSensorNameMap, which throws an exception (Map::at).
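To illustrate the kind of handling I mean, here is a minimal sketch of a lookup that tolerates a missing label file. It is not the actual rocm-smi code: the enum, the map contents, and the names VoltSensorSketch, kVoltSensorNameMapSketch, ReadLabelOrEmpty and LookupVoltSensor are all hypothetical stand-ins, and I am assuming the real map is keyed by the label string. The idea is simply to use find() instead of at(), so an empty or unknown label means "no such sensor" rather than an exception.

```cpp
#include <fstream>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for rocm-smi's kVoltSensorNameMap. The real key and
// value types and the real entries are assumptions, not taken from the source.
enum class VoltSensorSketch { kVddgfx };

static const std::map<std::string, VoltSensorSketch> kVoltSensorNameMapSketch = {
    {"vddgfx", VoltSensorSketch::kVddgfx},
};

// Reads the first line of a sysfs-style file. Returns an empty string if the
// file does not exist or cannot be opened (as with in0_label on Tahiti).
std::string ReadLabelOrEmpty(const std::string& path) {
  std::ifstream file(path);
  std::string label;
  if (file) {
    std::getline(file, label);
  }
  return label;
}

// Looks the label up with find() instead of at(), so an empty or unknown
// label yields std::nullopt rather than throwing std::out_of_range.
std::optional<VoltSensorSketch> LookupVoltSensor(const std::string& label) {
  const auto it = kVoltSensorNameMapSketch.find(label);
  if (it == kVoltSensorNameMapSketch.end()) {
    return std::nullopt;
  }
  return it->second;
}
```

With something like this, the caller could skip voltage reporting for a device when LookupVoltSensor(ReadLabelOrEmpty(path)) returns no value, instead of aborting rsmi_init() for all GPUs.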
Even without this issue, these GPUs don't provide a frequency table (as far as I know), which causes another exception:
I don't expect rocm-smi to support these old GPUs, but it would be good if it still worked when old GPUs are present. Let me know if you need more information.
Relevant part of lspci:
0a:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)
0b:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)
0b:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ba)
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti PRO GL [FirePro Series]
0c:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti HDMI Audio [Radeon HD 7870 XT / 7950/7970]
0d:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti PRO GL [FirePro Series]
0e:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1)
0f:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
10:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c1)
10:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
Hardware monitor files of Tahiti:
» ls /sys/class/drm/card0/device/hwmon/hwmon2/
device fan1_input freq1_input freq2_input name pwm1 pwm1_max subsystem temp1_crit_hyst temp1_label
fan1_enable fan1_target freq1_label freq2_label power pwm1_enable pwm1_min temp1_crit temp1_input uevent
Hardware monitor files of Navi21: