Closed. davc0n closed this 7 months ago.
After a quick look at the code I found the reason for the first warning (ROCm SMI: Failed to get device name):
char name[NVML_DEVICE_NAME_BUFFER_SIZE]; // ROCm SMI does not provide a constant for this as far as I can tell, this should be good enough
result = rsmi_dev_name_get(i, name, NVML_DEVICE_NAME_BUFFER_SIZE);
if (result != RSMI_STATUS_SUCCESS)
Logger::warning("ROCm SMI: Failed to get device name");
The size currently used is 64, which apparently is not enough: result is RSMI_STATUS_INSUFFICIENT_SIZE. With a size of 128 the device name is retrieved properly.
For the other warnings, result is RSMI_STATUS_NOT_SUPPORTED instead (I guess there's nothing we can do here).
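One way to avoid guessing a fixed buffer size is to retry with a larger buffer whenever the call reports that the buffer was too small. The following is only a sketch of that idea, not btop's actual code: the `Status` enum, `NameGetter`, `get_name_with_retry`, and `make_fake_getter` are hypothetical names made up for illustration, and the fake getter stands in for `rsmi_dev_name_get` so the snippet compiles without ROCm installed.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the status codes that matter here. Real code
// would use rsmi_status_t (RSMI_STATUS_SUCCESS, RSMI_STATUS_INSUFFICIENT_SIZE).
enum class Status { Success, InsufficientSize, Other };

// Any callable shaped like rsmi_dev_name_get with the device index already bound.
using NameGetter = std::function<Status(char* buf, std::size_t len)>;

// Call `get` with a doubling buffer until the name fits or max_size is exceeded.
std::optional<std::string> get_name_with_retry(const NameGetter& get,
                                               std::size_t initial_size = 64,
                                               std::size_t max_size = 1024) {
    if (initial_size == 0) initial_size = 1;  // avoid a loop that never grows
    for (std::size_t size = initial_size; size <= max_size; size *= 2) {
        // One extra zeroed byte guarantees the buffer is always terminated.
        std::vector<char> buf(size + 1, '\0');
        const Status st = get(buf.data(), size);
        if (st == Status::Success) return std::string(buf.data());
        if (st != Status::InsufficientSize) return std::nullopt;  // unrelated failure
    }
    return std::nullopt;  // name did not fit even at max_size
}

// Hypothetical fake getter for demonstration: it reports InsufficientSize
// when the name plus its terminator does not fit into `len` bytes.
NameGetter make_fake_getter(std::string name) {
    return [name](char* buf, std::size_t len) {
        if (name.size() + 1 > len) return Status::InsufficientSize;
        std::memcpy(buf, name.c_str(), name.size() + 1);
        return Status::Success;
    };
}
```

With the real library, the getter would presumably be a small lambda that forwards to `rsmi_dev_name_get(i, buf, len)` and maps its return value to the three cases above. Simply raising the constant to 128, as tested above, is the smaller change; the retry loop only matters if some device names turn out to be longer still.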
EDIT: The issue title has been changed. I was convinced the problem was caused by the device not being recognized (because of the first warning), but I was wrong.
EDIT#2:
if (gpus_slice[i].supported_functions.gpu_utilization) {
uint32_t utilization;
result = rsmi_dev_busy_percent_get(i, &utilization);
The utilization value returned is always 100, so I guess the issue is in rsmi itself. Is there anything we can do?
Some Raven/Picasso/Raven2 APU always report gpu_busy_percent as 100.
Hello,
I'm using Arch Linux and I installed btop and rocm-smi-lib (6.0.0) from the official repositories. Unfortunately GPU monitoring does not work correctly: the reported usage value is always 100%, which I believe is wrong.
Any help?
Please ask if you need more information.