ROCm / ROC-smi

ROC System Management Interface
https://github.com/RadeonOpenCompute/ROC-smi/blob/master/README.md

`rocm-smi -a` fails with error code 2 #95

Closed misos1 closed 3 years ago

misos1 commented 3 years ago

This never happened with previous versions. Would it be better to skip any info that cannot be queried, and report an error only when the user explicitly requests it with an option like --showpagesinfo on the command line?
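A minimal sketch of the behavior proposed above, not rocm-smi's actual code: every name here (`NotSupported`, `run_queries`, the query labels) is hypothetical and only illustrates the policy. Unsupported queries are skipped quietly unless the user asked for them explicitly, and only then does the exit code become non-zero.

```python
from typing import Callable, Dict, Set


class NotSupported(Exception):
    """Hypothetical stand-in for a query the device cannot answer."""


def unsupported() -> str:
    # Simulates a query that fails the way RSMI_STATUS_NOT_SUPPORTED would.
    raise NotSupported()


def run_queries(queries: Dict[str, Callable[[], str]], explicit: Set[str]) -> int:
    """Run all queries; return 2 only if an explicitly requested one is unsupported."""
    exit_code = 0
    for name, query in queries.items():
        try:
            print(f"{name}: {query()}")
        except NotSupported:
            if name in explicit:
                # The user asked for this by name, so the failure is an error.
                print(f"ERROR: {name} is not supported")
                exit_code = 2
            else:
                # Part of a catch-all run such as -a: skip without failing.
                print(f"{name}: N/A (skipped)")
    return exit_code


# Hypothetical query table: one query works, one is unsupported.
QUERIES = {"temperature": lambda: "45.0c", "pagesinfo": unsupported}
```

With this policy, `run_queries(QUERIES, explicit=set())` models a catch-all run and returns 0, while `run_queries(QUERIES, explicit={"pagesinfo"})` models an explicit request and returns 2.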

The error code is probably returned because of this output:

================================== Pages Info ==================================
ERROR: 2 GPU[0]: ras: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
============================ Show Valid sclk Range =============================
ERROR: 2 GPU[0]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[0]      : Unable to display sclk range
ERROR: 2 GPU[1]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[1]      : Unable to display sclk range
================================================================================
============================ Show Valid mclk Range =============================
ERROR: 2 GPU[0]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[0]      : Unable to display mclk range
ERROR: 2 GPU[1]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[1]      : Unable to display mclk range
================================================================================
=========================== Show Valid voltage Range ===========================
ERROR: 2 GPU[0]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[0]      : Unable to display voltage range
ERROR: 2 GPU[1]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[1]      : Unable to display voltage range
================================================================================
============================= Voltage Curve Points =============================
ERROR: 2 GPU[0]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[0]      : Voltage Curve is not supported
ERROR: 2 GPU[1]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment. 
GPU[1]      : Voltage Curve is not supported
================================================================================
WARNING:         One or more commands failed
============================= End of ROCm SMI Log ==============================
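
The failures in the log above can be tallied per feature to see which queries are unsupported on which GPUs. This is a rough sketch that assumes the `ERROR: <code> GPU[<n>]: <feature>: <status>:` line format seen in this log; the regex and the `summarize_errors` name are illustrative and not part of rocm-smi.

```python
import re
from collections import defaultdict

# Matches lines such as:
# ERROR: 2 GPU[0]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not ...
ERROR_LINE = re.compile(
    r"^ERROR: (?P<code>\d+) GPU\[(?P<gpu>\d+)\]: "
    r"(?P<feature>[^:]+): (?P<status>RSMI_STATUS_\w+):"
)


def summarize_errors(log: str) -> dict:
    """Map each (feature, status) pair to the sorted GPU indices that reported it."""
    gpus_by_failure = defaultdict(set)
    for line in log.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            key = (match["feature"], match["status"])
            gpus_by_failure[key].add(int(match["gpu"]))
    return {key: sorted(gpus) for key, gpus in gpus_by_failure.items()}


# A few lines copied from the log above.
SAMPLE = """\
ERROR: 2 GPU[0]: ras: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.
ERROR: 2 GPU[0]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.
GPU[0]      : Unable to display sclk range
ERROR: 2 GPU[1]: od volt: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.
"""

for (feature, status), gpus in summarize_errors(SAMPLE).items():
    print(f"{feature}: {status} on GPUs {gpus}")
```

Lines that do not start with `ERROR:` (for example the `Unable to display sclk range` messages) are ignored, so each failing feature is counted once per GPU.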

kentrussell commented 3 years ago

This is an issue with the library-backed CLI: as of ROCm 3.8, rocm-smi uses rocm_smi_lib. Please open a new issue at https://github.com/RadeonOpenCompute/rocm_smi_lib, since this repo will be deprecated and all SMI CLI functionality has moved there. Thank you!

earlruby commented 2 years ago

The new issue is tracked at https://github.com/RadeonOpenCompute/rocm_smi_lib/issues/74