ROCm / rocm_smi_lib

ROCm SMI LIB
https://rocm.docs.amd.com/projects/rocm_smi_lib/en/latest/
MIT License

Issue not resolved in ROCm 4.1 #79

Closed: ye-luo closed this 3 years ago

ye-luo commented 3 years ago

Original issue: https://github.com/RadeonOpenCompute/ROC-smi/issues/93. @kentrussell, the division-by-zero issue still shows up after I upgraded to 4.1.

kentrussell commented 3 years ago

Just to check: is the /opt/rocm/bin/rocm-smi symlink pointing to rocm_smi.py or rocm_smi_deprecated.py? Also, if you open it with "vi /opt/rocm/bin/rocm-smi" and go to the getFanSpeed function, can you copy/paste that function here? Thanks!
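The symlink check asked about above can also be scripted. This is a minimal sketch using only the Python standard library; the helper name rocm_smi_target is hypothetical and not part of rocm_smi_lib, and the default path assumes the standard /opt/rocm install location from this thread.

```python
import os


def rocm_smi_target(link_path="/opt/rocm/bin/rocm-smi"):
    """Return the file name the rocm-smi symlink points to, or None.

    Hypothetical helper: returns None when link_path is missing or is a
    regular file rather than a symlink.
    """
    if not os.path.islink(link_path):
        return None
    # os.readlink returns the raw link target, e.g. "rocm_smi.py" for a
    # relative link; basename also handles absolute targets.
    return os.path.basename(os.readlink(link_path))


if __name__ == "__main__":
    target = rocm_smi_target()
    if target is None:
        print("rocm-smi is missing or is not a symlink")
    elif target == "rocm_smi_deprecated.py":
        print("rocm-smi points to the deprecated script")
    else:
        print("rocm-smi points to", target)
```

Running it on the machine from this thread should report rocm_smi.py, matching the `ls -l` output below.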

ye-luo commented 3 years ago
$ ls /opt/rocm/bin/rocm-smi -l
lrwxrwxrwx 1 root root 11 Mar 17 01:22 /opt/rocm/bin/rocm-smi -> rocm_smi.py

There is no rocm_smi_deprecated.py in my ROCm 4.1 installation. Here is getFanSpeed:

def getFanSpeed(device):
    """ Return a tuple with the fan speed (value,%) for a specified device,
    or (None,None) if either current fan speed or max fan speed cannot be
    obtained

    @param device: DRM device identifier
    """
    fanLevel = c_int64()
    fanMax = c_int64()
    sensor_ind = c_uint32(0)
    fl = 0
    fm = 0

    ret = rocmsmi.rsmi_dev_fan_speed_get(device, sensor_ind, byref(fanLevel))
    if rsmi_ret_ok(ret, device):
        fl = fanLevel.value

    ret = rocmsmi.rsmi_dev_fan_speed_max_get(device, sensor_ind, byref(fanMax))
    if rsmi_ret_ok(ret, device):
        fm = fanMax.value

    return (fl, round((float(fl) / float(fm)) * 100, 2))

The fix should be simple: add an if statement that handles the case where fm equals 0 before dividing.

kentrussell commented 3 years ago

Apparently 4.1 was branched before the fix went in. This is what we added:

if fl == 0 or fm == 0:
    return (fl, fm)  # to prevent division by zero crash
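The guard can be exercised without a GPU by pulling the percentage calculation out of getFanSpeed into a standalone function. This is a minimal sketch, not the upstream code: the name fan_speed_percent is hypothetical, and it takes the raw values that getFanSpeed reads via rsmi_dev_fan_speed_get and rsmi_dev_fan_speed_max_get.

```python
def fan_speed_percent(fan_level, fan_max):
    """Return (level, percent) computed from raw fan readings.

    Hypothetical standalone helper mirroring the last line of getFanSpeed.
    """
    # Check the divisor first: a max of 0 (e.g. the max-speed query failed
    # and fm kept its initial 0) would otherwise raise ZeroDivisionError.
    if fan_max == 0:
        return (fan_level, None)
    return (fan_level, round((float(fan_level) / float(fan_max)) * 100, 2))


print(fan_speed_percent(128, 255))  # level 128 of max 255
print(fan_speed_percent(0, 0))      # no crash when the max is unavailable
```

One design difference: the one-line guard quoted above returns (fl, fm), so when only fl is 0 the second element is the raw max rather than a percentage. This sketch only skips the division when the max is 0, so a stopped fan reports 0.0%.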

Sorry for the confusion; I was sure we had branched after the fix was merged.

ye-luo commented 3 years ago

It works in 4.2 now.