ye-luo closed this issue 3 years ago
Just to check, is the /opt/rocm/bin/rocm-smi symlink pointing to the rocm_smi.py or rocm_smi_deprecated.py? And if you just type in "vi /opt/rocm/bin/rocm-smi" and go to the getFanSpeed function, can you copy/paste that here? Thanks!
$ ls /opt/rocm/bin/rocm-smi -l
lrwxrwxrwx 1 root root 11 Mar 17 01:22 /opt/rocm/bin/rocm-smi -> rocm_smi.py
No rocm_smi_deprecated.py found under my rocm-4.1 installation.
def getFanSpeed(device):
    """ Return a tuple with the fan speed (value,%) for a specified device,
    or (None,None) if either current fan speed or max fan speed cannot be
    obtained
    @param device: DRM device identifier
    """
    fanLevel = c_int64()
    fanMax = c_int64()
    sensor_ind = c_uint32(0)
    fl = 0
    fm = 0
    ret = rocmsmi.rsmi_dev_fan_speed_get(device, sensor_ind, byref(fanLevel))
    if rsmi_ret_ok(ret, device):
        fl = fanLevel.value
    ret = rocmsmi.rsmi_dev_fan_speed_max_get(device, sensor_ind, byref(fanMax))
    if rsmi_ret_ok(ret, device):
        fm = fanMax.value
    return (fl, round((float(fl) / float(fm)) * 100, 2))
The fix should be simple. Just add an if statement that checks for fm equal to 0 before the division.
Apparently they branched 4.1 before the fix was in. This is what we added:
    if fl == 0 or fm == 0:
        return (fl, fm) # to prevent division by zero crash
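For anyone stuck on 4.1, here is a minimal standalone sketch of how that guard behaves. The function name fan_speed_with_percent is hypothetical and not part of rocm_smi.py. It takes plain integers instead of calling into the rocmsmi library, so it only illustrates the arithmetic and the early return, not the actual patched getFanSpeed.

```python
def fan_speed_with_percent(fl, fm):
    """Hypothetical stand-in for the tail of getFanSpeed.

    fl: current fan level, fm: maximum fan level (plain ints here;
    the real function reads them through rocmsmi calls).
    """
    # Same guard as quoted in the thread: skip the division when either
    # value is 0, so fm == 0 can no longer raise ZeroDivisionError.
    if fl == 0 or fm == 0:
        return (fl, fm)
    # Otherwise return the level and its percentage of the maximum.
    return (fl, round((float(fl) / float(fm)) * 100, 2))


print(fan_speed_with_percent(0, 0))      # guard path, no exception
print(fan_speed_with_percent(128, 255))  # normal path
```

Note that on the guard path the second tuple element is fm rather than a percentage, exactly as in the quoted fix, so callers that expect a percentage there may want to treat a 0 in either position as "not available".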
Sorry for the confusion, I was so sure that we had branched after the fix had been merged.
Good in 4.2 now
Original issue: https://github.com/RadeonOpenCompute/ROC-smi/issues/93 @kentrussell the division by zero issue still shows up after I upgraded to 4.1.