Closed clamcy closed 5 years ago
What kernel and/or driver version are you using that you are seeing this problem?
All of the following versions of the amdgpu drivers surface this mechanism as the sysfs entry power1_cap
:
I am using driver "amdgpu-pro-19.10-785425-ubuntu-18.04.tar.xz". Maybe ROCm driver and amdgpu-pro are two different things? My mistake.
Hi @devclam
It appears that our amdgpu-pro 19.10 drivers also still user power1_cap
, so I'm still a bit confused where this problem is coming from.
Could you show me the output of the following command on your system?
for i in `ls -1 /sys/class/hwmon/`; do echo ${i}; cat /sys/class/hwmon/${i}/device/vendor; cat /sys/class/hwmon/${i}/device/device; ls /sys/class/hwmon/${i}; done
Sorry I am now using this GPU with Windows, so cannot give you detailed output now.
But I can explain why I change power1_cap
to power_cap
.
From the following code:
# rocm-smi.py line 1396, in setPowerOverDrive
power_cap_path = getFilePath(device, 'power1_cap')
and
# line 123 - 130
def getFilePath(device, key):
pathDict = valuePaths[key]
we can see that the program is looking up key
in the valuePaths
dictionary, the key
is 'power1_cap'.
# line 81 - 106
valuePaths = {
#...
'power' : {'prefix' : hwmonprefix, 'filepath' : 'power1_average', 'needsparse' : True},
'power_cap' : {'prefix' : hwmonprefix, 'filepath' : 'power1_cap', 'needsparse' : False},
'power_cap_max' : {'prefix' : hwmonprefix, 'filepath' : 'power1_cap_max', 'needsparse' : False},
'power_cap_min' : {'prefix' : hwmonprefix, 'filepath' : 'power1_cap_min', 'needsparse' : False}
#...
}
The valuePaths
dictionary has only power_cap
key, not power1_cap
key.
Then I changed the code in line 1396 from power1_cap
to power_cap
, and the program seems to be working.
Ah! Thank you for pointing that out. I had not noticed that the mechanism for getting the location of the power1_cap
file had changed between ROCm 2.2 and ROCm 2.3 -- I am still running 2.2 on my systems for various internal testing purposes.
You're absolutely right, we used to just hard-code the power1_cap
location, but the code changed to use a new getFilePath
mechanism that needs the properly key from the valuePaths table. I didn't notice this new mechanism, so I thought your actual file system entry had changed to power_cap
. Thank you for pointing this out.
@kentrussell I think we should merge this patch internally for release at AMD's earliest convenience.
It's internal now, it'll make it for 2.5. Thanks, and sorry guys... sigh
[Error Log] sudo rocm-smi --setpoweroverdrive 125 Traceback (most recent call last): File "/usr/bin/rocm-smi", line 1910, in
setPowerOverDrive(deviceList, args.setpoweroverdrive, args.autorespond)
File "/usr/bin/rocm-smi", line 1396, in setPowerOverDrive
power_cap_path = getFilePath(device, 'power1_cap')
File "/usr/bin/rocm-smi", line 130, in getFilePath
pathDict = valuePaths[key]
KeyError: 'power1_cap'
[FIX]
Line 1396,change
power1_cap
topower_cap
fixed the error.