Closed by brian-maher 3 years ago
This seems like a kernel bug, as the SMI is just a fancy interface for amdgpu's sysfs. What kind of monitor config are you looking at? I know that there are some weird bugs with multi-monitor setups and mclk, depending on the monitor/connector/refresh rate.
There isn't any monitor connected to the card.
It's passed through to a VMware VM, which itself uses the default VMware display adapter for console output.
Thanks, that helps. Can you check dmesg after you set the clocks to low? I am hoping to see whether there is an error regarding "Failed to upload..." from the PP table access. If not, is it possible to use the "auto" setting instead of "low"? If you need "low" to work, I'd suggest raising a ticket with the kernel guys, since this looks like a kernel bug. But first, let's check dmesg and see whether there is an error when actually setting the perf level to low.
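For anyone who wants to script that dmesg check, here is a minimal sketch in Python. It only filters the kernel log for the "Failed to upload" text mentioned above. The function names are hypothetical, and it assumes `dmesg` is on the PATH and readable by the current user (some systems restrict it to root).

```python
import subprocess


def find_pp_errors(log_text, needle="Failed to upload"):
    """Return the kernel log lines that contain the given text."""
    return [line for line in log_text.splitlines() if needle in line]


def read_dmesg():
    """Run dmesg and return its output as text.

    Assumption: dmesg is installed and the current user may read the
    kernel log; otherwise this raises CalledProcessError.
    """
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


def check_after_setting_low():
    """Print matching lines; intended to be run right after setting low."""
    matches = find_pp_errors(read_dmesg())
    if matches:
        for line in matches:
            print(line)
    else:
        print("No 'Failed to upload' lines found in dmesg")
```

Calling `check_after_setting_low()` straight after setting the perf level to low keeps the relevant lines near the end of the log. An empty result does not rule out a kernel problem; it only means this particular message was not logged.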
Sorry for the delay, this appears to be related to a known kernel bug. The 3.7 release would contain the fix for this, so if you can give the 3.7 kernel a shot, that should cover it. If it's still occurring, I think that it would likely get fixed when https://gitlab.freedesktop.org/drm/amd/-/issues/801 gets fixed, but I am hoping that the other DPM cleanup has addressed it (since that bug report is about a laptop)
Closing this as 3.7 resolved this issue. If you have any issues, please open a new issue at https://github.com/RadeonOpenCompute/rocm_smi_lib, as this repo will be deprecated and all SMI CLI functionality has moved over there. Thank you!
Hi,
Bit of a weird one, but fully reproducible on my system. Upon first boot, my mclk sits at level 0 (167 MHz).
If I set --setperflevel high and do anything with the card, this jumps to level 3. When the card is no longer in use and I set --setperflevel low, the sclk goes back to level 0, but the mclk stays at level 3.
Despite rocm-smi indicating no extra power usage (it stays at 3 W), my UPS shows an additional load of 2% (rated for 700 W), so somewhere there is an additional 14 W being consumed by the card which is going unreported.
Manually setting the mclk to 0 doesn't work. Manually changing the sclk, however, does.
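The stuck mclk level can also be checked without rocm-smi by reading the amdgpu sysfs files directly. The sketch below is one way to do that in Python, under some assumptions: the card is `card0` (the index may differ), the files `pp_dpm_sclk`, `pp_dpm_mclk` and `power_dpm_force_performance_level` are exposed under the device directory, and each clock file lists one level per line with the active level marked by `*`. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the GPU is card0; adjust the index for your system.
SYSFS_DEVICE = Path("/sys/class/drm/card0/device")

# Matches lines such as "0: 167Mhz" or "3: <freq>Mhz *".
LEVEL_RE = re.compile(r"\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?", re.IGNORECASE)


def parse_dpm_levels(text):
    """Parse pp_dpm_* contents into ({level: MHz}, active_level).

    active_level is None if no line carries the '*' marker.
    """
    levels = {}
    active = None
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if not match:
            continue  # skip anything that is not a level line
        index = int(match.group(1))
        levels[index] = int(match.group(2))
        if match.group(3):
            active = index
    return levels, active


def read_clock_state(device=SYSFS_DEVICE):
    """Return the forced perf level plus the active sclk/mclk levels."""
    perf_level = (device / "power_dpm_force_performance_level").read_text().strip()
    _, sclk_active = parse_dpm_levels((device / "pp_dpm_sclk").read_text())
    _, mclk_active = parse_dpm_levels((device / "pp_dpm_mclk").read_text())
    return {"perf_level": perf_level, "sclk": sclk_active, "mclk": mclk_active}
```

On a system showing the behaviour described here, `read_clock_state()` would be expected to report a perf level of `low` with `sclk` at 0 but `mclk` still at 3. Since rocm-smi reads the same sysfs files, this is mainly a way to confirm that the tool is not the source of the discrepancy.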
If I repeat the process using auto (e.g. reboot, --setperflevel auto, --setperflevel low), this doesn't seem to occur and the card settles back down to level 0 nicely.
The card is a Vega Frontier running ROCm version 3.3; the installed rocm-smi package is 1.0.0-199-rocm-rel-3.3-19-ga9d6426.
Some further info from the card (you'll notice no load, and low perf mode but high mem clocks):