Closed sebpuetz closed 5 years ago
It looks like the DPM features aren't being enabled on there. A kernel fix has since gone in to make this situation clearer, but DPM doesn't appear to be enabled, which is why the sclk/mclk/pclk aren't being displayed (and why the supported clocks are printed without the current clock being marked). Can you attach your dmesg so we can see whether it gives any useful insight into what's going on? Thanks!
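For anyone who wants to check this without rocm-smi, here is a minimal sketch that reads the amdgpu pp_dpm_* sysfs files directly. It assumes the usual layout where each line is a level and the active one ends in `*`; the function name `current_dpm_level` and the `card0` path are illustrative assumptions, not something taken from rocm-smi.

```shell
# Print the active level from an amdgpu pp_dpm_* file.
# Assumes the active level is the line ending in '*', e.g. "1: 809Mhz *".
# current_dpm_level is a hypothetical helper name used only for this sketch.
current_dpm_level() {
    # $1: path to a pp_dpm_* file
    [ -r "$1" ] || { echo "unreadable: $1" >&2; return 1; }
    # Keep only the starred line and strip the trailing marker.
    level=$(sed -n 's/[[:space:]]*\*[[:space:]]*$//p' "$1")
    [ -n "$level" ] || { echo "no active level in $1" >&2; return 1; }
    printf '%s\n' "$level"
}

# Example (card0 is an assumption; adjust for your system):
#   for f in sclk mclk pcie; do
#       current_dpm_level "/sys/class/drm/card0/device/pp_dpm_$f"
#   done
#   dmesg | grep -i amdgpu
```

If the function reports no active level for a file, that matches the "Empty SysFS value" style of warning and suggests DPM is not running for that clock domain.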
Can you try adding amdgpu.ppfeaturemask=0xffffffff to your kernel parameters? Either edit grub.cfg and add it to the vmlinuz line, or add it to the GRUB_CMDLINE_LINUX_DEFAULT string in your /etc/default/grub file. Then give it a reboot and see if it's there. Vega20 doesn't have all of the PowerPlay features enabled by default, so this might be enough to give it a kick (since dmesg didn't show any failures or anything useful).
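To confirm after the reboot that the parameter actually reached the kernel, something like the following sketch may help. It checks a kernel command line (by default /proc/cmdline) for a named parameter; `cmdline_has_param` is a hypothetical helper name, not an existing tool.

```shell
# Succeed if the kernel command line contains the named parameter.
# cmdline_has_param is a hypothetical helper name used only for this sketch.
cmdline_has_param() {
    # $1: parameter name, $2: file holding the command line (default /proc/cmdline)
    file=${2:-/proc/cmdline}
    for word in $(cat "$file"); do
        case $word in
            "$1"=*|"$1") return 0 ;;
        esac
    done
    return 1
}

# Example:
#   cmdline_has_param amdgpu.ppfeaturemask && echo "parameter is on the command line"
#   cat /sys/module/amdgpu/parameters/ppfeaturemask   # value the loaded module reports
```

If the parameter is missing from /proc/cmdline, the bootloader configuration was most likely not regenerated or the wrong menu entry was booted.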
Hi,
I tried adding the string in both places, but it doesn't seem to work after rebooting. I've attached the dmesg output again as a txt file.
rocm-smi -c
======================== ROCm System Management Interface ========================
================================================================================================
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : Unable to determine current clocks. Check dmesg or GPU temperature
WARNING: One or more commands failed
======================== End of ROCm SMI Log ========================
/etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="amdgpu.ppfeaturemask=0xffffffff"
GRUB_CMDLINE_LINUX=""
# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"
# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480
# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true
# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"
# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
Sorry, I sent the message before getting some caffeine in my system. If you update /etc/default/grub, you'll need to run sudo update-grub to apply the settings. That way it'll end up in your grub.cfg file.
Normally /etc/default/grub has GRUB_CMDLINE_LINUX_DEFAULT="quiet splash", so if you removed the quiet splash part to add amdgpu.ppfeaturemask, you should add it back and just append amdgpu.ppfeaturemask=0xffffffff, so it looks like: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.ppfeaturemask=0xffffffff"
If you had already taken out quiet/splash on purpose, that's alright. Otherwise, without them you'll lose your splash screen and see a lot more info on your console during bootup.
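As a rough sketch of that edit, the function below appends a parameter to GRUB_CMDLINE_LINUX_DEFAULT while keeping whatever is already there (such as quiet splash). `add_grub_param` is a hypothetical helper name; it assumes the value is double-quoted on a single line and that the parameter contains no `|` or `&` characters.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless already present.
# add_grub_param is a hypothetical helper name used only for this sketch.
add_grub_param() {
    # $1: parameter (e.g. amdgpu.ppfeaturemask=0xffffffff), $2: grub defaults file
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$2" | grep -qF -- "$1"; then
        return 0    # nothing to do
    fi
    # Insert the parameter before the closing quote; a .bak copy is kept.
    sed -i.bak "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $1\"|" "$2"
}

# Example (needs root for the real file):
#   sudo sh -c '. ./grub-helpers.sh; add_grub_param amdgpu.ppfeaturemask=0xffffffff /etc/default/grub'
#   sudo update-grub
#   sudo reboot
```

The file name grub-helpers.sh in the usage comment is only a placeholder for wherever the function is saved. Trying it on a copy of /etc/default/grub first is a reasonable precaution.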
No worries. Unfortunately, running sudo update-grub and then reboot didn't make the clocks show up either. I disabled the splash intentionally, but thanks for the heads up!
I think that we'll have something in the next 2.2 release to help address this. I am trying to pull it in for the next batch of testing. The big issue is that while it will print the clocks, it doesn't explain why DPM appears to be disabled. @fxkamd do you happen to have any insight?
Hi, thanks for looking into this! I just switched to Ubuntu 18.04 from Linux Mint and everything is displayed correctly.
/opt/rocm/bin/rocm-smi
======================== ROCm System Management Interface ========================
================================================================================================
GPU  Temp   AvgPwr  SCLK    MCLK    PCLK                Fan     Perf  PwrCap  SCLK OD  MCLK OD  GPU%
0    36.0c  20.0W   809Mhz  351Mhz  2.5GT/s, x16 80Mhz  21.96%  auto  250.0W  0%       0%       0%
================================================================================================
======================== End of ROCm SMI Log ========================
Glad to hear it! I know that we've had some issues with Mint before, so at least things are working properly now! It's not "officially" supported, so I guess there's at least one thing in the kernel that changed that caused DPM to not load. But we've got a workaround (using an "officially supported" OS), so that's good. And I guess that means that ROCm doesn't magically work on Mint right now. Also good to know.
I suspect that the problem is that, when using Linux Mint, @sebpuetz was running kernel 4.20. 4.20 may not have had the Vega 20 DPM code merged in yet.
Closing this since it's resolved by using a "supported OS". Hopefully it works on Mint soon.
I'm on Linux Mint (Ubuntu 18.04 in disguise) with kernel 4.20. I installed rocm-dkms through apt, and rocm-smi doesn't fetch the clocks at all. Are these known limitations, or did I mess up during installation? Edit: Forgot to mention that I'm on ROCm 2.1.
rocm-smi output:
rocm-smi -a output:
rocminfo output:
Thanks in advance!