BeardOverflow / msi-ec

GNU General Public License v2.0

kernel 6.5 cpupower experiment #76

Open glpnk opened 10 months ago

glpnk commented 10 months ago

Issue with the log of @mutchiko's experiment, requested in a comment on issue #45:

Hi, can you install kernel 6.5 and then cpupower-gui, and tell me how many options you see in Energy preference?

We need this exact kernel because it enables the AMD EPP driver on Ryzen 3 CPUs. If you see 4 options, that means super_battery doesn't control the CPU P-states.

OS: Pop OS
Hardware: Modern 14 C5M (14JKEMS1)
CPU: AMD Ryzen 5 5625U
Kernel: 6.5.0-060500rc7-generic #202308201631 from Ubuntu PPA

Log:

  1. Remove the in-kernel module
    sudo modprobe -r msi-ec
  2. Compile and load the git version

    requires x86_64-linux-gnu-gcc-13...

    sudo add-apt-repository ppa:ubuntu-toolchain-r/test
    sudo apt install gcc-13
    make
    sudo make install

    Fail: ERROR: could not insert 'msi_ec': Operation not supported

glpnk commented 10 months ago

2.5. Temporary workaround for loading the git version (see #71)

make
sudo make load-debug

Works until reboot

glpnk commented 10 months ago

  3. Building cpupower-gui for Debian/Ubuntu/derivatives
    sudo apt install meson ninja-build pkg-config libglib2.0-bin libglib2.0-dev gettext appstream-util desktop-file-utils
    meson build --prefix /usr -Dsystemddir=/lib/systemd -Duse_libexec=true -Dpkla=true
    ninja -C build
    ninja -C build install

glpnk commented 10 months ago

@mutchiko

how many options do you see in Energy preference?

image

UPD: ~I think module amd-pstate isn't loaded~

UPD2: Changing shifts does nothing

UPD3: Also, I haven't found the EC address for super battery... At least the driver works:

$ cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver 
amd-pstate-epp
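The check above can be extended to also count the Energy preference options without opening the GUI. This is a minimal sketch, assuming the kernel exposes `energy_performance_available_preferences` and `energy_performance_preference` next to `scaling_driver` (as the EPP-capable cpufreq drivers are documented to do); `epp_report` is a hypothetical helper name, not part of msi-ec or cpupower-gui.

```shell
#!/bin/sh
# Sketch: report the cpufreq driver and the EPP options for one policy directory.
# epp_report is a hypothetical helper; the default path assumes policy0 exists.
epp_report() {
    policy_dir="${1:-/sys/devices/system/cpu/cpufreq/policy0}"

    if [ -r "$policy_dir/scaling_driver" ]; then
        echo "driver: $(cat "$policy_dir/scaling_driver")"
    else
        echo "driver: unknown"
    fi

    prefs_file="$policy_dir/energy_performance_available_preferences"
    if [ -r "$prefs_file" ]; then
        # The file holds a space-separated list; count the words.
        echo "epp options: $(($(wc -w < "$prefs_file")))"
        echo "current epp: $(cat "$policy_dir/energy_performance_preference" 2>/dev/null)"
    else
        echo "epp options: not exposed"
    fi
}

epp_report
```

If the second line says `not exposed`, the running driver most likely does not implement EPP, which would be the case on a kernel that falls back to another cpufreq driver.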

UPD4: This kernel fixes the issue with the SSD failing to wake from suspend. Yet another reason to use it. Previously, waking worked only on Windows or with an external SSD. Fixed with this patch

mutchiko commented 10 months ago

@glpnk Thanks! The result here proves that super_battery doesn't change the CPU's P-states.

But we still have the possibility that super_battery controls the voltage and wattage for the CPU, because a scaling governor in Linux only controls the frequency of a CPU.

I asked you to test it because you have an efficient processor that might need less power management than a 54W CPU.

Also, I noticed that the AMD EPP driver makes my CPU heat up a lot more at idle. Do you have the same issue? I know that the implementation is still not complete (thanks AMD).

glpnk commented 10 months ago

@mutchiko About heat and power consumption: if this fixes the power draw during video decoding, it will be great. This laptop has a shitty cooling system. It's basically one long heat pipe running along the whole backplate, so the backplate is used for passive cooling. And the air exhaust is directed at the screen, so it cools well with the lid closed.

mutchiko commented 10 months ago

@glpnk the problem is with the AMDGPU driver not scaling the Vega 8 correctly: it idles at 400 MHz, jumps to 2000 MHz under any load, then returns to the idle frequency immediately. When watching a video, it stays at 2000 MHz all the time. On Intel Iris Xe, the driver takes normal steps, like this: 350 -> 470 -> 660 -> 733 ... (not exact numbers, but you get the idea)

You can check this with the corectrl package, and you might as well use btop to check how badly the EPP driver is scaling the frequency. Just make sure to set its refresh interval to 100ms, and don't forget to look at temperature jumps too.
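For a quick look at the GPU clock without a GUI, the same 100ms sampling can be sketched in shell. This assumes the amdgpu driver exposes `pp_dpm_sclk` under `/sys/class/drm/card0/device/` and marks the active level with `*`; the card index may differ on other machines, and `active_sclk` and `watch_sclk` are hypothetical helper names. Fractional `sleep` is a GNU coreutils extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: sample the active amdgpu shader clock level every 100 ms.
# Assumption: card0 is the AMD GPU; adjust the path if it is not.
SCLK_FILE="/sys/class/drm/card0/device/pp_dpm_sclk"

# Print the frequency of the level marked with '*', e.g. "400Mhz".
active_sclk() {
    awk '/\*/ { print $2 }' "$1"
}

# Print <count> samples from <file>, one per line, 100 ms apart.
watch_sclk() {
    file="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        active_sclk "$file"
        sleep 0.1
        i=$((i + 1))
    done
}

# Take 20 samples (about 2 seconds) if the file exists on this machine.
if [ -r "$SCLK_FILE" ]; then
    watch_sclk "$SCLK_FILE" 20
fi
```

Running it while a video plays should show whether the clock really pins at the top level or steps through intermediate ones.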

glpnk commented 10 months ago

The newly released 6.5 kernel broke more dependencies than rc7 did, so the module couldn't be compiled and apt is broken. On RC7, temps under video load were around 62C, which was pretty normal before. The CPU governor on RC7 used turbo boost for a significant amount of time; on the stable release the frequency was lower, but I don't know whether the code changed at all.

glpnk commented 10 months ago

@mutchiko with corectrl I've noticed that frequencies are scaled kinda properly, but I think the GPU is overvolted (1350 mV) at frequencies around 600 MHz out of a 1800 MHz maximum (oops, 600 MHz was on a 720p video; the max was 1200 MHz on 1080p at 30 fps). Also, after testing kernel 6.5 with cpupower-gui, turbo boost on kernel 6.4.6 became unavailable (max frequency is 2.3 GHz).

Part 2. Using some Crab Rave in 1440p and 4K, I tried to check how shifts and video playback are related. After the first shift change, the max GPU frequency hit 1800 MHz for ~2 seconds every ~4 seconds; the rest of the time it was 400 MHz. Voltage was regulated more smoothly, with rare peaks around 1.35 V. So on eco, comfort and sport the frequencies are the same, but on sport the GPU is periodically overvolted for no reason. On comfort, frequencies are evenly distributed between min and max without overvolting. Temps on 6.4.6 seem lower.

UPD: temps in corectrl and btop differ by 10 degrees under load (YT): ~52 for GPU and ~62 for CPU.
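Since the two numbers are reported for the GPU and the CPU, one way to see which sensor each tool is reading is to dump the hwmon devices directly. A minimal sketch, assuming the standard hwmon sysfs layout (`name` plus `temp1_input` in millidegrees Celsius); which driver names show up (for example `k10temp` or `amdgpu`) depends on the machine, and `hwmon_temps` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the first temperature channel of every hwmon device in whole degrees C.
hwmon_temps() {
    root="${1:-/sys/class/hwmon}"
    for dev in "$root"/hwmon*; do
        # Skip devices without a name or a first temperature channel.
        [ -r "$dev/name" ] && [ -r "$dev/temp1_input" ] || continue
        # Some sensors refuse reads while the device is asleep; skip those too.
        milli=$(cat "$dev/temp1_input" 2>/dev/null) || continue
        echo "$(cat "$dev/name"): $((milli / 1000)) C"
    done
}

hwmon_temps
```

Comparing this output with what corectrl and btop display should show whether the 10 degree gap is simply two different sensors.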

glpnk commented 10 months ago

After playing around with some Ryzen tuner app on Windows, with CPU-Z as a benchmark, and the stock MSI app, I can say that shifts somehow affect power limits on a deeper level than the Ryzen tuning app does. Setting any limit or extreme setting in the Ryzen tuner results in a lower score than the default sport shift. Also, setting any shift causes P-states (or some Ryzen limits, if I'm wrong) to reset.

mutchiko commented 10 months ago

@glpnk

I've noticed that frequencies are scaled kinda properly

We need to monitor this stuff at a higher refresh rate.

turbo boost on kernel 6.4.6 became unavailable

You don't get turbo boost without the amd-pstate driver (now you see how late AMD support is)

Temps on 6.4.6 seem lower

Don't use Linux 6.5 for now or you will fry your CPU. It seems that this is related to the preferred cores driver not being implemented yet; it's coming in Linux 6.6.

temps in corectrl and btop differ by 10 degrees under load

Unfortunately true, but we have to work with what we have for now

glpnk commented 10 months ago

CPU governors are strange and kinda broken.

On 6.5 (from xanmod, same on 6.5rc7 and 6.5 stable) you can set power balance and it will work without overheating, until you play video. I think it won't use turbo frequencies until something cranks up the voltage. At first I thought that MSI used the same VRM line for both the CPU and the iGPU, but SoCs have multiple separate power lines for the hub, CPU cores, iGPU, internal logic, etc. Voltage is regulated depending on temperature; more voltage => more heat. Max temp was 67-68, which isn't terrible. Frequency range is 400-4400.

On 6.4.6 (stock Pop OS kernel, nominal frequency range 1600-2300) there are more options:

Conclusion: schedutil for CPU-intensive apps; one of powersave, conservative, or userspace for everyday use.
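To compare which governors each kernel offers and to switch between them without a GUI, the cpufreq sysfs files can be used directly. A minimal sketch, assuming the usual `policy*/scaling_available_governors` and `policy*/scaling_governor` files; `list_governors` and `set_governor` are hypothetical helper names, and writing a governor normally requires root.

```shell
#!/bin/sh
# Sketch: list the governors the running driver offers and apply one to all policies.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

# Print the governors available for policy0 (empty output if the file is missing).
list_governors() {
    cat "$CPUFREQ_ROOT/policy0/scaling_available_governors" 2>/dev/null
}

# Write the given governor name into every policy's scaling_governor.
set_governor() {
    gov="$1"
    for f in "$CPUFREQ_ROOT"/policy*/scaling_governor; do
        if [ ! -w "$f" ]; then
            echo "cannot write $f (need root?)" >&2
            return 1
        fi
        echo "$gov" > "$f"
    done
}

echo "available governors: $(list_governors)"
# Example (as root): set_governor schedutil
```

The list printed here is what differs between the two kernels: it depends on the active cpufreq driver, not on cpupower-gui.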

glpnk commented 10 months ago

Yet another try: kernel 6.5 (linux-xanmod-edge-x64v3) with the amd_pstate=passive flag. Available frequency range is 400-4400. Idle temps may vary from the previous comment.

Good article covering the topic on Phoronix
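To confirm that the flag actually took effect after a reboot, the requested and running modes can be compared. A minimal sketch, assuming the kernel exposes `/sys/devices/system/cpu/amd_pstate/status` (present on recent kernels with the amd-pstate driver built in); `pstate_mode` and `cmdline_mode` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare the amd_pstate mode requested on the kernel command line
# with the mode the driver reports at runtime.

# Print the running mode, or "unavailable" if the status file is missing.
pstate_mode() {
    status_file="${1:-/sys/devices/system/cpu/amd_pstate/status}"
    if [ -r "$status_file" ]; then
        cat "$status_file"
    else
        echo "unavailable"
    fi
}

# Print the value of the last amd_pstate= parameter on the command line, if any.
cmdline_mode() {
    cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 0
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^amd_pstate=//p' | tail -n 1
}

echo "running mode:   $(pstate_mode)"
echo "requested mode: $(cmdline_mode)"
```

The kernel documentation also describes writing a mode name to the same status file to switch at runtime (for example `echo passive | sudo tee /sys/devices/system/cpu/amd_pstate/status`); whether that works on a given 6.5 build is an assumption worth verifying before relying on it.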

mutchiko commented 10 months ago

@glpnk please don't put too much effort into testing right now. We have yet to discover which hidden/missing CPU features are still deactivated that we know nothing about; take AMD dynamic CPU Boost, for example. You are also wasting your time by adding kernel flags and rebooting; these things should/will be easily configurable from the cpupower utility. Just wait for Linux 6.6.

glpnk commented 10 months ago

@mutchiko It's OK, I was just trying to find a normal working config that:

mutchiko commented 8 months ago

@glpnk hi, it's me again. I messed up the BIOS again, but this time my laptop only turns on and nothing appears on the screen. No matter what I do, it just powers on and starts heating up, with no output whatsoever.

any idea how to fix it? Thanks.

glpnk commented 8 months ago

@mutchiko try resetting the BIOS and EC with the button on the bottom (if there is one) or by long-pressing the power button.

Is your model the MSI Alpha 17 B5EEK?

If that doesn't help: check whether Caps Lock reacts, use an external display, or disassemble (at your own risk) and remove the main and RTC batteries for some time.

DON'T TRY TO REMOVE THE DISPLAY CABLE WITH THE BATTERY PLUGGED IN, it may burn your CPU

mutchiko commented 8 months ago

try resetting the BIOS and EC with the button on the bottom (if there is one) or by long-pressing the power button.

I've already tried that; if it had worked, I wouldn't be bothering you.

Is your model the MSI Alpha 17 B5EEK?

Correct

Check whether Caps Lock reacts

Nothing happens when I press it (it doesn't even light up!)

use an external display

I tried that too, still nothing

disassemble (at your own risk) and remove the main and RTC batteries for some time.

I unplugged the battery and waited a few hours. I think it did a BIOS/EC reset, but that's all.

Now, I don't know anything about an RTC battery. Did you mean CMOS? Because I don't think I have one.

DON'T TRY TO REMOVE THE DISPLAY CABLE WITH THE BATTERY PLUGGED IN, it may burn your CPU

Thanks, I wouldn't have known that myself, but why would I unplug the display cable? I know damn well it's the BIOS and not the screen. I think I was playing with "MMIO limit" or "Above 4G Decoding", and I got this issue after rebooting.

But maybe it would help by forcing the CPU/GPU to route all the graphics to the external screen?

glpnk commented 8 months ago

The disclaimer about the battery and display cable is there because some "repair techs" forget to unplug the battery and burn out more expensive parts.

Capslock test

Nothing happens when I press it (it doesn't even light up!)

That's bad. I think it was the simplest test to prove that the board has started.

Does it react to the charger?

I don't know anything about an RTC battery. Did you mean CMOS?

Yes

no matter what I do it just powers on and starts heating up

So it reacts to the power button? And turns off on a long press?

I don't know how and where the BIOS stores persistent settings, but if they are stored on the same flash and the system is unbootable, you won't be able to reset the BIOS by removing the battery, long-pressing the power button, or using the BIOS/EC reset button.

Do you have a way to contact a service center? Or private repair shops?

UPDATE:

The manual states that your device has a CMOS/RTC battery and a battery reset hole.

Which reset technique helped you last time?

mutchiko commented 8 months ago

That's bad

Yup

I think it was the simplest test to prove that the board has started

I'm sure that the board has started, because the fan speeds up after a while.

Does it react to the charger?

Yes, the charging LED works fine and turns off when charging finishes.

So it reacts to the power button? And turns off on a long press?

Exactly

Which reset technique helped you last time?

A long power press. I've already tried everything in the book (almost), but my fear is that it's something stored in the CPU firmware or something, and it's not related to the BIOS anymore.

Do you have a way to contact a service center? Or private repair shops?

Yeah, I was going to contact the seller (who is an official one), hoping that they can take it or tell me where I can get it repaired. But do you think that it can be repaired?

glpnk commented 8 months ago

@mutchiko I think you at least need to flash a clean BIOS, but the image needs to be repacked (because it contains Windows keys, Intel ME or AMD's equivalent, and other stuff), or you need to find the region with the settings and wipe it.

mutchiko commented 8 months ago

But how am I supposed to do that if I can't access the bios?

glpnk commented 8 months ago

That's for the case where the seller won't help and you have to find someone else to do it.

mutchiko commented 8 months ago

@glpnk it seems that the last thing I can do is contact MSI directly. Thanks for the help.