BeardOverflow / msi-ec


GF76 11-UC #138

Open Waujito opened 4 days ago

Waujito commented 4 days ago

Laptop model

MSI Katana GF76 11UC

EC firmware version

17L2EMS1.108

EC memory dump

     | _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f
-----+------------------------------------------------
0x0_ | 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x2_ | 00 00 00 00 00 00 00 00 0a 05 00 00 00 04 0b 0b
0x3_ | 03 00 00 0d 00 00 50 81 00 00 00 00 00 00 00 00
0x4_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x5_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x6_ | 00 00 00 00 00 00 00 00 2f 00 37 40 49 4c 52 58
0x7_ | 64 26 26 2b 30 36 3c 46 55 64 08 03 03 03 03 03
0x8_ | 00 00 37 3d 43 49 4f 54 63 00 00 2b 30 36 3c 46
0x9_ | 55 64 08 03 03 03 03 02 02 0f 7d 02 0a 78 39 00
0xa_ | 31 37 4c 32 45 4d 53 31 2e 31 30 38 30 34 31 30
0xb_ | 32 30 32 33 31 33 3a 34 34 3a 34 32 00 00 00 28
0xc_ | 00 00 07 22 00 00 00 00 00 d1 00 00 00 00 00 00
0xd_ | 00 00 c1 83 0d 00 05 80 00 01 00 00 00 00 00 00
0xe_ | e2 00 00 00 00 00 00 40 00 00 00 00 00 c0 00 00
0xf_ | 40 00 70 00 00 64 00 00 64 00 00 00 00 00 00 00

GPU

Nvidia

Is your keyboard RGB?

No (single color)

Additional context

I will try to add support for it myself, but I will probably need your help. So far I have found:

✔️ Cooler boost
✔️ Webcam toggle
✔️ Webcam block
✔️ Fn <-> Win
✔️ Mic mute LED
✔️ Sound mute LED
✔️ Keyboard backlight intensity

❓ Shift mode
❓ Fan mode

As for shift mode and fan mode, I can't see a real difference between them. But when I turned on eco mode my fans went quiet, which seems good. I can also confirm that fan mode somehow affects my fans. When I wrote advanced fan settings to the EC the way MControlCenter does:

const int fan1SpeedSettingStartAddress = 0x72;
const int fan2SpeedSettingStartAddress = 0x8A;
const int fanSpeedSettingsCount = 7;
const int fan1TempSettingStartAddress = 0x6A;
const int fan2TempSettingStartAddress = 0x82;
const int fanTempSettingsCount = fanSpeedSettingsCount - 1;

const int fan1Temp[] = {48, 53, 60, 65, 70, 74};
const int fan1Settings[] = {0, 43, 60, 75, 85, 100, 100};
const int fan2Temp[] = {50, 55, 60, 65, 70, 72};
const int fan2Settings[] = {0, 43, 60, 75, 85, 100, 100};

My fans seemed completely broken: no fan activity in auto mode and some in advanced mode. But when I ran stress on my CPU, the fans started to increase their RPM.
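To make the layout concrete, here is a minimal sketch of where those values land in a 256-byte EC image. It only fills a buffer in memory and does not touch hardware. The helper names `put_curve` and `apply_both_curves` are made up for illustration; the addresses and values are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses and counts as quoted from MControlCenter above. */
#define FAN1_SPEED_START 0x72u
#define FAN2_SPEED_START 0x8Au
#define FAN_SPEED_COUNT  7u
#define FAN1_TEMP_START  0x6Au
#define FAN2_TEMP_START  0x82u
#define FAN_TEMP_COUNT   (FAN_SPEED_COUNT - 1u)
#define EC_IMAGE_SIZE    256u

static const uint8_t fan1_temp[FAN_TEMP_COUNT]   = {48, 53, 60, 65, 70, 74};
static const uint8_t fan1_speed[FAN_SPEED_COUNT] = {0, 43, 60, 75, 85, 100, 100};
static const uint8_t fan2_temp[FAN_TEMP_COUNT]   = {50, 55, 60, 65, 70, 72};
static const uint8_t fan2_speed[FAN_SPEED_COUNT] = {0, 43, 60, 75, 85, 100, 100};

/* Hypothetical helper: copy one fan's curve into an in-memory EC image. */
static void put_curve(uint8_t *image,
                      size_t temp_start, const uint8_t *temps,
                      size_t speed_start, const uint8_t *speeds)
{
    size_t i;

    assert(temp_start + FAN_TEMP_COUNT <= EC_IMAGE_SIZE);
    assert(speed_start + FAN_SPEED_COUNT <= EC_IMAGE_SIZE);

    for (i = 0; i < FAN_TEMP_COUNT; i++)
        image[temp_start + i] = temps[i];
    for (i = 0; i < FAN_SPEED_COUNT; i++)
        image[speed_start + i] = speeds[i];
}

/* Place both curves; every other byte of the image is left untouched. */
static void apply_both_curves(uint8_t *image)
{
    put_curve(image, FAN1_TEMP_START, fan1_temp, FAN1_SPEED_START, fan1_speed);
    put_curve(image, FAN2_TEMP_START, fan2_temp, FAN2_SPEED_START, fan2_speed);
}
```

The bytes this produces at 0x6A-0x6F, 0x72-0x78, 0x82-0x87 and 0x8A-0x90 match the "With curve" dump posted further down in this thread.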

✔️ CPU temperature
⭕ GPU temperature: when I run nvidia-smi, the GPU temperature shows up only for a short time. This does not seem to be an msi-ec problem, just a limitation from MSI.

❓ CPU fan speed
cat: /sys/devices/platform/msi-ec/cpu/realtime_fan_speed: Invalid argument
It turns out to be 50 in boost mode.

⭕ GPU fan speed
It works, but what is the format? It is neither percent nor RPM: 0 when off, 190 when silent, 78 in boost. It is just raw data from (0xCD). For the CPU, the problem seems to be in the rt_fan_speed_base_min_max fractions.

The formula from MControlCenter seems like a workaround:

static ssize_t cpu_realtime_fan_speed_show(struct device *device,
                       struct device_attribute *attr,
                       char *buf)
{
    u8 rdata;
    int result;
    int val = 0;

    result = ec_read(conf.cpu.rt_fan_speed_address, &rdata);
    if (result < 0)
        return result;

    /* A larger raw value means a slower fan; 0 means the fan is stopped. */
    if (rdata > 0)
        val = 470000 / rdata;

    return sysfs_emit(buf, "%i\n", val);
}

It reports the fan speed directly in RPM and seems OK (2400 in normal mode, 6000 in boost), but I have no idea whether it is correct.
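As a quick sanity check of that formula outside the kernel, here is a minimal sketch. The name `raw_to_rpm` is hypothetical, and 470000 is just the constant from the snippet above, not a confirmed value.

```c
#include <stdint.h>

/* Constant taken from the snippet above; its exactness is not confirmed. */
#define FAN_RPM_CONSTANT 470000

/*
 * Treat the raw byte as a rotation period: the larger the raw value,
 * the slower the fan. A raw value of 0 is reported as 0 RPM (stopped).
 */
static int raw_to_rpm(uint8_t raw)
{
    if (raw == 0)
        return 0;
    return FAN_RPM_CONSTANT / raw;
}
```

With the raw readings mentioned above, 190 gives 2473 and 78 gives 6025, which is in line with the roughly 2400 and 6000 RPM figures.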

I will test the battery later. Working with the EC config while the battery is connected (=> no EC clears) is a dead end :)

glpnk commented 3 days ago

Hi.

> My fans seemed completely broken: no fan activity in auto mode and some in advanced mode. But when I ran stress on my CPU, the fans started to increase their RPM.

Works as intended, because fans are power hungry and drain the battery faster. The fans turn on at around 50 degrees on the CPU.

> ❓ Shift mode ❓ Fan mode

Shift - 0xD2 Fan - 0xD4


UPD: shift mode should change the CPU power limit, but this does not always work as intended


The RPM calculation uses a slightly different constant (480000 instead of 470000) and 2 bytes instead of 1. Reading RPM will be supported in LM sensors starting with the 6.10 kernel, or might already be available on 6.9.x kernels, since the new kernel module is already merged.
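A minimal sketch of that two-byte variant follows. The byte order is an assumption (big-endian, high byte at the lower address), and the function name `period16_to_rpm` is hypothetical; only the constant 480000 and the 2-byte width come from the comment above.

```c
#include <stdint.h>

#define FAN_RPM_CONSTANT_16 480000u

/*
 * Assumption: the two EC bytes form a big-endian 16-bit rotation period.
 * A period of 0 is reported as 0 RPM (fan stopped).
 */
static int period16_to_rpm(uint8_t high, uint8_t low)
{
    unsigned int period = ((unsigned int)high << 8) | low;

    if (period == 0)
        return 0;
    return (int)(FAN_RPM_CONSTANT_16 / period);
}
```

For the first dump in this issue, where 0xC8 is 00 and 0xC9 is d1, this would give 2296 RPM if those two bytes are indeed the CPU fan pair.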

CPU/GPU fan speed should be in the range of 0-150%, but currently the code applies normalization for CPU %RPM.

A custom fan curve is not supported yet; you can only enable it by changing the fan mode. If necessary, use MControlCenter to edit the fan curve.

For me, on a non-MSI laptop with Nvidia GPU, Windows does not always show the temperature. On Linux, the behavior is probably the same, because the GPU is just turned off for better battery life and can't report its temperature.

You can copy any WMI2 named config and change it for your device. Don't use the 0xC8-0xCF values, because they hold the coolers' RPM.

How many coolers do you have in the laptop?

Waujito commented 3 days ago

> Works as intended, because fans are power hungry and drain the battery faster. The fans turn on at around 50 degrees on the CPU.

In theory, yes, but in practice my computer can sit at 75 degrees with slow fans. The behaviour in auto mode with an advanced fan curve written really differs from the behaviour with no advanced settings (also auto mode). Just now I tried to set up this curve and my fans stopped (but fan_mode is auto).

> Reading RPM will be supported in LM sensors starting with the 6.10 kernel, or might already be available on 6.9.x kernels, since the new kernel module is already merged.

Good news! Do you know about pwmconfig and fancontrol support?

> How many coolers do you have in the laptop?

2 fans

> Shift - 0xD2 Fan - 0xD4

I think I can mark Shift and Fan as ready. No issues with it so far. Works as expected.

I have also prepared a PR that fixes the reversed CPU fan speed (0x4d at max speed, 0xb0 at min speed, 0x00 = no speed).

glpnk commented 3 days ago

Technically, it's not a reversed fan speed, but some kind of time per rotation

I'll check your dump for fan curve settings

Waujito commented 3 days ago

> Technically, it's not a reversed fan speed, but some kind of time per rotation

Sounds logical, but what about the other versions that are supported? Is it speed there?

> I'll check your dump for fan curve settings

So in other versions it is speed and in mine it is time per rotation?

If you are really interested in the curve, here are the dumps. Without curve:

     | _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f
-----+------------------------------------------------
0x0_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x2_ | 00 00 00 00 00 00 00 00 0a 05 00 00 00 04 0b 0b
0x3_ | 03 00 00 0d 00 00 50 81 00 00 00 00 00 00 00 00
0x4_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x5_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x6_ | 00 00 00 00 00 00 00 00 38 00 37 40 49 4c 52 58
0x7_ | 64 2b 26 2b 30 36 3c 46 55 64 08 03 03 03 03 03
0x8_ | 00 00 37 3d 43 49 4f 54 63 00 00 2b 30 36 3c 46
0x9_ | 55 64 08 03 03 03 03 02 02 0f 7d 02 0a 78 3b 00
0xa_ | 31 37 4c 32 45 4d 53 31 2e 31 30 38 30 34 31 30
0xb_ | 32 30 32 33 31 33 3a 34 34 3a 34 32 00 00 00 28
0xc_ | 00 00 07 00 00 00 00 00 00 d5 00 00 00 00 00 00
0xd_ | 00 00 c1 83 0d 00 05 80 00 01 00 00 00 00 00 00
0xe_ | e2 00 00 00 00 00 00 40 00 00 00 00 00 d1 00 00

With curve:

     | _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f
-----+------------------------------------------------
0x0_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x2_ | 00 00 00 00 00 00 00 00 0a 05 00 00 00 04 0b 0b
0x3_ | 03 00 00 0d 00 00 50 81 00 00 00 00 00 00 00 00
0x4_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x5_ | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x6_ | 00 00 00 00 00 00 00 00 39 00 30 35 3c 41 46 4a
0x7_ | 64 3c 00 2b 3c 4b 55 64 64 64 08 03 03 03 03 03
0x8_ | 00 00 32 37 3c 41 46 48 63 00 00 2b 3c 4b 55 64
0x9_ | 64 64 08 03 03 03 03 02 02 0f 7d 02 0a 78 3d 00
0xa_ | 31 37 4c 32 45 4d 53 31 2e 31 30 38 30 34 31 30
0xb_ | 32 30 32 33 31 33 3a 34 34 3a 34 32 00 00 00 28
0xc_ | 00 00 07 00 00 00 00 00 00 e8 00 00 00 00 00 00
0xd_ | 00 00 c1 83 0d 00 05 80 00 01 00 00 00 00 00 00
0xe_ | e2 00 00 00 00 00 00 40 00 00 00 00 00 d1 00 00

The same dumps in decimal, which makes the thermal data easier to read.

Without curve:

$ od -t u1 -A x /sys/kernel/debug/ec/ec0/io
000000   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000020   0   0   0   0   0   0   0   0  10   5   0   0   0   4  11  11
000030   3   0   0  13   0   0  80 129   0   0   0   0   0   0   0   0
000040   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000060   0   0   0   0   0   0   0   0  55   0  55  64  73  76  82  88
000070 100  43  38  43  48  54  60  70  85 100   8   3   3   3   3   3
000080   0   0  55  61  67  73  79  84  99   0   0  43  48  54  60  70
000090  85 100   8   3   3   3   3   2   2  15 125   2  10 120  59   0
0000a0  49  55  76  50  69  77  83  49  46  49  48  56  48  52  49  48
0000b0  50  48  50  51  49  51  58  52  52  58  52  50   0   0   0  40
0000c0   0   0   7   0   0   0   0   0   0 210   0   0   0   0   0   0
0000d0   0   0 193 131  13   0   5 128   0   1   0   0   0   0   0   0
0000e0 226   0   0   0   0   0   0  64   0   0   0   0   0 209   0   0
0000f0   0   0 112   0   0 100   0   0 100   0   0   0   0   0   0   0

With curve:

$ od -t u1 -A x /sys/kernel/debug/ec/ec0/io
000000   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000020   0   0   0   0   0   0   0   0  10   5   0   0   0   4  11  11
000030   3   0   0   5   0   0  80 129   0   0   0   0   0   0   0   0
000040   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
*
000060   0   0   0   0   0   0   0   0  59   0  48  53  60  65  70  74
000070 100  60   0  43  60  75  85 100 100 100   8   3   3   3   3   3
000080   0   0  50  55  60  65  70  72  99   0   0  43  60  75  85 100
000090 100 100   8   3   3   3   3   2   2  15 125   2  10 120  61   0
0000a0  49  55  76  50  69  77  83  49  46  49  48  56  48  52  49  48
0000b0  50  48  50  51  49  51  58  52  52  58  52  50   0   0   0  40
0000c0   0   0   7   0   0   0   0   0   0   0   0   0   0   0   0   0
0000d0   0   0 193 131  13   0   5 128   0   1   0   0   0   0   0   0
0000e0 226   0   0   0   0   0   0  64   0   0   0   0   0 209   0   0
0000f0   0   0 112   0   0 100   0   0 100   0   0   0   0   0   0   0

As you can see, in the dump without the curve my fan was running, and after the curve was written it stopped (0xc9), even though the temperature after the write was higher (I heated it with stress). The fan mode stays the same (0x0d). I didn't change the fan mode at all, so the fan speed should in fact be the same.
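Comparing two dumps by eye is error-prone, so here is a minimal sketch of a byte-wise diff over two EC snapshots. The helper name `ec_dump_diff` is hypothetical and it works on plain buffers; how the dumps are loaded is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: record the offsets at which two EC dumps differ.
 * Returns the total number of differing bytes; at most `max` offsets
 * are stored in `out`.
 */
static size_t ec_dump_diff(const uint8_t *before, const uint8_t *after,
                           size_t len, size_t *out, size_t max)
{
    size_t i, count = 0;

    assert(before && after && out);

    for (i = 0; i < len; i++) {
        if (before[i] != after[i]) {
            if (count < max)
                out[count] = i;
            count++;
        }
    }
    return count;
}
```

Applied to row 0x70 of the two decimal dumps above, it reports seven changed bytes: 0x71, 0x72 and 0x74 through 0x78.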

glpnk commented 3 days ago

Is this the latest EC/BIOS update? Because that sounds terrible. Thanks for the dumps

Waujito commented 3 days ago

> Is this the latest EC/BIOS update? Because that sounds terrible. Thanks for the dumps

Yes, I updated the BIOS a few weeks ago exactly for this reason

glpnk commented 3 days ago

Really sad. Do you see the same behaviour on Windows?

Waujito commented 3 days ago

> Really sad. Do you see the same behaviour on Windows?

As I remember, on Windows everything was OK. But I haven't used Windows for about half a year, and never with the new BIOS. I can try to install it and play with MSI Center, but is it possible to get these dumps there?

glpnk commented 2 days ago

According to the dumps, you have not enabled fan curve mode

But for some reason you have changed the temperatures of the fan curve, and not the RPM percentages

To activate the custom fan curve, you need to set 0xD4 to 0x8D
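A minimal sketch of that check against an in-memory EC image, assuming 0x0D is auto mode (the value the dumps above show at 0xD4) and 0x8D is the custom curve mode. The macro and function names are hypothetical and other models may use different values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAN_MODE_ADDRESS  0xD4
/* Values taken from this thread for the GF76 11UC; other models may differ. */
#define FAN_MODE_AUTO     0x0D
#define FAN_MODE_ADVANCED 0x8D /* custom fan curve */

/* True when the image says the custom fan curve mode is active. */
static bool custom_curve_enabled(const uint8_t *ec_image)
{
    assert(ec_image);
    return ec_image[FAN_MODE_ADDRESS] == FAN_MODE_ADVANCED;
}

/* Switch the in-memory image to the custom curve mode. */
static void enable_custom_curve(uint8_t *ec_image)
{
    assert(ec_image);
    ec_image[FAN_MODE_ADDRESS] = FAN_MODE_ADVANCED;
}
```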

Waujito commented 2 days ago

> According to the dumps, you have not enabled fan curve mode

Yes, and that's the problem. The fan curve is not enabled, but the fan speed changed (scarier still, the fans stopped). I tried to enter the curve manually in vim to rule out mistakes in my script, and the behavior is the same.

I also installed Windows (spent an entire day trying to install it without USB :)) and there are only 6 curve sliders. The starting addresses are OK. Maybe the 7th is reserved and hidden from the user. There are also no temperature indicators per slider, just a heatmap with fixed sliders on it. I made a lot of dumps and will explore them now.

UPD: In the last vim test only one (CPU) fan stopped.

glpnk commented 2 days ago

Try to reset the EC and BIOS following the guide in the user manual, then save a clean variant of the fan curves.

The 7th slider is probably a last resort before the CPU/GPU burns and an emergency shutdown happens

The temperatures may not be intended to be changed

What do you mean by vim test? Literally dumping all memory, patching it and writing it back? Or changing certain values sequentially?

Waujito commented 2 days ago

> Try to reset the EC and BIOS following the guide in the user manual, then save a clean variant of the fan curves.

Yes, I do.

> What do you mean by vim test? Literally dumping all memory, patching it and writing it back? Or changing certain values sequentially?

Dumping, patching and writing back. xxd filters in vim are OP. But if you forget to apply it before :w, say hi to an EC reset :)

So after studying the Windows logs, it turns out that MSI Center simply remembers my thermal settings and replaces them with the default curve when I go from advanced to balanced mode. And after some tests in Linux I'm sure that the EC uses the curve (or part of it) even in "comfort" and "auto".
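For the other option, changing single values instead of writing a whole dump back, a minimal POSIX sketch could look like this. The helper name `patch_byte` is hypothetical. It is assumed here that the target is the debugfs file used for the dumps above (/sys/kernel/debug/ec/ec0/io) and that the ec_sys module was loaded with write support; a wrong offset or value can still leave the EC in a bad state.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite exactly one byte at `offset` in the
 * file at `path`. Returns 0 on success, -1 on any failure.
 */
static int patch_byte(const char *path, off_t offset, uint8_t value)
{
    ssize_t written;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    written = pwrite(fd, &value, 1, offset);
    if (close(fd) != 0 || written != 1)
        return -1;
    return 0;
}
```

Because only one byte is touched per call, a forgotten filter step cannot corrupt the rest of the EC memory the way a whole-dump write can.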

glpnk commented 2 days ago

You can check whether the curve is applied by making EC dumps under load and comparing the temperature and RPM % values with the curve.

MControlCenter sets the cooler mode to 0x8D in fan curve mode

Or you can set the idle RPM % (first slider) to a value above 0 and disable fan curve mode


Hmm, my device really doesn't care about the fan mode (custom/auto) when the first slider is changed

Silent mode seems to be fake, nice

glpnk commented 2 days ago

@Waujito can you compare fan curves for auto and silent mode on Windows?

You can make the comparison easier by saving a dump in RWe and loading it in comparison mode. Click the compare button a second time to switch the view from the dump to realtime values; it will highlight the values that differ

If you need a map of EC values to WMI, with it you can just ignore many addresses (image)

And if you know ImHex, you can check this pattern: https://github.com/glpnk/hexpats/blob/main/msi-wmi2-dsdt.hexpat

Waujito commented 2 days ago

> You can check whether the curve is applied by making EC dumps under load and comparing the temperature and RPM % values with the curve.

I compare the sound of my coolers -_- Also, it seems like the address just before the beginning of the curve (0x71) indicates which speed from the curve is in use now. It is not a real value, just an indicator. It works only when the curve is enabled.

> Or you can set the idle RPM % (first slider) to a value above 0 and disable fan curve mode

Do you mean that 0x72 controls something more than just the lowest possible fan speed?

> Silent mode seems to be fake, nice

For me it works great (I just tried eco mode in Linux). It caps my CPU at 1.1GHz and takes fan control away from the curve. 0x71 seems frozen when this mode is enabled.

glpnk commented 2 days ago

> > Or you can set the idle RPM % (first slider) to a value above 0 and disable fan curve mode
>
> Do you mean that 0x72 controls something more than just the lowest possible fan speed?

No. Basically, you can set 0x72 to any non-zero value and read the same value from 0x71 (if the CPU is cold enough), or a different value that might equal one of the next addresses, 0x72-78.

A few models have a Basic fan mode, but it's hard to tell how it should work.

> > Silent mode seems to be fake, nice
>
> For me it works great (I just tried eco mode in Linux). It caps my CPU at 1.1GHz and takes fan control away from the curve. 0x71 seems frozen when this mode is enabled.

Retested, and on Eco shift:

Sounds different.

So Silent works, but still uses the fan curve

> Also, it seems like the address just before the beginning of the curve (0x71) indicates which speed from the curve is in use now. It is not a real value, just an indicator. It works only when the curve is enabled.

True


Different shifts change the CPU power limit on some AMD devices

glpnk commented 2 days ago

Re-retested, and silent works on other shifts too

Waujito commented 2 days ago

Oh, eco and silent are not the same thing. Eco seems to be like super battery. Silent is a fan mode...

> @Waujito can you compare fan curves for auto and silent mode on Windows?

The only thing that changes is 0x0d -> 0x1d in 0xd4, just like the driver does. And yes, I think it sounds different too

glpnk commented 2 days ago

Which app does your laptop use? MSI has at least 4 apps + 1 deprecated.

Shift/User scenario uses a combination of "shift" 0xd2/0xf2 (depending on the laptop generation; we call them WMI2/WMI1) and fan settings

Waujito commented 2 days ago

> Which app does your laptop use?

> Shift/User scenario uses a combination of "shift" 0xd2/0xf2 (depending on the laptop generation; we call them WMI2/WMI1) and fan settings

MSI Center from the MS Store, as recommended on the drivers page. It basically writes to the 0xD_ row

glpnk commented 2 days ago

Got AMD STAPM (CPU power limit) values by combination of shift and fan mode:

Mode              STAPM
eco               10-12
silent + comfort  14
auto + comfort    24
sport             25

Waujito commented 2 days ago

But anyway, I cannot understand what's going on in auto mode when I set 0x72 to 00. My CPU fan just stops completely regardless of temperature. But I also can't say it is a static value. In auto mode the fan increases its speed.

What's the black magic behind that default 26 2b 30 36 3c 46 55, or 38 43 48 54 60 70 85 in decimal... It also doesn't seem to depend only on 0x72...

But in advanced mode everything works as it should

glpnk commented 2 days ago

Temp    RPM %
0x69    0x72
0x6A    0x73
...     ...
0x70    0x78

glpnk commented 2 days ago

0x69 = 0, 0x6a = 48; so when the CPU temperature is between these, the 0x72 speed value is used

Maybe a custom fan curve just breaks auto mode
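That lookup rule can be sketched as follows. The boundary handling (a threshold counts as reached when the temperature equals it) is an assumption, the name `curve_speed` is hypothetical, and the real EC may add hysteresis or offsets. The 0x69 entry (0) is treated as the implicit lower bound, so only the six thresholds starting at 0x6A are passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CURVE_SPEEDS 7u
#define CURVE_TEMPS  (CURVE_SPEEDS - 1u)

/*
 * Assumed rule: speeds[0] applies below temps[0]; speeds[i] applies once
 * the temperature reaches temps[i - 1] and stays in force below temps[i].
 */
static uint8_t curve_speed(const uint8_t *temps, const uint8_t *speeds,
                           uint8_t temp)
{
    size_t i = 0;

    assert(temps && speeds);

    /* Count how many thresholds the temperature has reached. */
    while (i < CURVE_TEMPS && temp >= temps[i])
        i++;
    return speeds[i];
}
```

With the thresholds 48, 53, 60, 65, 70, 74 and the speeds 0, 43, 60, 75, 85, 100, 100 used earlier in this thread, anything below 48 degrees maps to the first speed, which is 0. Under this rule that would explain a fan that stays off on a cool CPU.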

Waujito commented 2 days ago

Yes, it's just broken. 0x72 may act as some kind of fraction. The temperature may also be shifted. I think the only thing we can do is to back up the custom curve somewhere (once it is implemented), override it with the default curve, and swap them back on every fan mode switch, just as the MSI software does.

glpnk commented 2 days ago

The fan curve is not implemented yet. Another question: is backing up the fan curve a task for the driver or for a userspace app?

Waujito commented 2 days ago

Hm, if so, wouldn't it be a good option to just delete the custom fan curve, lock its file (return an error on read/write in auto fan mode) and force the user to write it when they want to enable the custom curve?

glpnk commented 2 days ago

No, because other devices might have properly implemented fan modes

Waujito commented 1 day ago

So we can reset and lock it only for specific devices and instruct the userspace app to check it after advanced mode is enabled. Or we can abstract this away and change the curve virtually: store it in memory and push it to the EC when advanced mode is enabled, for devices like mine. Another question is how many devices are affected by this issue and how to detect them.

Btw, it seems like my laptop works well now with msi-ec. The battery threshold is OK (but I needed to discharge my battery to 60% for it to start working properly). The LEDs are likely fine too, but the audio-mute one doesn't indicate the muted state (not a module issue; related to PipeWire, I think). CPU fan speed is fixed by the PR, as is the keyboard backlight (it turned off when the driver was reloaded or exited; not related to a specific device).
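The "store it in memory and push it on mode switch" idea could be sketched like this. Everything here is hypothetical: the struct, the function names, and the choice to handle only the fan 1 speed bytes at 0x72. It operates on an in-memory EC image, not on the real driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CURVE_POINTS      7
#define FAN1_SPEED_START  0x72
#define FAN_MODE_ADDRESS  0xD4
#define FAN_MODE_AUTO     0x0D
#define FAN_MODE_ADVANCED 0x8D

/* Both curves live in memory; only one of them is in the EC at a time. */
struct curve_shadow {
    uint8_t custom[CURVE_POINTS];   /* user curve, kept while in auto mode */
    uint8_t defaults[CURVE_POINTS]; /* default curve, restored for auto mode */
};

/* Pick the curve that should be in the EC for the given fan mode. */
static const uint8_t *curve_for_mode(const struct curve_shadow *s, uint8_t mode)
{
    return mode == FAN_MODE_ADVANCED ? s->custom : s->defaults;
}

/* Push the matching curve first, then switch the mode byte. */
static void switch_fan_mode(uint8_t *ec, const struct curve_shadow *s,
                            uint8_t mode)
{
    assert(ec && s);
    memcpy(&ec[FAN1_SPEED_START], curve_for_mode(s, mode), CURVE_POINTS);
    ec[FAN_MODE_ADDRESS] = mode;
}
```

With this shape the user's curve never sits in the EC while auto mode is active, which avoids the stopped-fan behaviour described above on affected devices. Whether this state belongs in the driver or in a userspace app is the open question from earlier in the thread.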

@glpnk Thank you so much for your time. It was amazing to work with you! This project is really cool, thank you!