Open Freihut opened 1 month ago
On many devices, the RPM value might be read from the wrong address and scaled incorrectly. The EC does not actually report RPM, but a percentage of RPM in the range 0-150. Someone in the past tried to "normalize" the CPU %RPM to a 0-100% range, and now it returns wrong values. Fans turn on and off according to the curve, with some hysteresis. A fan mode such as silent/auto/advanced just caps the maximum available %RPM at some value, without any scaling.
IDK what turbine mode is
RPM readings vs. percent readings isn't my point (I'm aware of that).
(1881/5558)×100=33%, msi_ec shows 43(%). (3900/5558)×100=70%, msi_ec shows broken stuff ('invalid argument')
"turbine mode"=fans@maximum, done by FN+Arrow up
Not all devices have turbine mode, but many have cooler boost, which might be the same thing.
Where did you get the number 5558 from?
MSI EC doesn't calculate percentages (except for the broken CPU %RPM meter, which needs to be removed).
Don't look at the CPU RPM reported by the driver.
Yes, it is the same thing.
5558 = the maximum cpu-fan-rpm (in "turbine mode") for my device, so 100%; 1881 = idle; 3900 = cpu-rpm at maximum cpu-load.
I got the rpm values from my own prog's readings, as described in the initial post.
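The percentages quoted in this exchange are plain ratios against the maximum fan speed. A minimal sketch of that arithmetic, assuming a linear scale and taking 5558 rpm (the "turbine mode" speed reported for this one device) as 100%; the function name `rpm_to_percent` is made up for illustration and is not part of msi-ec:

```python
def rpm_to_percent(rpm: int, max_rpm: int = 5558) -> float:
    """Express a fan speed as a percentage of the maximum speed.

    max_rpm defaults to the value reported in this thread for the
    Alpha 17 b5eek; it is an assumption for any other device.
    """
    if max_rpm <= 0:
        raise ValueError("max_rpm must be positive")
    return rpm / max_rpm * 100


# Idle, full CPU load and maximum speed as reported in the thread.
for rpm in (1881, 3900, 5558):
    print(f"{rpm} rpm = {rpm_to_percent(rpm):.1f}% of maximum")
```

If the relation between the EC value and the real speed is not linear, this ratio only describes the measured rpm, not what the EC stores.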
You can assume that boost speed isn't 100% but 150 or 200, plus correlation may be non-linear
No, I won't. msi_ec is reading wrong values for this device (I guess they're target-fan-speeds) and doing wrong math with the wrongly taken cpu-fan-speed (by subtracting and dividing addresses) which can result in undefined behavior (for all devices).
msi-ec does not control the fan curve, but I want to fix the realtime %RPM readings soon
In the meantime affected people can use my forked repo for this device. Reads rpm values from the correct addresses.
yeah well it's only logical that these addresses are messed up, i was so concentrated on getting shift_mode to work that i completely forgot about testing cpu/gpu fans speed addresses.
now that i remember correctly, i used ec_sys module readings for fans speeds, and not the actual driver itself.
by the way @Freihut your repo works kinda well, the realtime_fan_speed file in /sys/devices/platform/msi-ec/cpu/ is broken (impossible to open); same with the gpu file except it shows 0 all the time so you might want to check with that too.
now that i remember correctly, i used ec_sys module readings for fans speeds, and not the actual driver itself.
That's fine, as they both should read from the same source.
Letting the device idle and using
watch --interval 1 sudo xxd -g 1 /sys/kernel/debug/ec/ec0/io
(or maybe a smaller interval) while playing around with the turbines "cooler boost" is IMO the best way to find the fan addresses.
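The same comparison can be scripted instead of watched by eye. A rough sketch, assuming the ec_sys debug file is readable as a flat byte dump; the helper names `read_ec` and `changed_offsets` are hypothetical, and the demo data is synthetic rather than a real EC dump:

```python
from pathlib import Path

# Debug view of the EC registers; needs root and the ec_sys module loaded.
EC_IO = Path("/sys/kernel/debug/ec/ec0/io")


def read_ec(path: Path = EC_IO) -> bytes:
    """Return one snapshot of the EC register space."""
    return path.read_bytes()


def changed_offsets(before: bytes, after: bytes) -> dict:
    """Map every offset whose byte differs to its (old, new) pair."""
    return {
        offset: (old, new)
        for offset, (old, new) in enumerate(zip(before, after))
        if old != new
    }


# Synthetic demo data (not a real EC dump): two bytes change between snapshots.
idle = bytes(256)
boosted = bytearray(idle)
boosted[0x10] = 0x2A
boosted[0x11] = 0x07
for offset, (old, new) in changed_offsets(idle, bytes(boosted)).items():
    print(f"0x{offset:02x}: 0x{old:02x} -> 0x{new:02x}")
```

On a real device one would call `read_ec()` once while idle and once with cooler boost on, then pass both snapshots to `changed_offsets`. Temperatures and other counters change too, so several runs are needed to narrow the candidates down.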
repo works kinda well, the realtime_fan_speed file in
/sys/devices/platform/msi-ec/cpu/
is broken (impossible to open);
That's where my changes are, so it's not "well" at all. :c
The code in my fork only works for the Alpha 17 b5eek (CONF22), because it needs .rt_fan_speed_fallback in .cpu = {} and .gpu = {} to be set. I haven't done this for the other devices, because I can't test it and it might be a device-specific workaround.
If you're using the same hardware as me, 0xcd and 0xcb in your ec are not matching the fan-speeds.
same with the gpu file except it shows 0 all the time so you might want to check with that too
If /sys/devices/platform/msi-ec/gpu/realtime_fan_speed reports 0 and you're 100 % sure the GPU fan is running (GPU temp > 55°C or the cooler boost is on), then it is also reading from a wrong address (0xcb) and therefore displays the fallback.
@Freihut before i continue testing the fans speed readings with you, i'd like to confirm a few things in advance:
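1. output of `sudo dmesg | grep error`
2. both iGPU and dGPU usage under load (notice anything wrong?)
3. idle cpu temperature (after booting and logging in from a cold start)
4. max power limit reported by nvtop or amdgpu top for the rx6600m
5. any bios settings that you changed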
please do all of these under linux, thanks.
P.S: what you call turbine mode is actually turbo boost.
1. output of `sudo dmesg | grep error`
just a bunch (less than 10) of
ACPI Error: Aborting method \_SB.PCI0.SBRG.EC._Q9A due to previous error (AE_NOT_EXIST) (20240322/psparse-529)
2. both iGPU and dGPU usage under load (notice anything wrong?)
What is that question for? That's reported by amdgpu (which is just passing on firmware readouts) and more or less reasonable. ("More or less" because values reported by the firmware are "meh").
3. idle cpu temperature (after booting and logging in from a cold start)
Around 50°C, depending on room-temp.
4. max power limit reported by nvtop or amdgpu top for the rx6600m
According to amdgpu it is 65w. With Furmark and smartshift enabled I can push the dGPU to around 68w, but /sys/class/drm/card[X]/device/hwmon/hwmon7/power1_cap_max still reports 65w.
5. any bios settings that you changed
My device reports fan-rpm-speeds on 0xcb and 0xcd even for BIOS defaults.
Settings I've changed and can remember: Smartshift, secure boot and modern standby off, UMA for iGPU to 512 MiB. But like I wrote: I have used these addresses for about 1~2 years now, and they never changed and always report plausible speeds. At least for my device.
P.S: what you call turbine mode is actually turbo boost.
Ya, I know, but turbine mode sounds better. :)
BTW, I just made a gui-tool to live view the ec. It highlights changes and does some math to help find fan-speed-addresses. But its pretty alpha right now.
the reason i asked you these questions is that i'm trying to see if the driver is functioning properly before re-checking other addresses, for example: disabling smartshift from bios will prevent the ec from doing any actual performance changes when you change shift mode in the driver or in the msi dragon center, but will change the fans curves.
disabling modern standby will reset all the power/performance changes after waking up from sleep, you'll have to re-apply them by re-selecting the performance mode (shift mode) that you want; if it's enabled, you should see an mp2 acpi error that is related to modern standby. that's why i asked you for acpi errors.
i asked you for gpu usage because the vbios has an issue that makes it report 99% on almost any load.
According to amdgpu it is 65w
seems like smartshift doesn't work on linux for some reason.
users of the alpha 15 reported that it works fine. after further searching i found out that the RX6600M vbios is different from the one found on the alpha 17; i assume that flashing the alpha 15 vbios might fix the issue, but it might brick your laptop.
I just made a gui-tool to live view the ec
just tried it out and its really cool, hopefully it will make it easier for people to test if the driver is working correctly on their laptops or not, thanks for your work.
the reason i asked you these questions [...]
Thanks for explaining.
i asked you for gpu usage because the vbios has an issue that makes it report 99% on almost any load.
I remember that this occurred some days ago after standby. But I just tried to reproduce it and both GPUs keep reporting sane utilization values. Weird. (No updates happened between these situations).
seems like smartshift doesn't work on linux for some reason.
It kinda does, but in a weird way, and it keeps changing as the kernel progresses. 2 years ago smartshift shifted a lot to the GPU (if I remember correctly it ran at about ~85w and the CPU dropped to 2.5 GHz). With the current kernel it shifts about 3w, but very slowly (you can see the GPU's power draw increase over several minutes of load). Writing any value to the somethingbiassomething-file had no effect.
Smartshift also has some side effects on ryzenadj, but I couldn't figure out what exactly happens there.
just tried it out and its really cool, hopefully it will make it easier for people to test if the driver is working correctly on their laptops or not, thanks for your work.
Thanks for the feedback, I'm glad to help.
I did my testing and @Freihut is right:
.rt_fan_speed_address = 0xcd for CPU target fan speed address
.rt_fan_speed_address = 0xcb for GPU target fan speed address
Values contained in these 2 addresses are percentages for the target speed, not actual speed in rpms;
the file /sys/devices/platform/msi-ec/cpu/realtime_fan_speed is unreadable if the target percentage is below 25% or above 55%.
There seems to be a mismatch between the values reported by ec_sys and msi-ec: when the target percentage is 25%, msi-ec reports 0%, and when the target is 55%, msi-ec reports 100%.
so it's only possible to read the file if the target is between 25% and 55%.
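These observations are what one would expect from a linear rescale of the raw value between two bounds, with anything outside the bounds rejected. The following is only a model that reproduces the behaviour described here, not code taken from the driver; the name `normalized_speed` and the parameters `base_min` and `base_max` are hypothetical:

```python
import errno


def normalized_speed(raw: int, base_min: int = 25, base_max: int = 55) -> int:
    """Rescale a raw EC value from [base_min, base_max] to 0..100.

    Values outside the window raise EINVAL, which is what a reader of
    the sysfs file would see as "Invalid argument". The bounds 25 and 55
    are the ones observed in this thread, assumed here as defaults.
    """
    if raw < base_min or raw > base_max:
        raise OSError(errno.EINVAL, "Invalid argument")
    # Integer division, truncating like C division on non-negative values.
    return 100 * (raw - base_min) // (base_max - base_min)


for raw in (25, 38, 55):
    print(f"raw {raw} -> reported {normalized_speed(raw)}")
```

Under this model a raw value of 38 would be shown as 43, and any raw value below 25 or above 55 would make the file unreadable.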
lets fix things one at a time, correct addresses take priority, @Freihut do you want me to fix it or do you want to make a merge request yourself?
Wait a minute, you can't just fix the addresses, because this needs a rather big overhaul in calculating the fan speeds.
Look at the way I calculate the rpm in my forked code.
But this works only for the Alpha 17 b5eek (and of course devices using the same fans). To fix this for all users you'll need to add the Fallback-rpm for each device currently supported or find the addresses to make msi-ec read that out by itself.
Laptop model
Alpha 17 B5eek
EC firmware version
17LLEMS1.106
Description
Tl;dr
cpu-fan-speed: seems incorrect
gpu-fan-speed: plausible, but somehow not in "turbine mode"
I've got some weird readings here:
Situation 1: Created some cpu-load while running:
watch --interval 1 cat /sys/devices/platform/msi-ec/cpu/realtime_fan_speed
(combined output of several seconds)
Pluma (a text editor) also throws the "Invalid argument" at the same time, so likely not a cat issue.
Situation 2: Idle + FN + Arrow up (which makes the fans go into "turbine mode") but msi-ec/cpu/realtime_fan_speed reports "43", while msi-ec/gpu/realtime_fan_speed reports "0".
Meanwhile I get the attached output while reading the ec (/sys/kernel/debug/ec/ec0/io) by a small pascal prog I used before.
Line 1 = the dump of the whole ec-line
Line 2 = the gpu-rpm-speed
Line 3 = the cpu-rpm-speed
Interval is 1000ms.
output1.txt: idling laptop, just going into "turbine mode" and back to normal after some seconds. Msi-ec reports "43" for cpu and "0" for gpu all along.
output2.txt: laptop has full cpu load. Cpu-fan is around 3900rpm, while gpu-fan is at 0 and gets turned on when the gpu reaches 55°C (as the case gets warmed up I guess). Msi-ec reports "invalid" for cpu all the time and "0" for gpu in the beginning, later it went up to 43, which is kind of plausible.
I have been using the pascal prog all the time for around 1 year, so I'm fairly sure the readings are correct; at least they're plausible.
I'm using the latest BIOS E17LLAMS.10B from 2023-06-15 with Arch Linux on Kernel 6.11.0
(the pascal prog src can be compiled with Lazarus; needs to be run as root (to read /sys/kernel/debug/ec/ec0/io) while ec_sys module is running)
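For readers without Lazarus, the sampling part can be approximated in a few lines of Python. This is a sketch under assumptions, not a port of the pascal prog: it only returns the raw bytes at the given offsets once per interval and does not convert them to rpm. The function name `poll_offsets` is made up for illustration.

```python
import time
from pathlib import Path
from typing import Dict, Iterator, List

# Debug view of the EC registers; needs root and the ec_sys module loaded.
EC_IO = Path("/sys/kernel/debug/ec/ec0/io")


def poll_offsets(offsets: List[int], samples: int, interval: float = 1.0,
                 path: Path = EC_IO) -> Iterator[Dict[int, int]]:
    """Yield {offset: raw byte} once per interval, `samples` times in total."""
    for n in range(samples):
        dump = path.read_bytes()
        yield {offset: dump[offset] for offset in offsets}
        if n < samples - 1:
            time.sleep(interval)
```

A call such as `for row in poll_offsets([0xcb, 0xcd], samples=10): print(row)` would log the two offsets discussed in this thread once per second; what those bytes mean on a given device still has to be checked against the actual fan behaviour.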
output1.txt output2.txt read_ec.tar.gz