erpalma / throttled

Workaround for Intel throttling issues in Linux.
MIT License

Write to unrecognized MSR on Linux 5.9 #215

Open cedws opened 3 years ago

cedws commented 3 years ago

I recently updated to kernel 5.9 and started seeing the following in the kernel log, which seems to be caused by throttled:

[  140.603433] msr: Write to unrecognized MSR 0x1a2 by python3
               Please report to x86@kernel.org
[  140.603516] msr: Write to unrecognized MSR 0x64b by python3
               Please report to x86@kernel.org

The article here seems to suggest these logs are just for information purposes and there shouldn't be any impact on userspace programs. Nevertheless, I wanted to open this issue for tracking this and maybe putting something in the README.
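For anyone who wants to see which MSRs and processes are involved before reporting, here is a minimal sketch that tallies these warnings from saved kernel log text. The regular expression is only inferred from the two lines quoted above, and the function name `count_msr_writes` is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this issue, e.g.
# "[  140.603433] msr: Write to unrecognized MSR 0x1a2 by python3"
MSR_WRITE_RE = re.compile(r"msr: Write to unrecognized MSR (0x[0-9a-fA-F]+) by (\S+)")


def count_msr_writes(log_text):
    """Return a Counter keyed by (msr_address, process_name)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MSR_WRITE_RE.search(line)
        if match:
            address = int(match.group(1), 16)
            counts[(address, match.group(2))] += 1
    return counts


sample = """\
[  140.603433] msr: Write to unrecognized MSR 0x1a2 by python3
               Please report to x86@kernel.org
[  140.603516] msr: Write to unrecognized MSR 0x64b by python3
               Please report to x86@kernel.org
"""

for (address, process), n in sorted(count_msr_writes(sample).items()):
    print(f"{process} wrote MSR {address:#x} {n} time(s)")
```

The text could come from `dmesg` or the journal; only the message body is matched, so the timestamp format does not matter.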

erpalma commented 3 years ago

I guess that in the near future the kernel will disable MSR writes from user space for good :(

cedws commented 3 years ago

That seems to be the end goal yeah. There'll probably be a kernel parameter to disable the lockdown measures... but then there's the question of whether you really want to do that. I'm sure somebody else will raise this with them.

eXt73 commented 3 years ago

You have to add this to grub:

GRUB_CMDLINE_LINUX="msr.allow_writes=on"

and update grub ;)

gladiac commented 3 years ago

Adding msr.allow_writes=on to the kernel cmdline definitely works for now (tested on 5.9.0). However, writing to MSRs will probably become impossible in the future.
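To confirm that the parameter actually reached the running kernel after a reboot, a small check along these lines may help. It is only a sketch: it assumes the kernel command line is readable at `/proc/cmdline`, and the helper names are hypothetical.

```python
def msr_writes_requested(cmdline):
    """Return True if the command line contains the exact token msr.allow_writes=on."""
    return "msr.allow_writes=on" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (path is an assumption)."""
    with open(path) as f:
        return f.read()


# Example with a literal string instead of the live system:
print(msr_writes_requested("quiet splash msr.allow_writes=on"))
```

On a real machine, `msr_writes_requested(read_cmdline())` reports whether the bootloader passed the switch. It says nothing about whether a given kernel build honours it.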

Backporting the MSR driver to keep the current behaviour on newer kernels, OR reimplementing throttled as a kernel module, might be the only way in the medium term.

Thoughts?

eXt73 commented 3 years ago

In such a situation, I will modify the kernel code so that it will still be possible with the use of this 'flag': msr.allow_writes=on, and of course I will make such built and optimized kernels available, like all my builds at Netext'73: https://www.netext73.pl/

... or I will extend my systemd service and bypass these limitations ;)

Lenovo 720s-14IKB

https://www.dropbox.com/s/6pow72x9xf19fi1/Screenshot_20201013_172706.png?dl=0

https://www.dropbox.com/s/7wd1e0fnq04f0ff/Screenshot_20201013_174038.png?dl=0

bp3tk0v commented 3 years ago

How about, instead of bypassing something which we added there for a good reason, you guys work with us?

For example, the 0x1a2 MSR is accessible to userspace through the drivers/thermal/intel/int340x_thermal/processor_thermal_device.c driver. On machines which have that hw, there should be a sysfs file called "tcc_offset_degree_celsius" which gives you the TCC activation offset. We're open to suggestions how to extend that interface so that your tool can read it from sysfs instead of poking at MSRs.
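As a rough illustration of reading that value from sysfs rather than from the MSR, the sketch below looks for the file by name. Only the file name `tcc_offset_degree_celsius` comes from the comment above; searching under `bus/pci/devices/*` is an assumption about where the driver exposes it, and both function names are hypothetical.

```python
import glob
import os


def find_tcc_offset_files(sysfs_root="/sys"):
    """Return paths of tcc_offset_degree_celsius files under PCI devices.

    The bus/pci/devices/* location is an assumption, not taken from the thread.
    """
    pattern = os.path.join(
        sysfs_root, "bus", "pci", "devices", "*", "tcc_offset_degree_celsius"
    )
    return sorted(glob.glob(pattern))


def read_tcc_offset(path):
    """Read the TCC activation offset (degrees Celsius) as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```

If `find_tcc_offset_files()` returns an empty list, the machine either lacks that hardware or exposes the file elsewhere, and a tool would have to fall back to some other mechanism.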

The other MSR above is MSR_CONFIG_TDP_CONTROL and the kernel uses it in a bunch of places. It looks like throttled wants to set cTDP so exposing that functionality in sysfs shouldn't be a big deal AFAICT.

So let's do this right please and stop poking at the naked MSRs because it is a very bad idea.

Thx.

erpalma commented 3 years ago

How about, instead of bypassing something which we added there for a good reason, you guys work with us?

This tool started just as a simple way "to fix my own pc". I agree with you that the right decision would be to use specific sysfs instead of raw MSRs. I would be very glad to upgrade this tool if you are going to help us by submitting the required patches for the kernel.

bp3tk0v commented 3 years ago

This tool started just as a simple way "to fix my own pc". I agree with you that the right decision would be to use specific sysfs instead of raw MSRs. I would be very glad to upgrade this tool if you are going to help us by submitting the required patches for the kernel.

Cool, I'd be glad to.

So how about you send a mail to x86-at-kernel.org (replace the "-at-" with you know what :)) with what exactly you'd like to read out/program from/to which MSRs and I'll CC the relevant people and we'll start the ball rolling. From initial staring, some of the info you need we export already - it'll just need to be extended/designed properly so yours and other tools can use it too.

Thx.

LinuxOnTheDesktop commented 3 years ago

The workaround is not working for me.

I added msr.allow_writes=on to my boot string and rebuilt Grub (I did both those things using the Grub Customizer program), then rebooted. My log is still being flooded with the message at issue.

Mint 20 x64 Cinnamon, kernel 5.9.1-050901-generic. My full kernel/boot string: acpi=force cpuidle.governor=teo i915.enable_fbc=1 i915.fastboot=1 pcie_aspm=force mitigations=off psmouse.synaptics_intertouch=1 quiet reboot=w splash msr.allow_writes=on. Computer: X1CG6.

I will try with the 5.9.0 kernel. EDIT: on the 5.9.0 kernel, as against 5.9.1, the workaround does stop the log flood.

erpalma commented 3 years ago

@bp3tk0v I guess something is already moving at the kernel ML!

eXt73 commented 3 years ago

The workaround is not working for me. I will try with the 5.9.0 kernel. EDIT: on the 5.9.0 kernel, as against 5.9.1, the workaround does stop the log flood.

Something must be messed up in the kernel you are using. See my screen: everything flashes under my builds ... I am even thinking about modifying the kernel code, but for now it is enough to add the parameter to grub ... under my build 5.9.1:

https://www.dropbox.com/s/xzkxf9qtuyfqokc/Screenshot_20201022_093136.png?dl=0

bp3tk0v commented 3 years ago

@bp3tk0v I guess something is already moving at the kernel ML!

Yeah, that's me poking at people to get this thing moving. Thus it will be important if you give your requirements about what you want to access through the MSRs so that you can use that interface in your tool too.

Thx.

grealish commented 3 years ago

This issue also affects the performance of USB devices connected downstream of a USB 2.0 or 3.0 hub

bp3tk0v commented 3 years ago

This issue also affects the performance of USB devices connected downstream of a USB 2.0 or 3.0 hub

How so? I'm very sceptical it does anything but please elaborate.

dzintars commented 3 years ago

Not really related to this repo, but will leave it there:

[    5.832950] msr: Write to unrecognized MSR 0x17f by mcelog
               Please report to x86@kernel.org

bp3tk0v commented 3 years ago

[ 5.832950] msr: Write to unrecognized MSR 0x17f by mcelog

We have a fix queued:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=ras/core&id=68299a42f84288537ee3420c431ac0115ccb90b1

and mcelog will stop poking at that MSR after that.

HTH.

gmasgras commented 3 years ago

The workaround is not working for me.

I added msr.allow_writes=on to my boot string and rebuilt Grub (I did both those things using the Grub Customizer program), then rebooted. My log is still being flooded with the message at issue.

Mint 20 x64 Cinnamon, kernel 5.9.1-050901-generic. My full kernel/boot string: acpi=force cpuidle.governor=teo i915.enable_fbc=1 i915.fastboot=1 pcie_aspm=force mitigations=off psmouse.synaptics_intertouch=1 quiet reboot=w splash msr.allow_writes=on. Computer: X1CG6.

I will try with the 5.9.0 kernel. EDIT: on the 5.9.0 kernel, as against 5.9.1, the workaround does stop the log flood.

This works for me on Arch, X1CG7, systemd-boot

Also (sorry for going off-topic here) TIL about cpuidle.governor=teo which solves a high CPU load issue that I've had for a long time with UAC-2 devices on my X1CG7 :bow: https://bbs.archlinux.org/viewtopic.php?pid=1924581

LinuxOnTheDesktop commented 3 years ago

@eXt73: thank you for your post above. Using the boot switch workaround, and on kernel 5.9.16, I can confirm that the throttle software works - well, unless the following new error (new to that kernel) is relevant.

alsactl[887]: alsa-lib main.c:1021:(snd_use_case_mgr_open) error: failed to import hw:0 (empty configuration)

Linux Mint 20 x64 Cinnamon

erpalma commented 3 years ago

@LinuxOnTheDesktop that's related to ALSA, which is a sound card framework.

LinuxOnTheDesktop commented 3 years ago

Hello (and sorry to moan)

On kernel 5.10.25-051025-generic I still see many instances of msr: Write to unrecognized MSR 0x1a2 by python3 despite having booted with lsm=capability,yama (my /sys/kernel/security/lsm comprising: 'lockdown,capability,yama').

My OS: Mint Cinnamon 20.1. My version of throttled: the latest, got from git just now.

angelsl commented 3 years ago

@bp3tk0v Do you know if there is any progress on exposing these knobs through sysfs? The LKML thread ended in October. Seems like energy_perf_bias now exists and the in-tree utilities have migrated to that, but that's all I see now.

bp3tk0v commented 3 years ago

Well, there were some good ideas at the end of that thread:

https://lore.kernel.org/lkml/20200907094843.1949-1-Jason@zx2c4.com/T/#mc47d8b97df049bc62001ceeeb315c1bdb6f35ff6

but someone needs to actually try them. :-\ For example, I'd take an undervolting driver into the kernel any day of the week if it is done somewhat sane. And it doesn't have to be perfect - we can always improve it incrementally like we always do.

bp3tk0v commented 3 years ago

On kernel 5.10.25-051025-generic I still see many instances of msr: Write to unrecognized MSR 0x1a2 by python3

Does this explain it: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about ?

LinuxOnTheDesktop commented 3 years ago

@bp3tk0v: that page does explain things. For, that page tells me - as someone might have told me already - that our erpalma is, er, working to fix this problem (and it is a problem, for logs need to be useful and disks should not be written to if one can avoid it).

bp3tk0v commented 3 years ago

Oh they're useful.

neil1969 commented 2 years ago

I was just wondering if there has been any movement on this?

LinuxOnTheDesktop commented 2 years ago

I cannot cope with this log spamming; I have disabled the throttled service (sudo systemctl disable --now lenovo_fix.service).

EDIT: there is hope - see my post below.

LinuxOnTheDesktop commented 2 years ago

I find that, on Mint 20, kernel 5.11, and the latest git version of the throttle-fix, the log spamming is stopped by this boot switch: msr.allow_writes=on.

erpalma commented 2 years ago

That's quite strange since I already write that param at runtime...

LinuxOnTheDesktop commented 2 years ago

@erpalma

Thanks. Do you mean that Throttled adjusts one's kernel boot switches? Or do you mean something else?

erpalma commented 2 years ago

I do not touch the boot switches, I change the MSR module parameters at runtime. The effect is the same.

with open('/sys/module/msr/parameters/allow_writes', 'w') as f:
    f.write('on')

mkostrikin commented 1 year ago

echo on | sudo tee /sys/module/msr/parameters/allow_writes
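
Both snippets above write to the same module parameter file. A slightly more defensive variant of the Python one might look like the sketch below; it skips the write when the file is missing (for example when the msr module is not loaded) and reads the value back afterwards. The function names are hypothetical and this is not necessarily how throttled itself does it.

```python
import os

# Path taken from the snippets above.
ALLOW_WRITES = "/sys/module/msr/parameters/allow_writes"


def enable_msr_writes(path=ALLOW_WRITES):
    """Write 'on' to the parameter file; return False if that is not possible."""
    if not os.path.exists(path):
        return False  # e.g. msr module not loaded, or kernel without this parameter
    try:
        with open(path, "w") as f:
            f.write("on")
    except OSError:
        return False  # typically a permission error when not running as root
    return True


def read_msr_write_mode(path=ALLOW_WRITES):
    """Return the current value of the parameter, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None
```

Calling `enable_msr_writes()` as root and then checking `read_msr_write_mode()` shows whether the runtime change took effect without touching the boot switches.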