erpalma / throttled

Workaround for Intel throttling issues in Linux.
MIT License

Thinkpad X1 Carbon 6th Gen - still throttling issue #31

Open Unclezz opened 6 years ago

Unclezz commented 6 years ago

Hi, I own a Lenovo Thinkpad X1 with an Intel i7 8550u, running Kubuntu with kernel 4.17.2-041702-generic #201806160433. I installed the fix, apparently without issues, and tried playing a bit with the configuration file. However, no matter what I change, I always get messages about CPU throttling:

> [ 5285.052945] CPU2: Core temperature above threshold, cpu clock throttled (total events = 8)
> [ 5285.052945] CPU6: Core temperature above threshold, cpu clock throttled (total events = 8)
> [ 5285.052947] CPU3: Package temperature above threshold, cpu clock throttled (total events = 72)
> [ 5285.052948] CPU5: Package temperature above threshold, cpu clock throttled (total events = 72)
> [ 5285.052949] CPU4: Package temperature above threshold, cpu clock throttled (total events = 72)
> [ 5285.052950] CPU7: Package temperature above threshold, cpu clock throttled (total events = 72)
> [ 5285.052951] CPU0: Package temperature above threshold, cpu clock throttled (total events = 72)
> [ 5285.052951] CPU1: Package temperature above threshold, cpu clock throttled (total events = 72)
> [ 5285.052953] CPU6: Package temperature above threshold, cpu clock throttled (total events = 72)
> [ 5285.052959] CPU2: Package temperature above threshold, cpu clock throttled (total events = 72)

Moreover, when running the command rdmsr -f 29:24 -d 0x1a2, I cannot get better values than 10 when plugged into AC and 20 when on battery.

Do you think there could be something wrong with my setup, or is there anything I can do to improve it? Thanks

ejgallego commented 6 years ago

Thermald?

Unclezz commented 6 years ago

Already tried to disable it but issue remains.

jdydsco commented 6 years ago

I have a T480 and have been using this fix for a while, but I still get those messages too (Fedora 28). Never had thermald installed

I thought I read somewhere before that these specific messages were false positives though? The temp limits definitely increased when I installed the fix but it presumably still throttles since the chip isn't capable of running at 100% clock speeds on these laptops & is designed to throttle. Just at a better temp now.

DEvil0000 commented 6 years ago

Getting those messages is normal as far as I can tell. As soon as it gets hot and the cooling is not enough, it will throttle. The script mainly changes when it gets throttled. However, I think I only get those messages when doing a reboot, standby, coming back from standby and so on. A value of 10 from rdmsr -f 29:24 -d 0x1a2 would normally mean the script has set the max temperature to 90°C instead of the usual 80°C. Keep in mind that if you change the config you need to restart the script. You can check if it's working with s-tui and stress. If you get an average of about 15W or more, it is working (maybe 12W for 1 core doing some work).
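The relation between the rdmsr value and the trip temperature can be sketched in a few lines of Python. This is only an illustration, not code from the script: it assumes the layout of MSR 0x1a2 in which bits 23:16 hold TjMax and bits 29:24 hold an offset that is subtracted from it, and the raw sample value is made up.

```python
def decode_temperature_target(raw: int) -> dict:
    """Decode a raw 64-bit value read from MSR 0x1a2.

    Assumed layout: bits 23:16 = TjMax in degrees C, bits 29:24 = offset
    subtracted from TjMax to get the temperature where throttling starts.
    """
    tjmax_c = (raw >> 16) & 0xFF
    offset_c = (raw >> 24) & 0x3F
    return {
        "tjmax_c": tjmax_c,
        "offset_c": offset_c,
        "trip_temp_c": tjmax_c - offset_c,
    }


# Hypothetical raw value: TjMax = 100 (0x64), offset = 10 (0x0A).
sample = 0x0A640000
print(decode_temperature_target(sample))
```

If TjMax is 100°C, an offset of 10 corresponds to 90°C and an offset of 20 to 80°C, which would match the AC and battery readings reported at the top of the thread.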

erpalma commented 6 years ago

Yup, @DEvil0000 is right. Those messages are normal since the CPU is always going to throttle at some point due to thermal limits. We are just raising those limits (a lot). You can disable those events in the BIOS if you want, but they are not harmful.
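To get an overview of how often these events were logged, the kernel messages can be summarized per CPU. The following is a rough sketch, assuming the message format quoted in the first post; the function name and the shortened sample log are only for illustration.

```python
import re
from collections import defaultdict

# Matches the throttling lines in the format quoted earlier in this thread.
PATTERN = re.compile(
    r"CPU(?P<cpu>\d+): (?P<kind>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<events>\d+)\)"
)


def summarize_throttle_events(log_text: str) -> dict:
    """Return the highest 'total events' counter seen per (kind, cpu)."""
    latest = defaultdict(int)
    for match in PATTERN.finditer(log_text):
        key = (match["kind"], int(match["cpu"]))
        latest[key] = max(latest[key], int(match["events"]))
    return dict(latest)


sample_log = """\
[ 5285.052945] CPU2: Core temperature above threshold, cpu clock throttled (total events = 8)
[ 5285.052947] CPU3: Package temperature above threshold, cpu clock throttled (total events = 72)
[ 5285.052959] CPU2: Package temperature above threshold, cpu clock throttled (total events = 72)
"""

for (kind, cpu), events in sorted(summarize_throttle_events(sample_log).items()):
    print(f"{kind} CPU{cpu}: {events} events")
```

Feeding it the output of dmesg before and after a stress run shows whether the counters keep growing under load or only change around reboot and standby.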

DEvil0000 commented 6 years ago

Not every BIOS has a switch for it. Maybe we should add an option for it.


erpalma commented 6 years ago

Do you know the register for that?

DEvil0000 commented 6 years ago

Sorry, I don't know, but I remember there was one MSR with a high temp event or something like that. It might be the one, but I guess it's something else. I can imagine it could be some ACPI thing, since that was what thermald used to get the events.

erpalma commented 6 years ago

Those are called Machine Check Events, we should have a look at that.

pkieltyka commented 6 years ago

I'll say, the work in this repo is much appreciated. But it unfortunately makes my system unstable, even after many different attempts at tweaking values. I really hope Lenovo makes a BIOS update to fix this properly.

DEvil0000 commented 6 years ago

If it is unstable with the script, your undervolting is too much in 99.9% of the cases. That's also stated in the readme.

pkieltyka commented 6 years ago

I've completely turned off undervolting (set to 0's). But yes, undervolting was a disaster on my X1C6.

The stability issues are still happening.. perhaps thermal issues. I tried setting a 90 degree max when connected, but I've found it still crashing, especially when connected to an external 4K monitor

DEvil0000 commented 6 years ago

It is very unlikely that you get an unstable system with the script if undervolting is not used and your hardware is fine. There are just a few things I can think of causing this:

After setting a new config you did restart the script, right?

pkieltyka commented 6 years ago

@DEvil0000 thanks for the suggestions. I'll definitely try to set limit to 80 degrees and see if it goes above. for reference, my lenovo_fix.conf is: https://gist.github.com/pkieltyka/64653c5b7ca44aaccd0f923a350b801b

And to note, I've tried setting AC's Trip_Temp_C to 90 as well, with the same issues. Especially when plugged into an external display, that is, the UltraFine 5K display which has a usb-c connector for power + display. It generally works, but if I push the machine with too much CPU or GPU, then the external monitor will disconnect, and I've even had a strange kernel trap error with acpi

pkieltyka commented 6 years ago

I just tried.. with the lenovo_fix running (start/enable+reboot) with the above settings, my external display will crash when I hit 90 degrees with a stress test via s-tui and other operations at the same time. Without the fix running (stop/disable+reboot), although my system only clocks up to ~2GHz (bummer), when I push it with the display to 90 degrees it stays stable and doesn't crash. Perhaps there are some settings where this can in fact work with my display as the AC power source.. but I can't seem to make it stable

erpalma commented 6 years ago

Wait, does it happen with the Lenovo power supply too?

pkieltyka commented 6 years ago

thanks for the help guys. I just tried using the Lenovo power supply (65W) and hooked up the 5K external monitor to the other usb-c port, turned on the lenovo_fix config as above, and it seems my system does not crash. I'm surprised it couldn't drive the CPU and monitor under load when it works fine on a 15" MBP with a dedicated GPU as well, but, at least this works. I'll report back if anything changes

pkieltyka commented 6 years ago

I spoke too soon.. system is clocking hard.. display disconnected.

DEvil0000 commented 6 years ago

There might still be throttling for various reasons, but it should not crash. Look at one thing at a time. You can also try the bigger power supplies from Lenovo. For my laptop the small 65W supply does not work well with basically anything connected (starting with the dock), but that's what you get by default from Lenovo (for my laptop). And the supply of the old laptop with the same rating was not powerful enough anyway, but that's maybe expected.

erpalma commented 6 years ago

Can you clarify what you mean by crash? Hard freeze, system reset, or just the external monitor stops working?

DEvil0000 commented 6 years ago

@erpalma not sure if you asked me, but here is my answer: none of those should happen if no voltage offset or overclocking is involved. The external monitor thing might have a ton of reasons, which might not be related to the laptop or CPU in any way, but it should also not happen. However, the other cases are what I was thinking about, ranging from computation mistakes to resets and freezes.

erpalma commented 6 years ago

Actually I was asking @pkieltyka ;)

pkieltyka commented 6 years ago

@erpalma when I said crash, I meant the external monitor disconnects and will not reconnect until a full reboot. But I have had a kernel trap error in the past as well. I'm just pointing out the different behaviour of the system with and without the lenovo_fix. I do wish I could use the lenovo_fix though, as it certainly gives me more juice

erpalma commented 6 years ago

Ok, it might still be a computation error (e.g. at the kernel level) causing the monitor to disconnect, but I guess this is more of a power-related issue. Can you perform some tests under Windows too?

pkieltyka commented 6 years ago

good idea to test it under windows, but unfortunately I only have Linux on the machine now

velaar commented 5 years ago

On X1C6 I'm getting a lot of LIM - Cross-domain (e.g. GPU); in many cases it hits much faster than the thermal limit. Any way to increase the limit?

andreev1024 commented 5 years ago

Hi guys. I have Ubuntu 18.04.1 (4.15.0-39-generic) and an X1C6. When the script installation finished (all looks fine, no errors), my system froze and then, after a short time (~10 sec), it rebooted. When the OS starts I enter my password and then get the same behavior: freeze, reboot. Do you have any ideas?

erpalma commented 5 years ago

@velaar I don't think so, but I'll check.

@andreev1024 Did you enable undervolt?

andreev1024 commented 5 years ago

@erpalma yes, but then I reinstalled the script and on the first run I got a freeze and reboot (with default settings).

erpalma commented 5 years ago

Did you manually revert the config file? Because by default the install script does not overwrite it.

andreev1024 commented 5 years ago

@erpalma you're right. There is an old config:

[GENERAL]
# Enable or disable the script execution
Enabled: True
# SYSFS path for checking if the system is running on AC power
Sysfs_Power_Path: /sys/class/power_supply/AC*/online

## Settings to apply while connected to Battery power
[BATTERY]
# Update the registers every this many seconds
Update_Rate_s: 5
# Max package power for time window #1
PL1_Tdp_W: 29
# Time window #1 duration
PL1_Duration_s: 28
# Max package power for time window #2
PL2_Tdp_W: 44
# Time window #2 duration
PL2_Duration_S: 0.002
# Max allowed temperature before throttling
Trip_Temp_C: 80
# Set cTDP to normal=0, down=1 or up=2 (EXPERIMENTAL)
cTDP: 0

## Settings to apply while connected to AC power
[AC]
# Update the registers every this many seconds
Update_Rate_s: 5
# Max package power for time window #1
PL1_Tdp_W: 44
# Time window #1 duration
PL1_Duration_s: 28
# Max package power for time window #2
PL2_Tdp_W: 44
# Time window #2 duration
PL2_Duration_S: 0.002
# Max allowed temperature before throttling
Trip_Temp_C: 90
# Set HWP energy performance hints to 'performance' on high load (EXPERIMENTAL)
HWP_Mode: False
# Set cTDP to normal=0, down=1 or up=2 (EXPERIMENTAL)
cTDP: 0

[UNDERVOLT]
# CPU core voltage offset (mV)
CORE: -110
# Integrated GPU voltage offset (mV)
GPU: -90
# CPU cache voltage offset (mV)
CACHE: -110
# System Agent voltage offset (mV)
UNCORE: -90
# Analog I/O voltage offset (mV)
ANALOGIO: 0
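
Since the installer leaves an existing config in place, a quick check for leftover undervolt values can help before restarting the script. A minimal sketch, assuming the config keeps the INI-style layout shown above; the helper name is hypothetical and not part of the script:

```python
import configparser


def nonzero_undervolt(config_text: str) -> dict:
    """Return the [UNDERVOLT] entries whose offset is not 0 mV."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("UNDERVOLT"):
        return {}
    offsets = {}
    for name in parser["UNDERVOLT"]:
        value = parser.getfloat("UNDERVOLT", name)
        if value != 0:
            # configparser lowercases keys; show them as written in the file.
            offsets[name.upper()] = value
    return offsets


# Excerpt of the config above, used here as sample input.
sample_config = """\
[UNDERVOLT]
# CPU core voltage offset (mV)
CORE: -110
GPU: -90
CACHE: -110
UNCORE: -90
ANALOGIO: 0
"""

print(nonzero_undervolt(sample_config))
```

An empty result means every offset is 0; anything else is still being applied whenever the script starts.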

shizonic commented 5 years ago

Can anybody provide a working config for the X1C 6th Gen (i7-8550U)?