erpalma / throttled

Workaround for Intel throttling issues in Linux.
MIT License

Current limit throttling on i5 8250U #169

Open Helium314 opened 4 years ago

Helium314 commented 4 years ago

I have a Matebook X Pro with an i5 8250U, and under load lenovo-fix.py --monitor almost always shows current limit throttling. The only exceptions I have found so far are stressing only a single core, or setting the power limit so low that the CPU is throttled by power instead of by the current limit. Neither is really useful...

In Windows (with the help of ThrottleStop) I can reach full speed (3.4 GHz) on all cores without any throttling, and package power can go up to 40 W. Core, cache and graphics current limits are set to default. So the hardware is definitely capable of running without throttling.

In Linux (Manjaro, Mint and Kubuntu) there is always this current limit throttling. There is definitely no power or thermal throttling. While throttled, the package power is between ~10 and 22 W, and temperatures are between 50 and 70 C. Setting the HWP hint to performance instead of balance_performance makes throttling (subjectively) worse; increasing ICCMAX from the default values does not help.

When poking around in the registers using msr-tools I noticed that "Electrical Design Point Status" is set to 1 while throttling occurs. This is in registers 6B0 (MSR_GRAPHICS_PERF_LIMIT_REASONS) and 6B1 (MSR_RING_PERF_LIMIT_REASONS), but not in 690 (MSR_CORE_PERF_LIMIT_REASONS). Can I change this electrical design point limit somewhere? Or might this not be the actual cause for throttling?
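For anyone who wants to repeat this check without msr-tools, here is a minimal Python sketch that reads the three registers through the kernel's msr driver and tests one status bit. The register addresses are the ones quoted above; the bit position (8) is my assumption about where "Electrical Design Point Status" lives and should be checked against the Intel SDM for the specific CPU model. The helper names are made up for illustration.

```python
import os
import struct

# MSR addresses exactly as quoted in the post above.
MSRS = {
    "MSR_CORE_PERF_LIMIT_REASONS": 0x690,
    "MSR_GRAPHICS_PERF_LIMIT_REASONS": 0x6B0,
    "MSR_RING_PERF_LIMIT_REASONS": 0x6B1,
}

# ASSUMPTION: bit position of the "Electrical Design Point Status" flag.
# Verify against the Intel SDM for your CPU model before relying on it.
EDP_STATUS_BIT = 8


def read_msr(address, cpu=0):
    """Read one 64-bit MSR via /dev/cpu/N/msr (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def bit_is_set(value, bit):
    """Return True if the given bit is set in a raw register value."""
    return bool((value >> bit) & 1)


def edp_report(values, bit=EDP_STATUS_BIT):
    """Map each register name to whether the assumed status bit is set."""
    return {name: bit_is_set(raw, bit) for name, raw in values.items()}


# Typical use (as root, while a load such as stress-ng is running):
#   raw = {name: read_msr(addr) for name, addr in MSRS.items()}
#   print(edp_report(raw))
```

Running the commented-out lines at the bottom while the CPU is under load should show which of the three registers reports the flag, if the assumed bit position is right for this CPU.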

Interestingly, in one try I was able to reach the full 3.4 GHz on all cores without throttling in Kubuntu 18.04 (live), but since subsequent tries showed the "usual" current limit, this is probably not really useful information...

nariox commented 4 years ago

Have you updated the BIOS since trying with Kubuntu? Sometimes manufacturers implement limits after releasing the product to avoid damage (due to insufficient design).

But assuming this is not the case, what settings do you have for ICCMAX?

Helium314 commented 4 years ago

I did not update the BIOS since then, and using Windows 10 and throttlestop I can still get the package power up to 40 W.

ICCMAX is the default (64 A), setting all 3 available current limits to higher values does not help.

I noticed that if the --monitor option reports higher graphics power consumption, throttling starts at a lower package power (using the Intel GPU; an nvidia GPU is available but switched off).

So e.g. if I run stress-ng, package power is at 22 W and I am limited by current. Now some transparent popup notification comes up and graphics power goes from 0.0 to around 2 W and package power decreases to 15 W (and CPU frequencies decrease). When the popup is gone, package power goes back to 22 W. This is not reproducible using Windows 10, so it should be possible to somehow remove the limit.

nariox commented 4 years ago

It is possible there is a hard limit being imposed by the embedded controller. The "Electrical Design Point Status" is probably related to the design of the VRMs and other motherboard constraints, right?

Do you have the DPTF drivers installed on Windows?

Helium314 commented 4 years ago

Yes, as far as I understood you're right about the Electrical Design Point. But it is probably not really a hard limit, as that would also limit in Windows.

I have DPTF drivers installed on Windows. When I install DPTF on Linux (using AUR) and enable the service, I see no change.

nariox commented 4 years ago

At least in Thinkpads, it seems the EC is reading from a "lap detector", and since Linux doesn't implement it, the EC limits the temperature to 75C or 80C. It is possible that something similar is happening (EC can't see VRM mosfet sensors, limits current forcefully, or something). A recent merge on master added the option to disable the BD_PROCHOT flag; this might help.

I actually don't know exactly how well DPTF works on Linux. Apparently you'd need to extract the tables using dptfxtract first, but I used it without doing that and it did set my temp limit to 97C.

Another thing you could try is to set your power UUID. I have modified a script that has been floating around here: set-power-uuid.sh

If none of these work, your best bet would be Huawei's support forums, but I doubt they'll provide any support for Linux (even Lenovo has been very slow/reluctant to solve the problem in Thinkpads). Sorry, I wish I could help more.

Helium314 commented 4 years ago

The script reports success, but does not change anything regarding the throttling. Disabling BD_PROCHOT also does not help.

After installing thermald and dptfxtract I have 2 files in /etc/thermald:

thermal-conf.xml.auto

<?xml version="1.0"?>
<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
    <Name> Auto generated </Name>
    <ProductName>MACH-WX9</ProductName>
    <Preference>QUIET</Preference>
    <ThermalZones>
    </ThermalZones>
</Platform>
</ThermalConfiguration>
<!-- END -->

thermal-cpu-cdev-order.xml

<!--

Specifies the order of compensation to cool CPU only.
There is a default already implemented in the code, but
this file can be used to change order

The Following cooling device can present
-->
<CoolingDeviceOrder>
<!--  Specify Cooling device order  -->
<CoolingDevice>rapl_controller</CoolingDevice>
<CoolingDevice>intel_pstate</CoolingDevice>
<CoolingDevice>intel_powerclamp</CoolingDevice>
<CoolingDevice>cpufreq</CoolingDevice>
<CoolingDevice>Processor</CoolingDevice>
</CoolingDeviceOrder>

The QUIET preference from the 1st file could be the problem... I have no idea about thermald, can I just change this to something like POWER or PERFORMANCE and then (re)start the thermald service?
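In case it helps with experimenting, here is a small Python sketch that rewrites the Preference element in a thermald config like the one above. It is only an illustration: the function name is hypothetical, PERFORMANCE as the new value is an assumption, and nothing here confirms that thermald's preference has any effect on the current limit. Note that ElementTree drops the XML declaration and the comments outside the root element when serializing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the auto-generated file quoted above (comments removed).
SAMPLE = """<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
    <Name> Auto generated </Name>
    <ProductName>MACH-WX9</ProductName>
    <Preference>QUIET</Preference>
    <ThermalZones>
    </ThermalZones>
</Platform>
</ThermalConfiguration>
"""


def set_preference(xml_text, preference="PERFORMANCE"):
    """Return the config text with every <Preference> set to `preference`.

    ASSUMPTION: "PERFORMANCE" is an accepted value; check the thermald
    documentation for the installed version.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for node in root.iter("Preference"):
        node.text = preference
        changed += 1
    if changed == 0:
        raise ValueError("no <Preference> element found")
    return ET.tostring(root, encoding="unicode")


# Typical use (as root), followed by a restart of the thermald service:
#   path = "/etc/thermald/thermal-conf.xml.auto"
#   with open(path) as f:
#       new_text = set_preference(f.read())
#   with open(path, "w") as f:
#       f.write(new_text)
```

Keeping a backup copy of the original file before overwriting it is a good idea, since dptfxtract generated it from the machine's own tables.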

nariox commented 4 years ago

I think the right one is "PERFORMANCE", but thermald seems to interfere with throttled. No clue what would be the best way to address this. ):

Helium314 commented 4 years ago

I tried changing it to PERFORMANCE and also renaming the file from thermal-conf.xml.auto to thermal-conf.xml (restarting thermald after every change), but it did not help...

Helium314 commented 4 years ago

Small update: with a recent BIOS update the package power is limited to ca. 25 W instead of 22 W. So there is a small improvement, but it is still not enough to allow the full 3.4 GHz (and package power is still much reduced when the GPU is active, e.g. for transparency effects). The actual limit is still current. So it looks like the BIOS has some influence; maybe this can be used...

I tried adding acpi_osi=\"Windows 2015\" acpi_osi=! to GRUB_CMDLINE_LINUX_DEFAULT (from http://arter97.blogspot.com/2018/08/saving-power-consumption-on-laptops.html ) to make the BIOS think I'm using Windows, but unfortunately there is no improvement.
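For reference, this is roughly the edit to /etc/default/grub, sketched in Python so it can be repeated or reverted easily. The function name is hypothetical, the parameters are copied as written above (including the backslash-escaped quotes that the GRUB file needs), and the sketch assumes the variable is defined on a single double-quoted line.

```python
import re

# Parameters as quoted in the post, in the same order.
PARAMS = [r'acpi_osi=\"Windows 2015\"', "acpi_osi=!"]


def add_kernel_params(grub_text, params, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Append any missing params to the `key="..."` line of a grub defaults file.

    ASSUMPTION: the variable sits on one line and uses double quotes.
    """
    pattern = re.compile(
        rf'^({re.escape(key)}=")(.*)(")[ \t]*$', re.MULTILINE
    )

    def repl(match):
        existing = match.group(2)
        # Skip parameters that are already present, so reruns change nothing.
        missing = [p for p in params if p not in existing]
        joined = " ".join([existing] + missing).strip()
        return f"{match.group(1)}{joined}{match.group(3)}"

    new_text, count = pattern.subn(repl, grub_text)
    if count == 0:
        raise ValueError(f"{key} not found")
    return new_text


# Typical use (as root):
#   with open("/etc/default/grub") as f:
#       new_text = add_kernel_params(f.read(), PARAMS)
#   with open("/etc/default/grub", "w") as f:
#       f.write(new_text)
```

After writing the file, the GRUB configuration still has to be regenerated (e.g. with update-grub on Debian-based systems) and the machine rebooted before the parameters take effect.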