erpalma / throttled

Workaround for Intel throttling issues in Linux.
MIT License

Why not adding support for Alder-Lake #282

Open CicadaSeventeen opened 2 years ago

CicadaSeventeen commented 2 years ago

I guess there is no problem using this program on 12th-gen Core CPUs, so maybe it is not a good idea to block them with an ad hoc support check.

erpalma commented 2 years ago

You can easily test throttled on Alder Lake if you want.

mpawlowski-eyeo commented 2 years ago

@erpalma I tested a Lenovo X1 Carbon 10th gen with an i7-1270P, and throttled doesn't seem to work on it: the CPU tops out at around 75 deg C, same as without throttled.

I'm running Kubuntu 22.04, disabled thermald and followed the instructions.

Anything I can try to get it running near its peak?

mpawlowski-eyeo commented 2 years ago

I created a ticket for my specific CPU model: https://github.com/erpalma/throttled/issues/306

lakotamm commented 2 years ago

i7-12700H is also reported as unsupported.

CicadaSeventeen commented 2 years ago

Whether a CPU is supported or not is kind of ... complex? I also have a 10750H device that does not allow throttled to work. It depends on the BIOS and the OEM, so maybe we need more reported examples.

lakotamm commented 2 years ago

I created a pull request + a fork where I bypassed the check so that we can run some tests. If you wanna run it, simply compile it using the original manual.

From what I see decreasing TDP works, increasing does not.

lakotamm commented 2 years ago

My observation: PL2 seems to work well. PL1 is limited to 20 W on both AC and battery. I have no clue why this is the case, especially since it does not occur in Windows.

Laptop: Dell Inspiron 16 7620
OS: Fedora 36
CPU: i7-12700H
Secure boot: off

[D] core 19 thermal status: thermal limit status = 0
[D] core 19 thermal status: thermal limit log = 0
[D] core 19 thermal status: prochot or forcepr status = 0
[D] core 19 thermal status: prochot or forcepr log = 0
[D] core 19 thermal status: crit temp status = 0
[D] core 19 thermal status: crit temp log = 0
[D] core 19 thermal status: thermal threshold1 status = 0
[D] core 19 thermal status: thermal threshold1 log = 0
[D] core 19 thermal status: thermal threshold2 status = 0
[D] core 19 thermal status: thermal threshold2 log = 0
[D] core 19 thermal status: power limit status = 0
[D] core 19 thermal status: power limit log = 0
[D] core 19 thermal status: current limit status = 0
[D] core 19 thermal status: current limit log = 0
[D] core 19 thermal status: cross domain limit status = 0
[D] core 19 thermal status: cross domain limit log = 0
[D] core 19 thermal status: cpu temp = 60
[D] core 19 thermal status: temp resolution = 1
[D] core 19 thermal status: reading valid = 1
[D] TEMPERATURE_TARGET - write 0x5 - read 0x5 - match OK
[D] CONFIG_TDP_CONTROL - write 0x0 - read 0x0 - match OK
[D] MSR PACKAGE_POWER_LIMIT - write 0x42816000dd8160 - read 0x42816000dd8160 - match OK
[D] MCHBAR PACKAGE_POWER_LIMIT - write 0x42816000dd8160 - read 0xffffffffffffffff - match ERR
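For anyone who wants to see which limits the PACKAGE_POWER_LIMIT value in the log above encodes, here is a minimal sketch of a decoder. It assumes the usual MSR_PKG_POWER_LIMIT layout (PL1 in the low dword, PL2 in the high dword, a 15-bit power field plus an enable bit in each) and a power unit of 1/8 W. The unit is an assumption: the real value comes from MSR_RAPL_POWER_UNIT. The function name `decode_pkg_power_limit` is made up for this example and is not part of throttled.

```python
def decode_pkg_power_limit(value, watts_per_unit=0.125):
    """Split a 64-bit PACKAGE_POWER_LIMIT value into PL1 and PL2 fields.

    Assumed layout (not taken from throttled): bits 14:0 of each dword hold
    the power limit in RAPL power units, bit 15 is the enable flag.
    watts_per_unit=0.125 is an assumption; read MSR_RAPL_POWER_UNIT to be sure.
    """
    def decode_half(dword):
        return {
            "watts": (dword & 0x7FFF) * watts_per_unit,
            "enabled": bool((dword >> 15) & 1),
        }

    return {
        "pl1": decode_half(value & 0xFFFFFFFF),          # low 32 bits
        "pl2": decode_half((value >> 32) & 0xFFFFFFFF),  # high 32 bits
    }


limits = decode_pkg_power_limit(0x42816000DD8160)
print(limits)
```

Under these assumptions the value written above decodes to 44 W for both PL1 and PL2, with both limits enabled. That would suggest the 20 W cap is not coming from the MSR value itself.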

mpawlowski-eyeo commented 2 years ago

@lakotamm I built your change, and here are my results:

During multi-core heavy load (compiling Chromium) the CPUs spool up to ~2500 MHz and temps go up to 95 deg. C for a few moments, then the CPUs get throttled down to 1000 MHz and temps fall to 60 deg (and stay there).

Feels like I'm reaching some tripping point (sustained watts?).

Without throttled, the temps hover around 80 deg C, performance cores settle at 2200 MHz and efficiency cores at 700 MHz.

The i7-1270P has that hybrid architecture with two different kinds of cores. Is this handled by throttled?

mkogan1 commented 2 years ago

@mpawlowski-eyeo Check whether the TCC offset is increasing (in the example below it is 1 deg C, but on my laptop it jumps to 20 deg C or more under load):

sudo turbostat -n 1 -i 1 |& grep offset
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x01640000 (99 C) (100 default - 1 offset)
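
If turbostat does not print the offset, the raw register value can be decoded by hand. The sketch below assumes that bits 23:16 of MSR_IA32_TEMPERATURE_TARGET hold the default target and bits 29:24 hold the TCC offset. The offset field width differs between CPU generations, so treat the 6-bit mask as an assumption. The function name is hypothetical.

```python
def decode_temperature_target(value):
    """Return (default_target, tcc_offset, effective_target) in deg C.

    Assumed layout: bits 23:16 = default target (TjMax),
    bits 29:24 = TCC activation offset (field width varies by CPU).
    """
    default_target = (value >> 16) & 0xFF
    tcc_offset = (value >> 24) & 0x3F
    return default_target, tcc_offset, default_target - tcc_offset


print(decode_temperature_target(0x01640000))
```

For the sample value 0x01640000 this gives a default of 100, an offset of 1, and an effective target of 99, which agrees with the turbostat line above.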
lakotamm commented 2 years ago

@mpawlowski-eyeo Could you run throttled.py with the --debug flag and post the output?

I am wondering whether MCHBAR pkg. power limit failing to synchronize with MSR pkg. power limit is one of the culprits.

lakotamm commented 2 years ago

sudo turbostat -n 1 -i 1 |& grep offset

@mkogan1 This command does not return anything on my i7-12700H; turbostat does not report any offset there. The related lines are:

cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x08640000 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x883d0000 (39 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x02000003 (100 C, 100 C)
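
As a cross-check, the package temperature turbostat prints can be reproduced from the raw MSR_IA32_PACKAGE_THERM_STATUS value. This is a small sketch assuming that bits 22:16 hold the digital readout in degrees below the target temperature, and that the target is the 100 C shown on the TEMPERATURE_TARGET line. The function name is made up for the example.

```python
def package_temp(therm_status, tj_max=100):
    """Convert a raw PACKAGE_THERM_STATUS value to deg C.

    Assumed layout: bits 22:16 = digital readout, i.e. how many
    degrees the package is below tj_max.
    """
    readout = (therm_status >> 16) & 0x7F
    return tj_max - readout


print(package_temp(0x883D0000))
```

With the value above, the readout is 61 degrees below the 100 C target, which gives the 39 C that turbostat reports.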
mpawlowski-eyeo commented 2 years ago
$ sudo ./throttled/throttled.py --debug
[sudo] password for mpawlowski: 
[I] Detected CPU architecture: Intel AlderLake
[I] Trying to unlock MSR allow_writes.
[I] Testing if undervolt is supported...
[I] Testing if HWP is supported...
[I] Loading config file.
[D] cpu platform info: maximum non turbo ratio = 25
[D] cpu platform info: maximum efficiency ratio = 4
[D] cpu platform info: minimum operating ratio = 4
[D] cpu platform info: feature ppin cap = 1
[D] cpu platform info: feature programmable turbo ratio = 1
[D] cpu platform info: feature programmable tdp limit = 1
[D] cpu platform info: number of additional tdp profiles = 2
[D] cpu platform info: feature programmable temperature target = 1
[D] cpu platform info: feature low power mode = 1
[D] Undervolt plane CORE - write 0 mV (0x0) - read 0 mV (0x0) - match OK
[D] Undervolt plane GPU - write 0 mV (0x0) - read 0 mV (0x0) - match OK
[D] Undervolt plane CACHE - write 0 mV (0x0) - read 0 mV (0x0) - match OK
[D] Undervolt plane UNCORE - write 0 mV (0x0) - read 0 mV (0x0) - match OK
[D] Undervolt plane ANALOGIO - write 0 mV (0x0) - read 0 mV (0x0) - match OK
[I] Starting main loop.
[D] core 0 thermal status: thermal limit status = 0
[D] core 0 thermal status: thermal limit log = 0
[D] core 0 thermal status: prochot or forcepr status = 0
[D] core 0 thermal status: prochot or forcepr log = 0
[D] core 0 thermal status: crit temp status = 0
[D] core 0 thermal status: crit temp log = 0
[D] core 0 thermal status: thermal threshold1 status = 0
[D] core 0 thermal status: thermal threshold1 log = 0
[D] core 0 thermal status: thermal threshold2 status = 0
[D] core 0 thermal status: thermal threshold2 log = 0
[D] core 0 thermal status: power limit status = 0
[D] core 0 thermal status: power limit log = 1
[D] core 0 thermal status: current limit status = 0
[D] core 0 thermal status: current limit log = 1
[D] core 0 thermal status: cross domain limit status = 0
[D] core 0 thermal status: cross domain limit log = 0
[D] core 0 thermal status: cpu temp = 60
[D] core 0 thermal status: temp resolution = 1
[D] core 0 thermal status: reading valid = 1
[D] core 1 thermal status: thermal limit status = 0
[D] core 1 thermal status: thermal limit log = 0
[D] core 1 thermal status: prochot or forcepr status = 0
[D] core 1 thermal status: prochot or forcepr log = 0
[D] core 1 thermal status: crit temp status = 0
[D] core 1 thermal status: crit temp log = 0
[D] core 1 thermal status: thermal threshold1 status = 0
[D] core 1 thermal status: thermal threshold1 log = 0
[D] core 1 thermal status: thermal threshold2 status = 0
[D] core 1 thermal status: thermal threshold2 log = 0
[D] core 1 thermal status: power limit status = 0
[D] core 1 thermal status: power limit log = 1
[D] core 1 thermal status: current limit status = 0
[D] core 1 thermal status: current limit log = 1
[D] core 1 thermal status: cross domain limit status = 0
[D] core 1 thermal status: cross domain limit log = 0
[D] core 1 thermal status: cpu temp = 60
[D] core 1 thermal status: temp resolution = 1
[D] core 1 thermal status: reading valid = 1
[D] core 2 thermal status: thermal limit status = 0
[D] core 2 thermal status: thermal limit log = 0
[D] core 2 thermal status: prochot or forcepr status = 0
[D] core 2 thermal status: prochot or forcepr log = 0
[D] core 2 thermal status: crit temp status = 0
[D] core 2 thermal status: crit temp log = 0
[D] core 2 thermal status: thermal threshold1 status = 0
[D] core 2 thermal status: thermal threshold1 log = 1
[D] core 2 thermal status: thermal threshold2 status = 0
[D] core 2 thermal status: thermal threshold2 log = 1
[D] core 2 thermal status: power limit status = 0
[D] core 2 thermal status: power limit log = 1
[D] core 2 thermal status: current limit status = 0
[D] core 2 thermal status: current limit log = 1
[D] core 2 thermal status: cross domain limit status = 0
[D] core 2 thermal status: cross domain limit log = 0
[D] core 2 thermal status: cpu temp = 61
[D] core 2 thermal status: temp resolution = 1
[D] core 2 thermal status: reading valid = 1
[D] core 3 thermal status: thermal limit status = 0
[D] core 3 thermal status: thermal limit log = 0
[D] core 3 thermal status: prochot or forcepr status = 0
[D] core 3 thermal status: prochot or forcepr log = 0
[D] core 3 thermal status: crit temp status = 0
[D] core 3 thermal status: crit temp log = 0
[D] core 3 thermal status: thermal threshold1 status = 0
[D] core 3 thermal status: thermal threshold1 log = 1
[D] core 3 thermal status: thermal threshold2 status = 0
[D] core 3 thermal status: thermal threshold2 log = 1
[D] core 3 thermal status: power limit status = 0
[D] core 3 thermal status: power limit log = 1
[D] core 3 thermal status: current limit status = 0
[D] core 3 thermal status: current limit log = 1
[D] core 3 thermal status: cross domain limit status = 0
[D] core 3 thermal status: cross domain limit log = 0
[D] core 3 thermal status: cpu temp = 61
[D] core 3 thermal status: temp resolution = 1
[D] core 3 thermal status: reading valid = 1
[D] core 4 thermal status: thermal limit status = 0
[D] core 4 thermal status: thermal limit log = 0
[D] core 4 thermal status: prochot or forcepr status = 0
[D] core 4 thermal status: prochot or forcepr log = 0
[D] core 4 thermal status: crit temp status = 0
[D] core 4 thermal status: crit temp log = 0
[D] core 4 thermal status: thermal threshold1 status = 0
[D] core 4 thermal status: thermal threshold1 log = 0
[D] core 4 thermal status: thermal threshold2 status = 0
[D] core 4 thermal status: thermal threshold2 log = 0
[D] core 4 thermal status: power limit status = 0
[D] core 4 thermal status: power limit log = 1
[D] core 4 thermal status: current limit status = 0
[D] core 4 thermal status: current limit log = 1
[D] core 4 thermal status: cross domain limit status = 0
[D] core 4 thermal status: cross domain limit log = 0
[D] core 4 thermal status: cpu temp = 62
[D] core 4 thermal status: temp resolution = 1
[D] core 4 thermal status: reading valid = 1
[D] core 5 thermal status: thermal limit status = 0
[D] core 5 thermal status: thermal limit log = 0
[D] core 5 thermal status: prochot or forcepr status = 0
[D] core 5 thermal status: prochot or forcepr log = 0
[D] core 5 thermal status: crit temp status = 0
[D] core 5 thermal status: crit temp log = 0
[D] core 5 thermal status: thermal threshold1 status = 0
[D] core 5 thermal status: thermal threshold1 log = 0
[D] core 5 thermal status: thermal threshold2 status = 0
[D] core 5 thermal status: thermal threshold2 log = 0
[D] core 5 thermal status: power limit status = 0
[D] core 5 thermal status: power limit log = 1
[D] core 5 thermal status: current limit status = 0
[D] core 5 thermal status: current limit log = 1
[D] core 5 thermal status: cross domain limit status = 0
[D] core 5 thermal status: cross domain limit log = 0
[D] core 5 thermal status: cpu temp = 62
[D] core 5 thermal status: temp resolution = 1
[D] core 5 thermal status: reading valid = 1
[D] core 6 thermal status: thermal limit status = 0
[D] core 6 thermal status: thermal limit log = 0
[D] core 6 thermal status: prochot or forcepr status = 0
[D] core 6 thermal status: prochot or forcepr log = 0
[D] core 6 thermal status: crit temp status = 0
[D] core 6 thermal status: crit temp log = 0
[D] core 6 thermal status: thermal threshold1 status = 0
[D] core 6 thermal status: thermal threshold1 log = 0
[D] core 6 thermal status: thermal threshold2 status = 0
[D] core 6 thermal status: thermal threshold2 log = 0
[D] core 6 thermal status: power limit status = 0
[D] core 6 thermal status: power limit log = 1
[D] core 6 thermal status: current limit status = 0
[D] core 6 thermal status: current limit log = 1
[D] core 6 thermal status: cross domain limit status = 0
[D] core 6 thermal status: cross domain limit log = 0
[D] core 6 thermal status: cpu temp = 63
[D] core 6 thermal status: temp resolution = 1
[D] core 6 thermal status: reading valid = 1
[D] core 7 thermal status: thermal limit status = 0
[D] core 7 thermal status: thermal limit log = 0
[D] core 7 thermal status: prochot or forcepr status = 0
[D] core 7 thermal status: prochot or forcepr log = 0
[D] core 7 thermal status: crit temp status = 0
[D] core 7 thermal status: crit temp log = 0
[D] core 7 thermal status: thermal threshold1 status = 0
[D] core 7 thermal status: thermal threshold1 log = 0
[D] core 7 thermal status: thermal threshold2 status = 0
[D] core 7 thermal status: thermal threshold2 log = 0
[D] core 7 thermal status: power limit status = 0
[D] core 7 thermal status: power limit log = 1
[D] core 7 thermal status: current limit status = 0
[D] core 7 thermal status: current limit log = 1
[D] core 7 thermal status: cross domain limit status = 0
[D] core 7 thermal status: cross domain limit log = 0
[D] core 7 thermal status: cpu temp = 63
[D] core 7 thermal status: temp resolution = 1
[D] core 7 thermal status: reading valid = 1
[D] core 8 thermal status: thermal limit status = 0
[D] core 8 thermal status: thermal limit log = 0
[D] core 8 thermal status: prochot or forcepr status = 0
[D] core 8 thermal status: prochot or forcepr log = 0
[D] core 8 thermal status: crit temp status = 0
[D] core 8 thermal status: crit temp log = 0
[D] core 8 thermal status: thermal threshold1 status = 0
[D] core 8 thermal status: thermal threshold1 log = 0
[D] core 8 thermal status: thermal threshold2 status = 0
[D] core 8 thermal status: thermal threshold2 log = 0
[D] core 8 thermal status: power limit status = 0
[D] core 8 thermal status: power limit log = 1
[D] core 8 thermal status: current limit status = 0
[D] core 8 thermal status: current limit log = 1
[D] core 8 thermal status: cross domain limit status = 0
[D] core 8 thermal status: cross domain limit log = 0
[D] core 8 thermal status: cpu temp = 63
[D] core 8 thermal status: temp resolution = 1
[D] core 8 thermal status: reading valid = 1
[D] core 9 thermal status: thermal limit status = 0
[D] core 9 thermal status: thermal limit log = 0
[D] core 9 thermal status: prochot or forcepr status = 0
[D] core 9 thermal status: prochot or forcepr log = 0
[D] core 9 thermal status: crit temp status = 0
[D] core 9 thermal status: crit temp log = 0
[D] core 9 thermal status: thermal threshold1 status = 0
[D] core 9 thermal status: thermal threshold1 log = 0
[D] core 9 thermal status: thermal threshold2 status = 0
[D] core 9 thermal status: thermal threshold2 log = 0
[D] core 9 thermal status: power limit status = 0
[D] core 9 thermal status: power limit log = 1
[D] core 9 thermal status: current limit status = 0
[D] core 9 thermal status: current limit log = 1
[D] core 9 thermal status: cross domain limit status = 0
[D] core 9 thermal status: cross domain limit log = 0
[D] core 9 thermal status: cpu temp = 63
[D] core 9 thermal status: temp resolution = 1
[D] core 9 thermal status: reading valid = 1
[D] core 10 thermal status: thermal limit status = 0
[D] core 10 thermal status: thermal limit log = 0
[D] core 10 thermal status: prochot or forcepr status = 0
[D] core 10 thermal status: prochot or forcepr log = 0
[D] core 10 thermal status: crit temp status = 0
[D] core 10 thermal status: crit temp log = 0
[D] core 10 thermal status: thermal threshold1 status = 0
[D] core 10 thermal status: thermal threshold1 log = 0
[D] core 10 thermal status: thermal threshold2 status = 0
[D] core 10 thermal status: thermal threshold2 log = 0
[D] core 10 thermal status: power limit status = 0
[D] core 10 thermal status: power limit log = 1
[D] core 10 thermal status: current limit status = 0
[D] core 10 thermal status: current limit log = 1
[D] core 10 thermal status: cross domain limit status = 0
[D] core 10 thermal status: cross domain limit log = 0
[D] core 10 thermal status: cpu temp = 63
[D] core 10 thermal status: temp resolution = 1
[D] core 10 thermal status: reading valid = 1
[D] core 11 thermal status: thermal limit status = 0
[D] core 11 thermal status: thermal limit log = 0
[D] core 11 thermal status: prochot or forcepr status = 0
[D] core 11 thermal status: prochot or forcepr log = 0
[D] core 11 thermal status: crit temp status = 0
[D] core 11 thermal status: crit temp log = 0
[D] core 11 thermal status: thermal threshold1 status = 0
[D] core 11 thermal status: thermal threshold1 log = 0
[D] core 11 thermal status: thermal threshold2 status = 0
[D] core 11 thermal status: thermal threshold2 log = 0
[D] core 11 thermal status: power limit status = 0
[D] core 11 thermal status: power limit log = 1
[D] core 11 thermal status: current limit status = 0
[D] core 11 thermal status: current limit log = 1
[D] core 11 thermal status: cross domain limit status = 0
[D] core 11 thermal status: cross domain limit log = 0
[D] core 11 thermal status: cpu temp = 63
[D] core 11 thermal status: temp resolution = 1
[D] core 11 thermal status: reading valid = 1
[D] core 12 thermal status: thermal limit status = 0
[D] core 12 thermal status: thermal limit log = 0
[D] core 12 thermal status: prochot or forcepr status = 0
[D] core 12 thermal status: prochot or forcepr log = 0
[D] core 12 thermal status: crit temp status = 0
[D] core 12 thermal status: crit temp log = 0
[D] core 12 thermal status: thermal threshold1 status = 0
[D] core 12 thermal status: thermal threshold1 log = 0
[D] core 12 thermal status: thermal threshold2 status = 0
[D] core 12 thermal status: thermal threshold2 log = 0
[D] core 12 thermal status: power limit status = 0
[D] core 12 thermal status: power limit log = 1
[D] core 12 thermal status: current limit status = 0
[D] core 12 thermal status: current limit log = 1
[D] core 12 thermal status: cross domain limit status = 0
[D] core 12 thermal status: cross domain limit log = 0
[D] core 12 thermal status: cpu temp = 62
[D] core 12 thermal status: temp resolution = 1
[D] core 12 thermal status: reading valid = 1
[D] core 13 thermal status: thermal limit status = 0
[D] core 13 thermal status: thermal limit log = 0
[D] core 13 thermal status: prochot or forcepr status = 0
[D] core 13 thermal status: prochot or forcepr log = 0
[D] core 13 thermal status: crit temp status = 0
[D] core 13 thermal status: crit temp log = 0
[D] core 13 thermal status: thermal threshold1 status = 0
[D] core 13 thermal status: thermal threshold1 log = 0
[D] core 13 thermal status: thermal threshold2 status = 0
[D] core 13 thermal status: thermal threshold2 log = 0
[D] core 13 thermal status: power limit status = 0
[D] core 13 thermal status: power limit log = 1
[D] core 13 thermal status: current limit status = 0
[D] core 13 thermal status: current limit log = 1
[D] core 13 thermal status: cross domain limit status = 0
[D] core 13 thermal status: cross domain limit log = 0
[D] core 13 thermal status: cpu temp = 62
[D] core 13 thermal status: temp resolution = 1
[D] core 13 thermal status: reading valid = 1
[D] core 14 thermal status: thermal limit status = 0
[D] core 14 thermal status: thermal limit log = 0
[D] core 14 thermal status: prochot or forcepr status = 0
[D] core 14 thermal status: prochot or forcepr log = 0
[D] core 14 thermal status: crit temp status = 0
[D] core 14 thermal status: crit temp log = 0
[D] core 14 thermal status: thermal threshold1 status = 0
[D] core 14 thermal status: thermal threshold1 log = 0
[D] core 14 thermal status: thermal threshold2 status = 0
[D] core 14 thermal status: thermal threshold2 log = 0
[D] core 14 thermal status: power limit status = 0
[D] core 14 thermal status: power limit log = 1
[D] core 14 thermal status: current limit status = 0
[D] core 14 thermal status: current limit log = 1
[D] core 14 thermal status: cross domain limit status = 0
[D] core 14 thermal status: cross domain limit log = 0
[D] core 14 thermal status: cpu temp = 62
[D] core 14 thermal status: temp resolution = 1
[D] core 14 thermal status: reading valid = 1
[D] core 15 thermal status: thermal limit status = 0
[D] core 15 thermal status: thermal limit log = 0
[D] core 15 thermal status: prochot or forcepr status = 0
[D] core 15 thermal status: prochot or forcepr log = 0
[D] core 15 thermal status: crit temp status = 0
[D] core 15 thermal status: crit temp log = 0
[D] core 15 thermal status: thermal threshold1 status = 0
[D] core 15 thermal status: thermal threshold1 log = 0
[D] core 15 thermal status: thermal threshold2 status = 0
[D] core 15 thermal status: thermal threshold2 log = 0
[D] core 15 thermal status: power limit status = 0
[D] core 15 thermal status: power limit log = 1
[D] core 15 thermal status: current limit status = 0
[D] core 15 thermal status: current limit log = 1
[D] core 15 thermal status: cross domain limit status = 0
[D] core 15 thermal status: cross domain limit log = 0
[D] core 15 thermal status: cpu temp = 62
[D] core 15 thermal status: temp resolution = 1
[D] core 15 thermal status: reading valid = 1
[D] TEMPERATURE_TARGET - write 0xf - read 0xf - match OK
[D] CONFIG_TDP_CONTROL - write 0x0 - read 0x0 - match OK
[D] MSR PACKAGE_POWER_LIMIT - write 0x42816000dd80e8 - read 0x42816000dd80e8 - match OK
[D] MCHBAR PACKAGE_POWER_LIMIT - write 0x42816000dd80e8 - read 0xffffffffffffffff - match ERR

mpawlowski-eyeo commented 2 years ago

sudo turbostat -n 1 -i 1 |& grep offset

This command also doesn't return anything for me, even under load and after the cores have been throttled down. The relevant turbostat lines:

cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x0f640000 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88360800 (46 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)

lakotamm commented 2 years ago

Bingo! I went through the code and I am 95% sure that I found the reason why it does not work.

Both @mpawlowski-eyeo and I got this error: [D] MCHBAR PACKAGE_POWER_LIMIT - write 0x42816000dd80e8 - read 0xffffffffffffffff - match ERR

In the code I found that MCHBAR PACKAGE_POWER_LIMIT is written to address 0xFED159A0. This is the MCHBAR base address 0xFED10000 plus the offset 59A0h, and it is correct for my 8th-gen CPU. However, on my 12th-gen CPU, the command sudo setpci -s 0:0.0 48.l shows that the MCHBAR register actually reads 0xFEDC0001 (bit 0 appears to be the enable flag, so the base is 0xFEDC0000).

So the correct address should actually be 0xFEDC59A0.
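
The address arithmetic above can be sketched in a few lines. This is only an illustration: the helper name is hypothetical, and treating bit 0 of the setpci value as an enable flag that must be cleared is an assumption, not something taken from the throttled source.

```python
PACKAGE_POWER_LIMIT_OFFSET = 0x59A0  # offset quoted in the comment above


def mchbar_register_address(mchbar_raw, offset=PACKAGE_POWER_LIMIT_OFFSET):
    """Compute a memory-mapped register address from the raw MCHBAR value.

    mchbar_raw is the value printed by `setpci -s 0:0.0 48.l`.
    Assumption: bit 0 is an enable flag and is not part of the base.
    """
    base = mchbar_raw & ~0x1
    return base + offset


print(hex(mchbar_register_address(0xFEDC0001)))  # 12th gen value from this thread
print(hex(mchbar_register_address(0xFED10001)))  # older base plus enable bit
```

The first call yields 0xfedc59a0 and the second 0xfed159a0, matching the two addresses discussed here.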

On top of that, it could be good to check whether we are writing the correct data into the 16th bit of the register, since its function changed in 11th and 12th gen compared to 10th-gen CPUs (see the 10th, 11th, and 12th gen datasheets).

lakotamm commented 2 years ago

And even better news - changing the address actually fixes the error for me! [D] MCHBAR PACKAGE_POWER_LIMIT - write 0x4280f000dd80c8 - read 0x4280f000dd80c8 - match OK

Now I can easily fry the i7-12700H for unlimited time at whatever value I set PL1 to - e.g. 45W: [BATTERY] Thermal: OK - Power: LIM - Current: OK - Cross-domain (e.g. GPU): OK || VCore: 971 mV - Package: 45.0 W - Graphics: 0.0 W - DRAM: 0.0 W - Total: 45.

I created commit 2629c786a6f4b17a26dfb6d17b8bf3d69e541937 where I temporarily implemented the fix. Since the address is currently just a fixed number, for safety I disabled support for all CPUs other than Alder Lake. @mpawlowski-eyeo could you give it a try?

mpawlowski-eyeo commented 2 years ago

@lakotamm I can confirm this works for me too, now the cores stay at 95 degrees as long as I want them to. Good find!

lakotamm commented 2 years ago

I am glad to hear that it is working! @erpalma would you by any chance have time to implement the fix?

By the way, I wonder whether the issues with 11th-gen CPUs (#265, #255) are also caused by this, since they report the same MCHBAR error.

erpalma commented 2 years ago

I should be able to implement this fix tomorrow/next week!

Also, should we call setpci instead of just "assuming" the MCHBAR base?

lakotamm commented 2 years ago

That sounds great!

I would say it makes sense to do so, and it might save us the work of determining which address goes where, especially since Intel itself suggests in the datasheet that we determine the address by calling the command. But only if it is not too complicated. Right now the only unconfirmed CPUs are Ice Lake and Tiger Lake-H.
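
A rough sketch of what querying setpci with a fallback could look like is below. It is not the code that ended up in throttled. The function name, the injectable `run` parameter, and the choice of fallback value are all assumptions made for illustration, and reading config offset 0x48 typically needs root.

```python
import subprocess

FALLBACK_MCHBAR = 0xFED10001  # assumed default; the pre-12th-gen value from this thread


def read_mchbar(run=subprocess.check_output, fallback=FALLBACK_MCHBAR):
    """Return the raw MCHBAR register value, or `fallback` if setpci fails.

    `run` is injectable so the parsing can be exercised without hardware.
    """
    try:
        out = run(["setpci", "-s", "0:0.0", "48.l"])
        return int(out.decode().strip(), 16)  # setpci prints bare hex, e.g. "fedc0001"
    except (OSError, subprocess.CalledProcessError, ValueError):
        return fallback


# Exercise the parser with a canned reply instead of real hardware.
print(hex(read_mchbar(run=lambda cmd: b"fedc0001\n")))
```

Falling back silently is a design choice for the sketch only. A real tool would probably want to warn loudly when it has to guess the address.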

erpalma commented 2 years ago

Ok, I've pushed these few lines of code. Feedback is welcome ;)

lakotamm commented 2 years ago

I think that the code looks neat! I will test it ASAP.

My only suggestion is that I would also assign the 0xFEDC0001 address to Tiger Lake CPUs by default.

        warning('Trying to guess the MCHBAR address from the CPUID. This MIGHT NOT WORK!')
        if cpuid in ((6, 151, 2), (6, 154, 3)):  # I would also add Tiger Lake CPUs here
            MCHBAR_BASE = 0xFEDC0001
        else:
            MCHBAR_BASE = 0xFED10001

lakotamm commented 2 years ago

It seems like everything works alright with the latest update!

lakotamm commented 2 years ago

I added support for the Alder Lake U series and the 6-core S series in pull request #310.