Closed svmlegacy closed 2 years ago
Clean make of main branch. Inserting corefreqk.ko module results in hard lock of this system, even num lock frozen.
Atom 330 of Diamondville has a CPUID of 06_1C
https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.h#L1304
Was it running with older versions of CoreFreq ?
If not, comment out or remove those lines:
https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.c#L2316
https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.c#L7487
... then rebuild and try.
Have also seen this issue on select Intel ES processors, on unreleased steppings.
ES, which CPUID and Brand strings are they ?
Intel Atom 330, CPUID 106C2h (06_1C stepping 2) is correct.
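For readers following along, the mapping from the raw signature 106C2h to the 06_1C notation can be sketched as below. This is a minimal illustration of the standard CPUID leaf 1 EAX layout, not CoreFreq code; the names `cpu_signature`, `decode_signature` and `format_signature` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the decoded CPUID.1:EAX fields. */
struct cpu_signature {
    unsigned family;    /* display family */
    unsigned model;     /* display model */
    unsigned stepping;
};

/* Decode the raw EAX value returned by CPUID leaf 1. */
struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature sig;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    sig.stepping = eax & 0xF;

    /* The extended family is only added when the base family is 0xF. */
    sig.family = base_family;
    if (base_family == 0xF)
        sig.family += ext_family;

    /* The extended model forms the high nibble for families 6 and 0xF. */
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;

    return sig;
}

/* Write the FF_MM notation used in this thread (e.g. 06_1C) into buf. */
void format_signature(struct cpu_signature sig, char *buf, size_t len)
{
    assert(buf != NULL && len >= 6);
    snprintf(buf, len, "%02X_%02X", sig.family, sig.model);
}
```

Calling `decode_signature(0x106C2)` gives family 6, model 0x1C and stepping 2, which matches the values reported for the Atom 330 here.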
> Was it running with older versions of CoreFreq ? If not, comment out or remove those lines: ... then rebuild and try.
Unfortunately still hard-locking. This is the first chance I've had to run this system. Do you have a suggested older version to try?
> ES, which CPUID and Brand strings are they ?
The two that I've tried are as follows:
Unsure if it's related; I always chalked it up to them being early ES parts. They hard-lock in the exact same manner, so I added it as a piece of info.
> Unfortunately still hard-locking. This is the first chance I've had to run this system. Do you have a suggested older version to try?
Do you have any kernel log or screenshot of the backtracked functions and registers dump ?
> ES, which CPUID and Brand strings are they ?
> The two that I've tried are as follows:
CPUID signatures 06_1A and 06_1F are both implemented in CoreFreq, as _Nehalem_Bloomfield and _Nehalem_MB respectively.
The zeros in the brand string Genuine Intel(R) CPU @ 0000 @ 1.87GHz probably lead the driver to a division error.
For testing, the line below can be commented out and replaced with a static value: https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.c#L1017
/*
iArg->Features->Factory.Freq = Intel_Brand( iArg->Features->Info.Brand,
iArg->Brand );
*/
iArg->Features->Factory.Freq = 1870;
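To illustrate the kind of guard that would avoid relying on a zero frequency, here is a minimal sketch of a brand-string parser with a fallback value. It is not CoreFreq's Intel_Brand(); the function name `brand_freq_mhz` and its fallback parameter are assumptions made for this example, and the idea that a zero value is what triggers the fault is only the hypothesis stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not CoreFreq's Intel_Brand()): return the frequency
 * in MHz found after the last '@' of a brand string, or fallback_mhz when
 * no usable, non-zero frequency is present. */
unsigned int brand_freq_mhz(const char *brand, unsigned int fallback_mhz)
{
    const char *at;
    char *end;
    double value, mhz;

    assert(brand != NULL);

    at = strrchr(brand, '@');
    if (at == NULL)
        return fallback_mhz;

    /* strtod() skips the blanks that follow the '@'. */
    value = strtod(at + 1, &end);
    if (end == at + 1 || value <= 0.0)
        return fallback_mhz;    /* nothing parsed, or a zero frequency */

    while (*end == ' ')
        end++;

    if (strncmp(end, "GHz", 3) == 0)
        mhz = value * 1000.0;
    else if (strncmp(end, "MHz", 3) == 0)
        mhz = value;
    else
        return fallback_mhz;    /* unknown unit: do not guess */

    return (unsigned int)(mhz + 0.5);   /* round to the nearest MHz */
}
```

With the ES brand string quoted above, the last field parses to 1870 MHz, the same static value used in the workaround; a string ending in "@ 0000" falls back to the supplied default instead of returning zero.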
@svmlegacy : Please let me know about results with suggested code above and Atom 330 crash screen.
Still trying to get any kind of debugging info out. Hard lock occurs before any outputs. Trying to get debugging out to a secondary PC via the COM port, but so far only getting a garbled mess. Will let you know when I have something useful.
For the Atom 330, I suggest reading the MSRs that are accessed along the call flow.
Architecture entries are in these lines: https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.h#L6507
Load the kernel msr driver and read the registers from the command line. Any access violation should be trapped by the kernel, preventing a crash.
modprobe msr
rdmsr -ax <reg_no>
The first entry starts in Query_Core2(), which leads to Intel_Core_Platform_Info(), where MSR_PLATFORM_INFO (0xce), MSR_IA32_PERF_STATUS (0x198) and MSR_IA32_PLATFORM_ID (0x17) are read.
Thus do:
rdmsr -ax 0x000000ce
rdmsr -ax 0x00000198
rdmsr -ax 0x00000017
Next, Query_Core2() goes into HyperThreading_Technology() for the topology, where MSR_IA32_APICBASE is read.
Do:
rdmsr -ax 0x0000001b
At this point we're done with Query_Core2(). Let me know if these registers can be safely read on your processor.
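The same check can be scripted in C against the device nodes that the Linux msr driver exposes. The sketch below assumes the usual /dev/cpu/N/msr interface, where the register number is used as the file offset and each read returns 8 bytes; the function name `read_msr` is made up for this example.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one 64-bit MSR through an msr device node such as /dev/cpu/0/msr.
 * Returns 0 on success and -1 on any failure: node missing, insufficient
 * privileges, or a register the processor refuses to read. */
int read_msr(const char *dev_path, uint32_t reg, uint64_t *value)
{
    int fd;
    ssize_t n;

    assert(dev_path != NULL && value != NULL);

    fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The register number is the offset; the kernel reports a faulting
     * read as an error instead of crashing the machine. */
    n = pread(fd, value, sizeof(*value), (off_t)reg);
    close(fd);

    return (n == (ssize_t)sizeof(*value)) ? 0 : -1;
}
```

Run as root after `modprobe msr`, a loop over 0xce, 0x198, 0x17 and 0x1b on each /dev/cpu/N/msr node would show which of the registers above return a value and which are rejected.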
Somewhat interesting update:
After a full, clean reinstall of Fedora 35 due to other unrelated troubles (nvidia 340 drivers breaking the system), I built CoreFreq for the shipped kernel 5.14 and got a segmentation fault on inserting corefreqk.ko. Rebooting the system without updating any packages resulted in the hard lock on loading again.
All suggested registers output hex values without issue, matching on all cores. I'll submit the actual results tomorrow.
At this point, I'm debating on switching to another distro, even if Fedora 35 works on other platforms.
My favorite is Arch Linux; in my Wiki I provide a CoreFreq live image based on Arch.
At the bottom of the page you'll also find the nightly build with the CoreFreq development branch embedded.
Those images also contain the full Arch installation scripts, including Network Manager and its nmtui for easy network device setup.
Just to be sure about Nehalem: here is the latest development using the bootable CoreFreq ISO
Update on the Atom 330: Corefreq Arch Linux build also has a kernel panic when loading the module.
Does this build push any information to ttyS0 by default? Still haven't gotten any meaningful information there from the machine at all, but curious if it's worth a try. Kernel panic didn't seem to have much valuable information, but I'll try to get a picture of it in the faulted state.
Will be trying the Nehalem chips after the Atom is sorted... They take up the same workbench :)
Can you post here the output of the command lspci -nn for your Atom 330 and the ES processors ?
I would like to check their device IDs (DID) and, from there, the driver call flow. Perhaps some DIDs are present but the Base Address and CSR registers are not. For example, the Atom 330 has no VT-d support.
Atom 330 lspci: here
OH! NVidia MCP79 is not implemented yet.
The manufacturer (vendor) ID 10de is not part of the driver yet. It may start with the argument:
insmod corefreqk.ko ArchID=<N>
where <N> is taken from the generic architectures, 0 or 11:
https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.h#L6096
We will have to program a new loop from scratch. This time, I recommend using the most transparent VM to test and enhance CoreFreq until we feel confident enough to run on bare metal.
As usual, the key to a good implementation is the NVidia MCP79 datasheet and its register specification. A web search turns up some documents; the kernel source code for that chip is also worth digging into.
For 06_1C, the Intel SDM specification brings us to the following MSR list:
Apparently MSR_PLATFORM_ID is available.
The first change is to add _Atom_Bonnell to the Intel_MaxBusRatio() function:
https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.c#L2311
int Intel_MaxBusRatio(PLATFORM_ID *PfID)
{
	/* Architectures allowed to read MSR_IA32_PLATFORM_ID */
	struct SIGNATURE whiteList[] = {
		_Core_Conroe,		/* 06_0F */
		_Core_Penryn,		/* 06_17 */
		_Atom_Bonnell,		/* 06_1C */
		_Atom_Silvermont,	/* 06_26 */
		_Atom_Lincroft,		/* 06_27 */
		_Atom_Clover_Trail,	/* 06_35 */
		_Atom_Saltwell,		/* 06_36 */
		_Silvermont_Bay_Trail,	/* 06_37 */
	};
	int id, ids = sizeof(whiteList) / sizeof(whiteList[0]);

	/* Compare each entry with the CPUID signature of the running processor */
	for (id = 0; id < ids; id++) {
		if ((whiteList[id].ExtFamily
			== PUBLIC(RO(Proc))->Features.Std.EAX.ExtFamily)
		 && (whiteList[id].Family
			== PUBLIC(RO(Proc))->Features.Std.EAX.Family)
		 && (whiteList[id].ExtModel
			== PUBLIC(RO(Proc))->Features.Std.EAX.ExtModel)
		 && (whiteList[id].Model
			== PUBLIC(RO(Proc))->Features.Std.EAX.Model))
		{
			RDMSR((*PfID), MSR_IA32_PLATFORM_ID);
			return 0;
		}
	}
	return -1;	/* Not white-listed: the MSR is left unread */
}
Then rebuild, unload, restart all (bare-metal test)
Another request is to check whether MSR_PLATFORM_INFO is really unsupported by Bonnell, because it is not listed in the architectural list, whereas we have a go for MSR_IA32_PERF_STATUS:
rdmsr -ax 0x000000ce
... then check the kernel log for a trapped execution. A returned value of zero is also a sign of an unsupported register.
If it is unsupported, please comment out its usage in the function Intel_Core_Platform_Info():
https://github.com/cyring/CoreFreq/blob/478eee81930e1c339f13787a17d7d0ffe2231e2d/corefreqk.c#L2341
Change the function as below:
void Intel_Core_Platform_Info(unsigned int cpu)
{
	PLATFORM_ID PfID = {.value = 0};
	PLATFORM_INFO PfInfo = {.value = 0};
	PERF_STATUS PerfStatus = {.value = 0};
	unsigned int ratio0 = 10, ratio1 = 10; /*Arbitrary values*/
/*
	RDMSR(PfInfo, MSR_PLATFORM_INFO);
	if (PfInfo.value != 0) {
		ratio0 = PfInfo.MaxNonTurboRatio;
	}
*/
	RDMSR(PerfStatus, MSR_IA32_PERF_STATUS);
	if (PerfStatus.value != 0) {	/* §18.18.3.4 */
		if (PerfStatus.CORE.XE_Enable) {
			ratio1 = PerfStatus.CORE.MaxBusRatio;
		} else {
			if (Intel_MaxBusRatio(&PfID) == 0) {
				if (PfID.value != 0)
				{
					ratio1 = PfID.MaxBusRatio;
				}
			}
		}
	} else {
		if (Intel_MaxBusRatio(&PfID) == 0) {
			if (PfID.value != 0)
			{
				ratio1 = PfID.MaxBusRatio;
			}
		}
	}
	PUBLIC(RO(Core, AT(cpu)))->Boost[BOOST(MIN)] = KMIN(ratio0, ratio1);
	PUBLIC(RO(Core, AT(cpu)))->Boost[BOOST(MAX)] = KMAX(ratio0, ratio1);
}
@svmlegacy Hey! any progress with the debugging code requests above ?
@svmlegacy : please let me know when you can contribute on issue.
@svmlegacy Since commit b2f75c89332a1e0ffa517c22895c57c1b91ac812 what about Atom 330 ?
Sorry about the inactivity lately, I'll give it a shot tomorrow and see what happens! Thanks for the poke.
All my previous attempts were fruitless, just tried again with the dev version of the archlinux ISO and the current master branch. No luck. Haven't been able to get a serial connection outbound either.
Thanks for trying the develop branch.
Don't you have any kernel log (dmesg) to see where the Atom has crashed in the driver callflow ?
Great point! There is something that changed since last time I was working with this. Before, the system would hard lock, meaning I couldn't pull from dmesg. Now, it seems like it's not causing the system to lock (but still isn't working quite right.)
Here's the dmesg pulled from the system, with the attempted module insertion as the last entries: dmesg.txt .
Yes, it started at:
CoreFreq(0:2:-1): Processor [ 06_1C] Architecture [Atom/Bonnell] SMT [4/4]
Can you read this register ?
## MSR_TEMPERATURE_TARGET
rdmsr -ax 0x1A2
If not, please comment out that line in the driver code, then rebuild and reload everything for testing: https://github.com/cyring/CoreFreq/blob/a1540153123db1b2614dcc2d8cddede1be3a42cb/corefreqk.c#L7737
Nope. Could not read that MSR.
Commenting out this line enables the system to insert the mod with no issues. https://github.com/cyring/CoreFreq/blob/a1540153123db1b2614dcc2d8cddede1be3a42cb/corefreqk.c#L7737
Dumped a bunch of info here: https://gist.github.com/svmlegacy/9bd33c5b273e4310f20a3c6c2b288bfe
Wonderful to see progress!
Great to see that screenshot of Bonnell.
The last register, MSR_TEMPERATURE_TARGET, really hurts the processor.
And we are left without a TjMax, which is hard-coded to 100°C.
We can fine-tune TjMax and also the temperature formula, if you are aware of better values for your processor.
I'm wrapping up all the code changes: other Atom architectures are also impacted by the same issue.
@svmlegacy Code changes made so far are available in commit 0794238d5e9bdeae6252dff46f8dd001f5c12294
The monitoring loop for Bonnell is very basic and now needs to be refined using the architectural MSRs listed in chapter 2.3 of the SDM specifications.
And this datasheet also ;-)
For information, Low Power Features P_LVLx I/O
So far we don't have the I/O Base Address register:
|- Core C-States
|- C-States Base Address BAR [ 0x0 ]
For information, a TjMax of 85.2°C, according to the table below.
EDIT: If the temperature is not accurate, you can try the integer value of 85 at this code line:
https://github.com/cyring/CoreFreq/blob/0794238d5e9bdeae6252dff46f8dd001f5c12294/corefreqk.c#L8235
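To show why the TjMax value matters, here is a minimal sketch of the generic digital thermal sensor relation, where the reported temperature is TjMax minus the sensor readout. It assumes the architectural IA32_THERM_STATUS layout (MSR 0x19C, readout in bits 22:16, valid flag in bit 31), which is not quoted in this thread, and it is not CoreFreq's own temperature formula; the name `core_temp_celsius` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of IA32_THERM_STATUS (0x19C):
 *   bit 31     : reading valid
 *   bits 22:16 : digital readout, in degrees C below TjMax
 * Returns 0 and stores the temperature, or -1 if the reading is invalid. */
int core_temp_celsius(uint64_t therm_status, int tjmax, int *temp)
{
    assert(temp != NULL);

    if (((therm_status >> 31) & 1) == 0)
        return -1;

    *temp = tjmax - (int)((therm_status >> 16) & 0x7F);
    return 0;
}
```

Under these assumptions, the same raw readout is reported 15°C lower with a TjMax of 85 than with the hard-coded 100, so a wrong TjMax shifts every temperature by a constant offset.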
VCC: change this line:
https://github.com/cyring/CoreFreq/blob/0794238d5e9bdeae6252dff46f8dd001f5c12294/corefreqk.h#L7115
with:
.voltageFormula = VOLTAGE_FORMULA_INTEL_SOC,
or:
.voltageFormula = VOLTAGE_FORMULA_INTEL_SNB,
then set the Voltage scope to < SMT> in the Settings menu.
Good News! The develop branch now works as-is for the Atom 330.
Reported temperature looks good. Offsetting by another 15°C would put it sub-ambient. Tjmax of 85°C matches what is reported by other utilities.
I tried changing the .voltageFormula with the suggested statements:
https://github.com/cyring/CoreFreq/blob/ed94b48f4adaad30f8c4df7f7f83734f60f1cf03/corefreqk.h#L7172
Neither produced a good result in the SMT scope. _SOC was locked at 0.38V, and _SNB was at 0.0033 V. Expected VID range per the datasheet is 0.7 - 1.2 V.
FYI I have a couple of other Bonnell chips that we can use for testing: Intel Atom N270 (32-bit only, Diamondville) and Intel Atom N450 (64-bit capable, Pineview).
Let's keep this voltage algorithm, VOLTAGE_FORMULA_INTEL_SOC, but we will adjust the formula here:
https://github.com/cyring/CoreFreq/blob/ed94b48f4adaad30f8c4df7f7f83734f60f1cf03/coretypes.h#L614
What we are interested in is this equation:
https://github.com/cyring/CoreFreq/blob/ed94b48f4adaad30f8c4df7f7f83734f60f1cf03/coretypes.h#L629
which receives a voltage VID as input and outputs the Vcore.
In the datasheets, most of the time in volume 1, we should find the association table between the two, as well as the steps and other offsets to apply to the Vcore formula.
Tbc.
> FYI I have a couple other Bonnell chips that we can use for testing. Intel Atom N270 (32-bit only, Diamondville) Intel Atom N450 (64-bit capable, Pineview)
32-bit is not supported, but I will gladly make use of the N450.
In the datasheet, table 3-2, with VID converted to decimal:
Vcore = 0.7 + ( 73.0 - (double) (VID) ) * 0.0125;
VID (binary, decimal) | Formula | Vcore (V) |
---|---|---|
1 0 0 1 0 0 1 (73) | 0.7 + (73.0 - 73.0) * 0.0125 | 0.7000 |
1 0 0 1 0 0 0 (72) | 0.7 + (73.0 - 72.0) * 0.0125 | 0.7125 |
0 1 1 0 1 1 0 (54) | 0.7 + (73.0 - 54.0) * 0.0125 | 0.9375 |
0 1 0 0 0 0 1 (33) | 0.7 + (73.0 - 33.0) * 0.0125 | 1.2000 |
Seems to be pulling a VID value of 27, which according to the formula is a higher than expected voltage for this CPU.
Will verify the VID MSR later tonight, along with potentially a measurement of Vcc at the VRM.
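As a quick sanity check of the table 3-2 formula, it can be written as a small C function together with a range test against the 0.7 - 1.2 V window quoted from the datasheet. This is only a sketch: the names `vid_to_vcore` and `vcore_in_datasheet_range` are made up for this example, and the 7-bit VID width is assumed from the binary column of the table.

```c
#include <assert.h>

/* Vcore in volts for a 7-bit VID, using the formula quoted from table 3-2. */
double vid_to_vcore(unsigned int vid)
{
    assert(vid <= 0x7F);    /* assumption: VID is a 7-bit code */
    return 0.7 + (73.0 - (double)vid) * 0.0125;
}

/* Non-zero when the voltage lies within the 0.7 - 1.2 V range that the
 * datasheet is said to allow; a small epsilon absorbs rounding errors. */
int vcore_in_datasheet_range(double vcore)
{
    const double eps = 1e-9;
    return vcore >= 0.7 - eps && vcore <= 1.2 + eps;
}
```

With this formula, the four table rows land inside the range, while the reported VID of 27 evaluates to 1.275 V and falls outside it, which is consistent with the reading being higher than expected.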
https://github.com/cyring/CoreFreq/blob/bf8f6d2f1e51f12495358b3e818f8e3590ab9e4a/corefreqk.h#L1491
Atom Bonnell is routed to a compatible Core2 loop:
https://github.com/cyring/CoreFreq/blob/bf8f6d2f1e51f12495358b3e818f8e3590ab9e4a/corefreqk.c#L13601
where VID is read from MSR_IA32_PERF_CTL:
https://github.com/cyring/CoreFreq/blob/bf8f6d2f1e51f12495358b3e818f8e3590ab9e4a/corefreqk.c#L13626
The MSR is specified per class of architecture:
https://github.com/cyring/CoreFreq/blob/bf8f6d2f1e51f12495358b3e818f8e3590ab9e4a/intelmsr.h#L546
Probably Atom Bonnell has a different bit layout ...
Or is there another MSR to query the VID from ?
Intel Atom N450 CoreFreq, lspci, and /proc/cpuinfo
I haven't had luck tracing an appropriate MSR so far. Intel does not do a good job of describing MSR_IA32_PERF_CTL in the software developer's manual for these CPUs.
The TSC is variant. You must start the driver with a hard-coded BCLK, as below:
insmod corefreqk.ko AutoClock=0
Then you can monitor CPU frequencies again.
VID: the Atom 330 has a VID of 27 only in the use case of the screenshot, whereas there is none on the Atom N450.
@svmlegacy To avoid the side effect of the variant TSC on the Intel Atom N450, I recommend starting CoreFreq with the AutoClock=0 parameter. Please see the previous comment.
@svmlegacy : Thinking about TSC, I would like to enhance the driver to let it handle the variant case by itself.
I just need your Atom N450 for future code testing, if that's OK with you ?
Atom N450 Results updated here with AutoClock=0 : Intel Atom N450
Yes, no problem to use this machine for future testing. I'll keep it available.
Please let me know, or show me, what is missing from the latest develop branch ?
Hello, sorry about the delay;
Here's the current output for the Atom N450: here
I still need to start the kernel module with AutoClock=0.
Indeed, the AutoClock parameter is not programmed to switch when facing the variant TSC case. Because those processors are less common, I let the user set it to OFF.
There is still some work to do, and not the easiest, if feasible:
I/O
TCO
Virt
PL: Power Limiters
Hello,
The attached version is an attempt to compute the DIMM geometry on your N450.
(The bus rate and speed unit has, by the way, been changed to MT/s.)
Could you please show me the output of corefreq-cli -M
About Vcore, I would also need the following outputs from your N450:
## MSR:IA32_PERF_STATUS
rdmsr -ax 0x198
## MSR:IA32_PERF_CTL
rdmsr -ax 0x199