acidanthera / bugtracker

Acidanthera Bugtracker
383 stars 44 forks source link

ART Frequency, CPU Frequency and Clock Drift #448

Closed mrmiller closed 5 years ago

mrmiller commented 5 years ago

OCCPU is misreporting my CPU's frequency. I believe it's also causing the macOS system clock to run fast:

00:226 00:113 OCCPU: Found Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
00:345 00:118 OCCPU: Signature 50654 Stepping 4 Model 55 Family 6 Type 0 ExtModel 5 ExtFamily 0
00:461 00:116 OCCPU: Detected Apple Processor Type: 0F -> 0F01
00:575 00:114 OCCPU: Ratio Min 12 Max 30 Current 36 Turbo 37 36 36 36
00:695 00:119 OCCPU: ART Frequency  2880000000  2880MHz 24000000 * 240 / 2 = 2880000000
00:821 00:126 OCCPU: TSC Frequency  2992972839  2992MHz
00:944 00:122 OCCPU: CPU Frequency  2880000000  2880MHz
01:068 00:123 OCCPU: FSB Frequency    96000000    96MHz
01:206 00:138 OCCPU: Pkg 1 Cores 12 Threads 24

Looking at OcCpu.c, the following lines look suspect to me:

AsmCpuid (CPUID_TIME_STAMP_COUNTER, &CpuidEax, &CpuidEbx, NULL, NULL);
if (CpuidEax > 0 && CpuidEbx > 0) {
  Cpu->CPUFrequency = MultU64x32 (BASE_ART_CLOCK_SOURCE, (UINT32) DivU64x32 (CpuidEbx, CpuidEax));

https://github.com/acidanthera/OcSupportPkg/blob/3e98de1f89a9e946f026ecc24e4856e2096dc338/Library/OcCpuLib/OcCpuLib.c#L667

Should that be storing to Cpu->ARTFrequency instead? I tested by making this change and setting Cpu->CPUFrequency from Cpu->TSCFrequency. About this Mac now reports the correct frequency (3Ghz instead of 2.88Ghz). Also, the clock no longer drifts whereas it used to run a few seconds per minute faster than real-time.

vit9696 commented 5 years ago

ARTFrequency is a constant normally, and I think the presence of this field in OC_CPU_INFO is a bug in the first place. The issue is most likely in the wrong calculation order considering how XNU kernel does it. Could you try changing the following line:

Cpu->CPUFrequency = MultU64x32 (BASE_ART_CLOCK_SOURCE, (UINT32) DivU64x32 (CpuidEbx, CpuidEax));

with

Cpu->CPUFrequency = MultThenDivU64x64x32 (BASE_ART_CLOCK_SOURCE, CpuidEbx, CpuidEax);

You will need to use this function (taken from PciRootBridge.c).

/**
  Return the result of (Multiplicand * Multiplier / Divisor).

  @param Multiplicand A 64-bit unsigned value.
  @param Multiplier   A 64-bit unsigned value.
  @param Divisor      A 32-bit unsigned value.
  @param Remainder    A pointer to a 32-bit unsigned value. This parameter is
                      optional and may be NULL.

  @return Multiplicand * Multiplier / Divisor.
**/
UINT64
MultThenDivU64x64x32 (
  IN      UINT64                    Multiplicand,
  IN      UINT64                    Multiplier,
  IN      UINT32                    Divisor,
  OUT     UINT32                    *Remainder  OPTIONAL
  )
{
  UINT64                            Uint64;
  UINT32                            LocalRemainder;
  UINT32                            Uint32;
  if (Multiplicand > DivU64x64Remainder (MAX_UINT64, Multiplier, NULL)) {
    //
    // Make sure Multiplicand is the bigger one.
    //
    if (Multiplicand < Multiplier) {
      Uint64       = Multiplicand;
      Multiplicand = Multiplier;
      Multiplier   = Uint64;
    }
    //
    // Because Multiplicand * Multiplier overflows,
    //   Multiplicand * Multiplier / Divisor
    // = (2 * Multiplicand' + 1) * Multiplier / Divisor
    // = 2 * (Multiplicand' * Multiplier / Divisor) + Multiplier / Divisor
    //
    Uint64 = MultThenDivU64x64x32 (RShiftU64 (Multiplicand, 1), Multiplier, Divisor, &LocalRemainder);
    Uint64 = LShiftU64 (Uint64, 1);
    Uint32 = 0;
    if ((Multiplicand & 0x1) == 1) {
      Uint64 += DivU64x32Remainder (Multiplier, Divisor, &Uint32);
    }
    return Uint64 + DivU64x32Remainder (Uint32 + LShiftU64 (LocalRemainder, 1), Divisor, Remainder);
  } else {
    return DivU64x32Remainder (MultU64x64 (Multiplicand, Multiplier), Divisor, Remainder);
  }
}
mrmiller commented 5 years ago

Hmm... are you sure? The numbers from the calculation are fortunately in the log (kudos to whoever did that!). 240000000*240/2 isn’t going to overflow at the multiplication step nor is the division fractional. I don’t think order of operations is going to matter at all unless I’m missing something?

mrmiller commented 5 years ago

I haven’t studied the rest of the code but looking at the XNU source you linked, they are setting the TSC frequency based on that calculation, not the CPU frequency. Maybe that’s the issue?

vit9696 commented 5 years ago

I see now the values, and they look crazy. Let me clarify things a little though, because there is a lot of confusion here in addition to code being somewhat wrong as well.

XNU kernel expects us pass up to two parameters ARTFrequency and FSBFrequency:

Notes for OpenCore OC_CPU_INFO fields:

So basically there should be an equation:

CPUFrequencyFromART ≈ CPUFrequencyFromTSC ≈ CPUFrequency
  ≈ FSBFrequency * MaxBusRatio

After the refactoring the names and fixing CPUFrequency assignment, for your case it will still be broken, because the calculation uses the wrong values for your CPU. I checked Intel SDM and this is what it says about ART in 17-44 Vol. 3B and 18-128 Vol. 3B:

17.17.4 Invariant Time-Keeping

The invariant TSC is based on the invariant timekeeping hardware (called
Always Running Timer or ART), that runs at the core crystal clock frequency.
The ratio defined by CPUID leaf 15H expresses the frequency relationship
between the ART hardware and TSC.

If CPUID.15H:EBX[31:0] != 0 and CPUID.80000007H:EDX[InvariantTSC] = 1,
the following linearity relationship holds between TSC and the ART hardware:

TSC_Value = (ART_Value * CPUID.15H:EBX[31:0] )/ CPUID.15H:EAX[31:0] + K

Where 'K' is an offset that can be adjusted by a privileged agent [2].
When ART hardware is reset, both invariant TSC and K are also reset.

[2] IA32_TSC_ADJUST MSR and the TSC-offset field in the VM execution
controls of VMCS are some of the common interfaces that privileged
software can use to manage the time stamp counter for keeping time
18.7.3 Determining the Processor Base Frequency

For Intel processors in which the nominal core crystal clock frequency is
enumerated in CPUID.15H.ECX and the core crystal clock ratio is encoded
in CPUID.15H (see Table 3-8 “Information Returned by CPUID Instruction”),
the nominal TSC frequency can be determined by using the following equation:

Nominal TSC frequency = ( CPUID.15H.ECX[31:0] * CPUID.15H.EBX[31:0] ) ÷
  CPUID.15H.EAX[31:0]

For Intel processors in which CPUID.15H.EBX[31:0] ÷ CPUID.0x15.EAX[31:0]
is enumerated but CPUID.15H.ECX is not enumerated, Table 18-75 can be used
to look up the nominal core crystal clock frequency.

Table 18-75. Nominal Core Crystal Clock Frequency

Processor Families/Processor    | Nominal Core Crystal
Number Series [1]               | Clock Frequency
------------------------------------------------------
Intel® Xeon® Processor          |
Scalable Family with CPUID      | 25 MHz
signature 06_55H.               |
------------------------------------------------------
6th and 7th generation Intel®   |
Core™ processors and Intel®     | 24 MHz
Xeon® W Processor Family.       |
------------------------------------------------------
Next Generation Intel® Atom™    |
processors based on Goldmont    | 19.2 MHz
Microarchitecture with CPUID    |
signature 06_5CH (does not      |
include Intel Xeon processors). |

[1] For any processor in which CPUID.15H is enumerated and
MSR_PLATFORM_INFO[15:8] (which gives the scalable bus frequency) is
available, a more accurate frequency can be obtained by using CPUID.15H.
Time Stamp Counter and Nominal Core Crystal Clock Information Leaf (15H)

NOTES:
If EBX[31:0] is 0, the TSC/”core crystal clock” ratio is not enumerated.
EBX[31:0]/EAX[31:0] indicates the ratio of the TSC frequency and the core crystal clock frequency.
If ECX is 0, the nominal core crystal clock frequency is not enumerated.
“TSC frequency” = “core crystal clock frequency” * EBX/EAX.
The core crystal clock may differ from the reference clock, bus clock, or core clock frequencies.

EAX Bits 31 - 00: An unsigned integer which is the denominator of the TSC/”core crystal clock” ratio.
EBX Bits 31 - 00: An unsigned integer which is the numerator of the TSC/”core crystal clock” ratio.
ECX Bits 31 - 00: An unsigned integer which is the nominal frequency of the core crystal clock in Hz.
EDX Bits 31 - 00: Reserved = 0.

So, while this makes things very ugly for us, it should still be doable:

  1. We should rename all the things I mentioned above.
  2. We should check for 06_55H, 06_55H, and CPUID 15H ART frequency values before falling back to 24 MHz
  3. In addition I noticed that XNU kernel does not care about MSR_IA32_TSC_ADJUST (0x3B) at all, whereas Linux kernel does. Please print MSR_IA32_TSC_ADJUST on your machine and check whether it is 0 or not.

It will be great if you could prepare a draft patch as we do not have the hardware to properly test the changes. Thanks!

mrmiller commented 5 years ago

Wow, thanks for the very detailed explanation and the deep dive! I’ll get started on a patch and should have something for you later on today.

mrmiller commented 5 years ago

Made a draft of the changes and tested it on my setup. MSR_IA32_TSC_ADJUST is indeed 0 but I included it in the calculation as suggested by the Intel documentation. acidanthera/OcSupportPkg#8

vit9696 commented 5 years ago

Thanks, replied in the pull request!

vit9696 commented 5 years ago

Alright, this is now merged in master, and I also ensured that ARTFrequency is properly passed to the system when it is different from 24 MHz. Please report that everything works fine and close this issue. Thanks a lot for your help & patience!

mrmiller commented 5 years ago

Seems to be working! I'll keep an eye on long-term clock drift, as it may be an unrelated issue.

mrmiller commented 5 years ago

After a longer evaluation, the system clock is now running slower than realtime. It drifted about 2 minutes behind over 24 hours. Any suggestions for where to start looking? I can file a separate issue for it and work on a patch.

vit9696 commented 5 years ago

Hmmm, that is likely a bug in XNU. I really wonder whether XNU is can work with 25 MHz crystal clock. Could you try leaving 24 MHz ARTFrequency as is by e.g. removing this change: https://github.com/acidanthera/OcSupportPkg/commit/2657d54cb97d1044e6ff9c707ee2d65c539ae42a#diff-b50a73be84589e729c9f6c0e5f8159d7R283?

mrmiller commented 5 years ago

Hmmm, that is likely a bug in XNU. I really wonder whether XNU is can work with 25 MHz crystal clock. Could you try leaving 24 MHz ARTFrequency as is by e.g. removing this change: acidanthera/OcSupportPkg@2657d54#diff-b50a73be84589e729c9f6c0e5f8159d7R283?

I'll give it a try and let you know.

vit9696 commented 5 years ago

@mrmiller I reopened this. Any news on the matter?

mrmiller commented 5 years ago

Yes, ran for long stretches with the lines commented out. Is there any way to verify from the OS what ARTFrequency is set to, just to confirm I didn’t do something stupid?

I’m seeing the same drift, with the system clocking running about 2 min/day slower than real-time. This seems to be the same as with those lines in place. This behavior is slightly different from before any of these changes, though, where I believe the clock was running faster than real-time, and at a faster rate.

vit9696 commented 5 years ago

Currently we have a belief that FSBFrequency is not used at all on Skylake and newer, and actually ARTFrequency is the one that is relevant, and changing it causes the problems now. You can find ARTFrequency and/or FSBFrequency in IODeviceTree:/efi/platform (with e.g. IORegistryExplorer.app).

vit9696 commented 5 years ago

Actually, I think this is relevant:

https://lore.kernel.org/lkml/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com/

While SKX servers do have a 25  MHz crystal, but they too have a problem.
All SKX subject the crystal to an EMI reduction circuit that
reduces its actual frequency by (approximately) -0.25%.
This results in -1 second per 10 minute time drift
as compared to network time.
mrmiller commented 5 years ago

Currently we have a belief that FSBFrequency is not used at all on Skylake and newer, and actually ARTFrequency is the one that is relevant, and changing it causes the problems now. You can find ARTFrequency and/or FSBFrequency in IODeviceTree:/efi/platform (with e.g. IORegistryExplorer.app).

Confirmed that whether ARTFrequency is set to 25 Mhz or not present at all, the behavior is the same. Slower by approx. 2 min/day.

While SKX servers do have a 25  MHz crystal, but they too have a problem.
All SKX subject the crystal to an EMI reduction circuit that
reduces its actual frequency by (approximately) -0.25%.
This results in -1 second per 10 minute time drift
as compared to network time.

Those numbers do seem to line up with the drift I'm seeing. Do you think there's a solution at this point or just out of luck?

vit9696 commented 5 years ago

Well, it will work fine if you set ARTFrequency to 24937500 (25 MHz - 0.25%) or calculate it from measured TSCFrequency. Currently I am a bit at a loss how actually we can workaround this design wise. My primary thought is that we can measure ARTFrequency just like TSCFrequency on Xeon W Server, but the question is, does it only apply to Xeon W Server or more CPUs as the patch description is not perfectly clear, and I cannot find any reference for that "EMI reduction circuit".

I will try looking for more information this week, let me know if you find anything else as well. Also check the 24937500 value.

vit9696 commented 5 years ago

Ok, I googled a bit more, and it looks like this thing is configurable. Have a look on the bugreport preceding the hardcoded frequency removal: https://bugzilla.kernel.org/show_bug.cgi?id=197299

There is a reference to spread spectrum state in BIOS, which I believe can be related to this optin description: https://www.techarp.com/bios-guide/mclk-spread-spectrum/

Could you please check your firmware settings and look for something similar?

mrmiller commented 5 years ago

Ok, I googled a bit more, and it looks like this thing is configurable. Have a look on the bugreport preceding the hardcoded frequency removal: https://bugzilla.kernel.org/show_bug.cgi?id=197299

Hmm... that's very interesting. It seems like they settled on basing it off the TSC timer instead of hard-coding the 25 Mhz. I should probably see what value that gets and if it's close to the 25 Mhz - 0.25% value.

There is a reference to spread spectrum state in BIOS, which I believe can be related to this optin description: https://www.techarp.com/bios-guide/mclk-spread-spectrum/

Could you please check your firmware settings and look for something similar?

I think that particular clock spread spectrum is specific to memory timing; maybe we're looking for a different clock spread spectrum? Same principle and motivation, presumably. I found some spread spectrum settings in my BIOS ROM but they're all hidden so I'll modify that and give it a spin.

Given the Linux kernel's approach, though, I wonder if it's worth modifying our approach to filling ARTFrequency in this case. Their solution relies on not setting X86_FEATURE_TSC_KNOWN_FREQ in their fallback case which results in an additional timer calibration step.

For our purposes, I'm proposing modifying step 4 from our original scheme:

4. On failure check for `CPUID_PROCESSOR_FREQUENCY` availability and use it like Linux.

Instead, we should calculate ARTFrequency from CPUFrequencyFromTSC. We'd then set CPUFrequencyFromART with the reported frequency. Doing this by hand with my system as an example:

CPUFrequencyFromTSC = 2992968276 Hz
TSC Ratios from CPUID_TIME_STAMP_COUNTER = 240 / 2
Actual ARTFrequency = 25 MHz
Estimated ARTFrequency = 25 Mhz * 99.75% =  24.9375 MHz
Proposed ARTFrequency = 2992968276 Hz * 2 / 240 = 24.941402 MHz
Proposed CPUFrequencyFromART = CPUID_PROCESSOR_FREQUENCY = 3 GHz

It's in the ballpark of that estimated -0.25% spread spectrum hit. I've got this patch mocked up so I'll give it a spin tomorrow and see how it holds up.

vit9696 commented 5 years ago

I believe that the exact option I found is likely to be different, but I expect this to be named similarly if it exists. Let me know whether the options you found actually do anything. It is probably relatively easy to test them by measuring TSC.

I do not like the idea of measuring ARTFrequency through TSC, but I do like the idea of not using CPUID_PROCESSOR_FREQUENCY. This makes good sense to me, as Intel SDM explicitly marks it as marketing frequency and recommends to avoid using it for any other purposes.

However, there are two issues here:

This situation makes it kind of problematic, so I am leaning to special-case just the Xeon W family and not the rest until more information is known.

mrmiller commented 5 years ago

I do not like the idea of measuring ARTFrequency through TSC

That does seem to be what Linux is doing, though, right? They use CPUID_PROCESSOR_FREQUENCY to calculate the ideal ARTFrequency, which is what we do currently and correctly gives 25 MHz. But then, they have it measure the timer frequency by not setting X86_FEATURE_TSC_KNOWN_FREQ.

I was under the impression that the ratios in CPUID_TIME_STAMP_COUNTER are exactly for converting between ART and TSC. Given the idealized ART isn't accurate, it seems to make sense to adjust it based on a measured value of some sort.

This situation makes it kind of problematic, so I am leaning to special-case just the Xeon W family and not the rest until more information is known.

It's not explicit, but that's currently how I believe it behaves implicitly. The Skylake and Kabylake CPUS should both never make it to step 4, because they are special-cased to 24 MHz in step 3 (see this comment).

vit9696 commented 5 years ago

Right, I partially forgot about existing value table. Well, okay, if this approach works I am fine to merge it.

mrmiller commented 5 years ago

Right, I partially forgot about existing value table. Well, okay, if this approach works I am fine to merge it.

Alternatively, we could treat this as a refinement step for CPUs that have an idealized 25 MHz ART frequency, e.g. step 4.5. That would theoretically limit it only to the server tier chips and wouldn't cause issues with the next gen after Kaby Lake where we'd need to amend the known values table in step 3 or it would be using the TSC-derived ART by default.

vit9696 commented 5 years ago

Next generation currently has 0 TSC frequency, so as long as we handle it, it will most likely work fine.

mrmiller commented 5 years ago

If that's the case, then the TSC-derived ARTFrequency should be 0, which would mean we'd fallback to the default 24 MHz in step 5.

vit9696 commented 5 years ago

Good, then it should work fine.

vit9696 commented 5 years ago

@mrmiller we are going to make a release by the end of this week, will you submit a patch on this and does the discussed approach work for you?

mrmiller commented 5 years ago

I’m still testing my patch but will let you know tomorrow if it’s working.

mrmiller commented 5 years ago

Okay, so the good news is that with my patch to calculate the adjusted ARTFrequency based on the measured TSC frequency, I no longer see a large time drift: sub-1 minute after 48 hours. I'm just going by the system clock time in the corner vs a phone so I don't have accuracy down to seconds but it's certainly much improved.

I'll create a pull request later today. I'm not sure it needs to be rushed into the upcoming release, as it probably only really affects me at the moment.

vit9696 commented 5 years ago

@mrmiller glad to hear that! I also think that we do not particularly need to rush it, but if it is ready I am fine to merge it as well.