cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

[SOLVED] Clock marked unstable after Changing PState TGT? #308

Closed by h1z1 2 years ago

h1z1 commented 2 years ago

I fully acknowledge this may be a kernel bug, but given it happened with CoreFreq, it can't hurt to start here.

Short version: I had a host with a system clock that was hours out of sync (unrelated to CoreFreq). I ran CoreFreq for shits 'n' giggles, and it appeared to be fine until I tried to change the P-state target. The whole box acted like it did a clock jump, and the kernel has permanently marked tsc as unstable:

[1346249.931484] clocksource: timekeeping watchdog on CPU17: Marking clocksource 'tsc' as unstable because the skew is too large:
[1346249.931486] clocksource:                       'hpet' wd_now: 9c5d0440 wd_last: 9bf64291 mask: ffffffff
[1346249.931487] clocksource:                       'tsc' cs_now: 103b19f57d0eb5 cs_last: fc461fd0ba3a6 mask: ffffffffffffffff
[1346249.931490] tsc: Marking TSC unstable due to clocksource watchdog
[1346249.931508] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[1346249.931510] sched_clock: Marking unstable (1346250036096440, -101921334)<-(1346250203079464, -271578861)
[1346249.945415] clocksource: Override clocksource tsc is unstable and not HRT compatible - cannot switch while in HRT/NOHZ mode
[1346249.983687] clocksource: Switched to clocksource hpet

From what I gather that is coming from https://github.com/torvalds/linux/blob/e7c124bd04631973a3cc0df19ab881b56d8a2d50/kernel/time/clocksource.c#L408-L409
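To make the numbers in the log above easier to read, here is a rough sketch that recomputes the counter deltas the watchdog compared. The raw values and masks are copied from the log; the two frequencies are assumptions (the log does not state them), so the resulting seconds are only indicative.

```python
# Counter values and masks copied from the kernel log above.
WD_NOW, WD_LAST, WD_MASK = 0x9C5D0440, 0x9BF64291, 0xFFFFFFFF
CS_NOW, CS_LAST, CS_MASK = 0x103B19F57D0EB5, 0xFC461FD0BA3A6, 0xFFFFFFFFFFFFFFFF

# ASSUMPTIONS: neither rate appears in the log.
HPET_HZ = 14_318_180       # a commonly seen HPET rate
TSC_HZ = 3_000_000_000     # placeholder; substitute the host's real TSC rate


def masked_delta(now, last, mask):
    """Cycles elapsed between two reads, tolerating counter wraparound."""
    return (now - last) & mask


def elapsed_seconds(now, last, mask, hz):
    """Convert a masked cycle delta to seconds at the given (assumed) rate."""
    return masked_delta(now, last, mask) / hz


if __name__ == "__main__":
    wd_cycles = masked_delta(WD_NOW, WD_LAST, WD_MASK)
    cs_cycles = masked_delta(CS_NOW, CS_LAST, CS_MASK)
    print(f"hpet delta: {wd_cycles:#x} cycles")
    print(f"tsc  delta: {cs_cycles:#x} cycles")
    wd_s = elapsed_seconds(WD_NOW, WD_LAST, WD_MASK, HPET_HZ)
    cs_s = elapsed_seconds(CS_NOW, CS_LAST, CS_MASK, TSC_HZ)
    print(f"hpet elapsed (assumed rate): {wd_s:.3f} s")
    print(f"tsc  elapsed (assumed rate): {cs_s:.0f} s")
    print(f"skew (assumed rates): {cs_s - wd_s:.0f} s")
```

Under those assumed rates, the hpet interval comes out at roughly half a second while the tsc appears to have advanced by hours, which would be consistent with a jump in the TSC value rather than slow drift.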

I have to reboot the host because hpet is godawful on Zen. What I don't understand, though, is why it triggered. Sure, the clock was way out, but the host was running fine.

cyring commented 2 years ago

... Sure the clock was way out but the host was running fine.

Because the kernel's loops per jiffy need to be calibrated again.

In this Wiki page, I'm gathering the README instructions:

CoreFreq as the Clock Source, CPU Freq and CPU Idle driver

At that point CoreFreq is in charge, and no other driver should be switching Processor features; you drive them from the various Client windows.

h1z1 commented 2 years ago

Not sure I follow, TGT was changed from within Corefreq :)

cyring commented 2 years ago

Not sure I follow, TGT was changed from within Corefreq :)

Great. You got no issue logged in the kernel log?

By the way, could you confirm if my provided code fix is working with your Atom? Let me know if my instructions are unclear.

h1z1 commented 2 years ago

Great. You got no issue logged in the kernel log?

Nothing besides the messages above.

By the way, could you confirm if my provided code fix is working with your Atom? Let me know if my instructions are unclear.

Not sure what that is in reference to, I don't have an atom :) If you mean the wiki page above, the drivers are loaded. Might want to add a note about cpuidle_sysfs_switch.

It will enable changing the driver while booted with cpuidle/current_governor

Looks like they removed the option because it's default now o_O So much for deprecation warnings sigh.. that parameter's behaviour has been like that since ... 2.6.22 .. 14 years

That host is not running 5.4.70
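For anyone checking what their own kernel exposes here, a minimal sketch that reads the cpuidle attributes under sysfs. The directory is the standard cpuidle location; which files exist varies by kernel version and boot parameters, so missing ones are reported as absent rather than treated as errors. The function name `read_cpuidle` is just an illustrative helper.

```python
from pathlib import Path

# Standard cpuidle sysfs directory; the set of files present depends on the kernel.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpuidle")

ATTRIBUTES = (
    "current_driver",
    "current_governor",
    "current_governor_ro",
    "available_governors",
)


def read_cpuidle(base=CPUIDLE_DIR):
    """Return {attribute: value}, with None for attributes that cannot be read."""
    state = {}
    for name in ATTRIBUTES:
        try:
            state[name] = (Path(base) / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


if __name__ == "__main__":
    for name, value in read_cpuidle().items():
        print(f"{name}: {value if value is not None else '(not present)'}")
```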

cyring commented 2 years ago

Great. You got no issue logged in the kernel log?

Nothing besides the messages above.

By the way, could you confirm if my provided code fix is working with your Atom? Let me know if my instructions are unclear.

Not sure what that is in reference to, I don't have an atom :) If you mean the wiki page above, the drivers are loaded. Might want to add a note about cpuidle_sysfs_switch.

Oh sorry, I confused this with another issue.

It will enable changing the driver while booted with cpuidle/current_governor

Looks like they removed the option because it's default now o_O So much for deprecation warnings sigh.. that parameter's behaviour has been like that since ... 2.6.22 .. 14 years

That host is not running 5.4.70

I will look into cpuidle_sysfs_switch, especially whether it can be managed directly from my kernel module.

cyring commented 2 years ago

I have looked into cpuidle_sysfs_switch, but since the current kernel version does not export safe functions to achieve the same thing, I'm postponing the enhancement.

cyring commented 2 years ago

[screenshot: 2022-04-20-060642_644x704_scrot]

This is what you will get in the develop branch: you switch the Clock Source from the box.
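Separately from the CoreFreq UI, the kernel reports the active and selectable clock sources through sysfs, which is a quick way to confirm what a switch actually did. A minimal sketch, assuming the usual `clocksource0` directory; `clocksource_state` is a hypothetical helper name.

```python
from pathlib import Path

# Usual sysfs location for the system clocksource device.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_state(base=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources as reported by the kernel."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = clocksource_state()
    print(f"current:   {current}")
    print(f"available: {', '.join(available)}")
```

Writing a name from the available list into `current_clocksource` (as root) requests a switch; the kernel may refuse, as in the "cannot switch while in HRT/NOHZ mode" message in the log earlier in this thread.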