cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

Daemon Segmentation Fault on X570 + Ryzen 3 #186

Closed olejon closed 4 years ago

olejon commented 4 years ago

I updated CoreFreq a couple of days ago using my automatic script and it worked just fine.

Now I did it again, and loading the module works but the daemon does not load and the UI shows nothing.


# corefreqd -d

CoreFreq Daemon 1.78.1  Copyright (C) 2015-2020 CYRIL INGENIERIE

Processor [AMD Ryzen 5 3600X 6-Core Processor]

Architecture [Zen2/Matisse] 12/12 CPU Online.

SleepInterval(1000), SysGate(2000), 2326 tasks

    Thread [7fa23dc86700] Init CHILD 000
    Thread [7fa23d485700] Init CHILD 001
    Thread [7fa23cc84700] Init CHILD 002
    Thread [7fa237fff700] Init CHILD 003
    Thread [7fa22ffff700] Init CHILD 004
    Thread [7fa235ffb700] Init CHILD 008
    Thread [7fa2367fc700] Init CHILD 007
    Thread [7fa236ffd700] Init CHILD 006
    Thread [7fa234ff9700] Init CHILD 010
    Thread [7fa22f7fe700] Init CHILD 011
    Thread [7fa2377fe700] Init CHILD 005
    Thread [7fa2357fa700] Init CHILD 009

Segmentation fault

Setup is same as before when you asked about my entire setup. I see you did some changes recently.

cyring commented 4 years ago

More than 195 W is scary, and at 82°C I wonder when thermal throttling is going to engage. I'm going to review the driver to see whether I set a thermal sticky bit in the code, which would translate into HOT being highlighted.

olejon commented 4 years ago
cyring commented 4 years ago

Thanks for your temperature calibration effort. Can't wait to see your results.

Meanwhile I've checked the code, and the driver has no ThermalTrip query for families 17h and 18h. Thus no HOT will appear with Zen processors. Perhaps the kernel is logging it. I will search for a thermal status register that can be queried through the SMU.

olejon commented 4 years ago

Many bullet points. Please read them all.

Almost Idle

Image

During Blender Benchmark

Image


Both Red Rectangles say the same

Image

cyring commented 4 years ago

Very interesting investigation you did, although I will go very slowly on those settings until we have a clear understanding of what is going on and of their side effects. I'm still not sure whether the hardware is well protected against over-voltage, over-consumption and so on. Don't fry your brand new Ryzen!

We find the wanted registers a bit everywhere, sometimes through:

olejon commented 4 years ago

Don't fry your brand new Ryzen !

olejon commented 4 years ago

This does NOT seem to discuss that setting but rather manual overclocking in the AI Tweaker part of the UEFI, though it may be interesting for you? Remember to see the 2 screenshots below too.

"OC mode" is automatically engaged when P0 frequency is raised above base frequency. In OC mode, the SMU inside Ryzen CPU is disabled, which means sophisticated power management and power saving features are not functional. The presence of OC mode can be verified by calling GetCurrentOCMode() via the RyzenMaster SDK or code 0C via a Port 80 debugger card. HWiNFO has 3 voltage readings related to CPU. If it's individual core VID or motherboard Vcore that is scaling down to 0.2 Volts, then that's expected. What's important is "CPU Core Voltage (SVI2 TFN)". I think this value can tell whether the CPU has gone into "OC mode" or not. If this value is not scaling down, then it most likely means OC mode is engaged.

Screenshots from 2 Manuals for ASRock MBs. Not much info:



Image

Image

cyring commented 4 years ago
  • Read the above first

I did, but I don't feel comfortable with the power consumed. 173-203 is far too high for my taste. See my way in the next post.

  • I can only find ASRock documenting "Uncore OC Mode" and it seems to have to do with RAM and Infinity Fabric only

I'm not familiar with the Zen Uncore to tweak it yet. Especially its voltage limit.

  • I've not touched any of these manual settings, just turned off Gear Down which should be totally safe, and another Power Saving setting for RAM which is a sub-menu under that part again

I've also noticed this one and I don't know what's under the hood. For now I have stabilized my DRAM at its factory OC mode, and those extra settings will be easier to understand once I have a clear picture of how the Ryzen memory controller manages the [sub-]timings. So far, I have no specs on how to query the configured DDR speed, geometry, latency, and so on.

  • Still system performs better. So is it simply the fact that it now actually uses my RAM to the max as D.C.O.P. = Enabled or what?
  • All people discussing this setting also do manual overclocking (entering values manually) but I don't

I have chosen D.C.O.P(XMP) to auto configure the G.Skill kit.

  • PS: I've NOT touched this voltage!

Those Windows tools are so lucky to get so many spec details from manufacturers. I'm trying to stick to Linux and to offer Open Source software. The road is long.

olejon commented 4 years ago

With both set to Auto

Image


Set to Enabled and 100MHz

Image


This UEFI setting alone causes that big difference in RM's Max Values, not that they are reached at all

Image


Linux

Image


Windows

Image

cyring commented 4 years ago

Power tuning

BIOS settings: all [AUTO] beside a few

200622120949

Day to day working state

olejon commented 4 years ago

That came quick! Remember to see the one I just commented.

Then we only have 2 differences in UEFI:


Temperature stays at 50°C

Image

Temperature eventually reaches 80°C

Image

Idle less than 1V - 33°C with Ambient 27°C

Image

cyring commented 4 years ago

How is CoreFreq reading a Vcore of 0.2 V? Is it a voltage you set in the BIOS?

olejon commented 4 years ago

You mean the one at bottom right? Saying V[0.20]. Is that bad or anything?

I'm not into Voltage tuning AT ALL.

I haven't set ANY integer or string manually, only Drop-Downs. Only the D.C.O.P. changes automatically what you already know:

  1. RAM voltage to 1.35
  2. RAM timings, for me 16-18-18-36
  3. RAM frequency for me 3200 MHz

Yours show V[0.91]. What's the diff?

cyring commented 4 years ago

OK, so it's a CoreFreq bug. Adding it to the todo list.

olejon commented 4 years ago
olejon commented 4 years ago

Hmm, look at previous IDLE Screenshot:

https://github.com/cyring/CoreFreq/issues/186#issuecomment-646917760

cyring commented 4 years ago

Manual OC

olejon commented 4 years ago

I suppose :man_shrugging: :smile:

Differences:

cyring commented 4 years ago

I suppose

  • You're using the CoreFreq CPU driver, no?

Yes all P-State tweaks done with CoreFreq. But I've not programmed a governor for AMD yet. Thus OSPM is under the control of the Linux ACPI module.

  • You have CPB 2 places in your UEFI? Maybe I do too but haven't looked, just found it 1 place, but I know it hasn't been touched (tried disabling it once as said and everything went slower indeed)

Yes, the Crosshair is also showing this CPB issue, 2 places for the same name. It's confusing.

  • BTW, as said my Voltage at Idle was 0.91V as yours before enabling Uncore OC Mode
  • When it now says 0.2V, which Value in Ryzen Master should I look at? Tell me the name from the Screenshots of Ryzen Master

I believe it is called CPU Voltage under Voltage Control

Differences:

  • You have faster RAM
  • Your MB is same brand but costs twice as much so it would be weird if you don't get better results than me with the same settings (you do, I must have Uncore OC Mode Enabled to get same Blender Benchmark results)
  • Newer kernel (my experience = better performance with every version for Ryzen Gen 3)
  • As said you use the CoreFreq CPU driver(?)

Perhaps, but the Processor P-States remain what they are, whatever the bells and whistles of the motherboard. My goal here is to control the P-States: the next steps will be to set some manual frequency ratios in the BIOS, then dump the register changes from within the CoreFreq driver to understand how the BIOS is stabilizing frequencies...

cyring commented 4 years ago

Checking the code, the threshold is 0.15 V and it applies only to the lowest voltage, for when a reading falls down to zero. The threshold is also not applied to the current value (middle column). So it might not be a bug, and the voltage may really have been read at 0.2 V for some reason. https://github.com/cyring/CoreFreq/blob/ab750ba6df881c4a8e959797a23689800f54ce19/corefreq.h#L492
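As a rough illustration of the behavior described here, the following Python sketch tracks lowest, current and highest voltage and applies a floor only when updating the lowest value. It is a hypothetical model, not CoreFreq's actual code: the class name `VoltageStats` and the sample readings are made up, and only the 0.15 V figure comes from the comment above.

```python
# Hypothetical sketch, NOT CoreFreq's implementation: the floor is applied
# only to the "lowest" column so that zero readings do not pollute it.
THRESHOLD_V = 0.15  # floor quoted in the comment above


class VoltageStats:
    def __init__(self):
        self.lowest = None
        self.current = 0.0
        self.highest = None

    def update(self, reading):
        # The current value (middle column) is never filtered.
        self.current = reading
        # Readings under the floor are ignored for the minimum only.
        if reading >= THRESHOLD_V and (self.lowest is None or reading < self.lowest):
            self.lowest = reading
        if self.highest is None or reading > self.highest:
            self.highest = reading


stats = VoltageStats()
for volts in (0.91, 0.0, 0.20, 1.35):  # made-up sample readings
    stats.update(volts)
print(stats.lowest, stats.current, stats.highest)
```

Under this reading of the logic, a 0.0 V sample is discarded for the minimum while a genuine 0.20 V sample passes the floor and is shown, which would be consistent with the V[0.20] display not being a bug.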

olejon commented 4 years ago
olejon commented 4 years ago

Image

olejon commented 4 years ago

The 0.20V screenshot was a "lucky one":

So it's kind of like Ryzen Master, only that RM shows way more levels between 1-3 (and many more decimals, not that that matters).

I'm back to the OC mode. One can also set a maximum temperature in AI Tweaker > Precision Boost Override to, say, make sure it never reaches 90°C, which I guess is when the Orange Color shows. But I have never gone above 89°C, as said, and that was just briefly before dropping back to 87-88, and only during the Blender Benchmark fishy_cat.

olejon commented 4 years ago

I tend to check the ASUS website if there is a BIOS update. Saw there was one from June 17 published. Probably one for you too?

juni 24 00:52:21 tux kernel: do_IRQ: 1.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #2
juni 24 00:52:21 tux kernel: do_IRQ: 2.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #3
juni 24 00:52:21 tux kernel: do_IRQ: 3.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #4
juni 24 00:52:21 tux kernel: do_IRQ: 4.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #5
juni 24 00:52:21 tux kernel: do_IRQ: 5.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #6
juni 24 00:52:21 tux kernel: do_IRQ: 6.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #7
juni 24 00:52:21 tux kernel: do_IRQ: 7.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #8
juni 24 00:52:21 tux kernel: do_IRQ: 8.55 No irq handler for vector
juni 24 00:52:21 tux kernel:   #9
juni 24 00:52:21 tux kernel: do_IRQ: 9.55 No irq handler for vector
juni 24 00:52:21 tux kernel:  #10
juni 24 00:52:21 tux kernel: do_IRQ: 10.55 No irq handler for vector

...

juni 24 00:52:21 tux kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP8._DSM], AE_ALREADY_EXISTS (20190703/dswload2-327)
juni 24 00:52:21 tux kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190703/psobject-221)
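To see at a glance how many CPUs report the unhandled vector, a log like the one above can be tallied with a few lines of Python. This is only a sketch: reading the first number as the CPU and the second as the vector is an assumption based on the kernel's message format, and the function name `tally_unhandled_vectors` is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as "do_IRQ: 3.55 No irq handler for vector".
# Assumption: the number before the dot is the CPU, the one after it the vector.
PATTERN = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def tally_unhandled_vectors(lines):
    """Count occurrences per (cpu, vector) pair in journal lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


sample = [
    "juni 24 00:52:21 tux kernel: do_IRQ: 1.55 No irq handler for vector",
    "juni 24 00:52:21 tux kernel:   #2",
    "juni 24 00:52:21 tux kernel: do_IRQ: 2.55 No irq handler for vector",
]
counts = tally_unhandled_vectors(sample)
for (cpu, vector), seen in sorted(counts.items()):
    print(f"CPU {cpu}: vector {vector} seen {seen} time(s)")
```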
PRIME X570-PRO BIOS 2203
Update AMD AM4 AGESA PI 1.0.0.1
cyring commented 4 years ago

I tend to check the ASUS website if there is a BIOS update. Saw there was one from June 17 published. Probably one for you too?

Thanks for the tips

  • After installing it, Linux boots fine, but journalctl shows the same IRQ and ACPI messages as quoted above, with the IRQ stuff in Red
  • Any ideas?
  • The BIOS update seems to be AMD only stuff. You know what this is?
  • After install it said it was updating the "Led firmware" and nothing more before booting
  • Changelog:
PRIME X570-PRO BIOS 2203
Update AMD AM4 AGESA PI 1.0.0.1

Have you cleared the CMOS right after the update? If not, you have garbage values in the NVRAM.

olejon commented 4 years ago

The BIOS always resets to default values after an update and I go through all the settings again and set them as before (and check if new ones), instead of loading a saved profile.

Hm. Maybe a total power off then (PSU too). If not, does a hard CMOS reset ("short circuit" pins) do more than resetting the BIOS in UEFI, really? Fixing NVRAM shit is typically a Mac thing. Think many knows Option + Command + P + R lol (and the SMC reset one).

Had something similar, don't remember when, if was switching to Ryzen or the AMD GPU, PCIe3 to 4 NVMes... That was fixed quickly by an Ubuntu update.

cyring commented 4 years ago

It is still written today in the ROG manual to clear the BIOS RAM after an update. To my knowledge, nothing is automated. There should be a jumper on the board?

olejon commented 4 years ago
juni 24 04:02:08 tux kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP8._DSM], AE_ALREADY_EXISTS (20190703/dswload2-327)
juni 24 04:02:08 tux kernel: ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
cyring commented 4 years ago

Some ACPI Device Specific Method _DSM linked to GPP8 in System Bus _SB

Search for GPP8 in the whole of /sys/devices/LNXSYSTM. Mine is located at:

cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0f/path
\_SB_.PCI0.GPP8
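Note that the ACPI name lives in the content of each `path` file, not in any file or directory name, so the search has to read those files. A minimal Python sketch of such a search follows; the function name `find_acpi_nodes` is made up for illustration, and the root directory is the one from the command above.

```python
import os


def find_acpi_nodes(root, needle):
    """Return directories under root whose 'path' file content mentions needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "path" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "path")) as handle:
                if needle in handle.read():
                    hits.append(dirpath)
        except OSError:
            # Some sysfs attributes may not be readable; skip them.
            continue
    return sorted(hits)


if __name__ == "__main__":
    for node in find_acpi_nodes("/sys/devices/LNXSYSTM:00", "GPP8"):
        print(node)
```

An empty result would then mean that no ACPI device exposes that name, rather than that the name filter simply did not match.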
olejon commented 4 years ago

Nothing...

/sys/devices/LNXSYSTM:00# find ./ -name "*GPP8*"

Suppose there's no point in reinstalling the linux-firmware amd-microcode packages. Will check with Ubuntu 20.04 Live USB if also there. Just in case.

Think my MB has BIOS Flashback to revert, but I've never used such a feature. Next time I'll google properly before updating. The April BIOS was good.

Did you understand the changelog? Have you installed it successfully?

cyring commented 4 years ago

I won't install any BIOS. All features are working fine

Since Zen was introduced, ACPI tables have been an issue. See the kernel history...

ROG CROSSHAIR VIII HERO had been released with:

BIOS 1201 11/18/2019

but it has however a remaining log trace ...

ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000007) is beyond end of object (length 0x6) (20200326/exoparg2-393)
ccp 0000:0c:00.1: ccp: unable to access the device: you might be running a broken BIOS.

... which is fine for CoreFreq because I'm working on my own [P][C]-States sub-drivers for Zen

My EFI entry tries to blacklist any of them:

title   Arch Linux
linux   /EFI/Linux/vmlinuz-linux
initrd  /EFI/Linux/amd-ucode.img
initrd  /EFI/Linux/initramfs-linux.img
options root=/dev/disk/by-label/root rw quiet break=n add_efi_memmap nmi_watchdog=0 selinux=0 loglevel=3 rd.systemd.show_status=auto rd.udev.log-priority=3 consoleblank=0 vt.color=0x03 modprobe.blacklist=pcspkr,nouveau,k10temp,acpi_cpufreq idle=halt cpu0_hotplug audit=0

which results in CoreFreq:

Linux:                                                                          
|- Release                                                       [5.7.4-arch1-1]
|- Version                      [#1 SMP PREEMPT Thu, 18 Jun 2020 16:01:07 +0000]
|- Machine                                                              [x86_64]
Memory:                                                                         
|- Total RAM                                                         32856504 KB
|- Shared RAM                                                          117756 KB
|- Free RAM                                                          30792460 KB
|- Buffer RAM                                                           79832 KB
|- Total High                                                               0 KB
|- Free High                                                                0 KB
CPU-Freq driver                                               [         Missing]
Governor                                                      [         Missing]
CPU-Idle driver                                               [         Missing]

Missing is the sign that nothing is in control, which lets me choose the Target P-States. For example, decrease from the default 35 to 22 but put CPU #12 back to 35.

Image

But target ratios are still moving on their own, resulting in CPU #12 = 36 and its SMT sibling CPU #28 jumping between 43 and 36. Other CPUs are also not fully under control.

I believe it's part of the collaborative stuff between the Processor and some software, feeding back to it the requested Target. So far I'm studying these entry functions: acpi_cpufreq_fast_switch, acpi_idle_enter and cppc_cpufreq_set_target
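To double-check which modules a boot entry like the one above asks the kernel to blacklist, the `modprobe.blacklist=` parameter can be pulled out of the command line. The Python sketch below is illustrative only; the function name `blacklisted_modules` is made up, and the sample string is a shortened copy of the options line from this comment.

```python
def blacklisted_modules(cmdline):
    """Collect module names from every modprobe.blacklist= parameter."""
    modules = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "modprobe.blacklist":
            modules.extend(name for name in value.split(",") if name)
    return modules


# Shortened copy of the options line shown in the boot entry.
options = ("root=/dev/disk/by-label/root rw quiet "
           "modprobe.blacklist=pcspkr,nouveau,k10temp,acpi_cpufreq idle=halt")
print(blacklisted_modules(options))
```

On a running system the same function could be fed the content of /proc/cmdline instead of the sample string.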

cyring commented 4 years ago

I'm noticing this strange behavior

olejon commented 4 years ago

Luckily it was easy to revert back to BIOS 1407 (April). It did enhance performance somewhat (not reverting but upgrading to that one from the previous).

  1. Load optimized defaults (F5) in UEFI and save
  2. Boot Windows (could've used Linux)
  3. Format USB drive to FAT32
  4. Download that BIOS file and put it on the USB
  5. Opening the EZ Flash tool in UEFI
  6. It automatically switched to the disk with the .CAP file
  7. Flash it, and it says Update successful, System will Reset!
  8. Back to normal
  9. Manually go through UEFI settings again (right now all stock)

Changelogs for those:

1404 (2019):

  1. Update AM4 combo PI 1.0.0.4 patch B
  2. Support Ryzen™ 2000-series APU
  3. You will not be able to downgrade your BIOS after updating to this BIOS version

1405 (2019):

  1. Improve system performance

1407 (April 2020 one I run now):

  1. Improve system stability
  2. Add BCLK Frequency and SB Clock Spread Spectrum items into BIOS Advanced mode.

cyring commented 4 years ago

Glad you successfully fell back to the previous firmware, and thank you for this detailed procedure. As I told you, I'm not in a hurry to upgrade the BIOS. There are still so many bits I have to discover and to program into CoreFreq.

Using the latest version, are you also noticing that for a standard target performance ratio of 35, the processor answers with a value of 36?

To monitor this, press key ! to toggle to absolute frequency

Are you also observing that 36 is the magic value which triggers Turbo Boost ?

olejon commented 4 years ago

linux-image-5.3.0-61-generic linux-modules-extra-5.3.0-61-generic linux-headers-5.3.0-61-generic linux-headers-5.3.0-61 linux-modules-5.3.0-61-generic

watch -n 2 cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
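The same sysfs files can be read programmatically, which makes it easier to compare values over time than watching raw numbers scroll by. The following Python sketch is an illustration only; the function name `read_cur_freq_mhz` is made up, and it relies on `scaling_cur_freq` being reported in kHz.

```python
import glob
import os
import re


def read_cur_freq_mhz(root="/sys/devices/system/cpu"):
    """Map CPU number to its current frequency in MHz (sysfs reports kHz)."""
    freqs = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)", os.path.relpath(path, root)).group(1))
        with open(path) as handle:
            freqs[cpu] = int(handle.read().strip()) / 1000.0
    return dict(sorted(freqs.items()))


print(read_cur_freq_mhz())
```

If no CPUFreq driver is loaded these files do not exist, and the function simply returns an empty dictionary.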

Image

cyring commented 4 years ago

Without CPUFreq drivers (confirmed by Missing) I can program things these ways:

  1. You program the ratio to 22, then you stress the CPU and its frequency ratio is capped to 22 (as expected)
  2. The same happens, if you program the P-State to 28
  3. But as soon as you program 35, idle shows 36 and full load goes up to 47

To summarize: I don't get the 47 ratio boost with P-States 22 and 28 but only with the P0 P-State of factory value 35
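For readers who think in MHz rather than ratios, the figures above can be converted with a small Python sketch. It assumes a 100 MHz base clock, which is not stated for this system, and the helper name `ratio_to_mhz` is made up for illustration.

```python
BCLK_MHZ = 100.0  # ASSUMPTION: base clock of 100 MHz, not stated for this system


def ratio_to_mhz(ratio, bclk_mhz=BCLK_MHZ):
    """Convert a frequency ratio into MHz for a given base clock."""
    return ratio * bclk_mhz


observations = (
    ("capped", 22),
    ("capped", 28),
    ("P0 target", 35),
    ("idle", 36),
    ("full load", 47),
)
for label, ratio in observations:
    print(f"{label:>9}: ratio {ratio} -> {ratio_to_mhz(ratio):.0f} MHz")
```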

cyring commented 4 years ago

This is what we get from the Rocket NVMe 4.0 SSD with heatsink

olejon commented 4 years ago

From Tom's Hardware:

ATTO is a simple and free application that SSD vendors commonly use to assign sequential performance specifications to their products. It also gives us insight into how the device handles different file sizes

Straight from Corsair:

ATTO Disk Benchmark has long been an industry staple for measuring drive performance

Windows - Corsair MP600 - Basically the first proper PCIe4

Image

cyring commented 4 years ago

I fear that WSL won't offer more than the TSC. WiP: #191

olejon commented 4 years ago

Oh lol, I was about to try CoreFreq on the Windows Linux subsystem one day, but assumed it wouldn't work, but worth a shot since it's easy enough to try. That if I could actually install the build-essential metapackage. Otherwise I would assume not. Also noticed screen only works as root/sudo. Don't like to run e.g. apt (or any package manager) to install/update many packages without screen or similar, on any system.


Kind of odd Microsoft has added that option. First they kind of had their subsystem where you could run BASH directly from the Start Menu (through search only I think, was a hidden link) or Shift + Right Click in a folder in Windows Explorer. But they removed that without saying really. BASH just exited.

But was no good anyways when actually worked. Lacked so much. Ubuntu from the Microsoft Store runs the latest LTS, or you can choose a compatible LTS. Then it works, and has more or less what macOS has by default. Easier to install more, though. Not Homebrew and Apple making that harder or screwing up installed packages for every version.

It's weird to have cmd.exe which is horrible but can do much more than one might think. Many driver packages, some software and enterprise AD use it for setup still... Has tons of AD/networking commands but very undocumented. Unless you've taken a MS certification course or something, it's hard to work with.

Powershell I played with a little. Long story short, even our Powershell expert couldn't make it do what a BASH script I wrote in 30 seconds did (read webcam barcode value and open browser to X URL with Y parameters).

So now Windows has got 3! They gotta keep cmd for drivers and certain software, but hide it and just use BASH... IMHO.

Pretty nicely integrated with your / Linux folder right there in Windows Explorer and everything. Little like Chromebooks but they also run GUI software just fine like GIMP etc.

Like... Have you tried just to get the MD5 value of a file on Windows without a 3rd party program? Found some official Microsoft package to add an insanely long command to cmd to do checksums but turned out to be deprecated, without saying any option... Now that's easy. And much more basics that a freakin' OS should be able to. Lots of Windows packages does "Verifying download/install" but IDK what mechanism. Apple has md5 (not called md5sum) and shasum like Linux, but doesn't use BASH by default anymore.

cyring commented 4 years ago

Is Hyper-V the hypervisor of WSL 2? Booting the CoreFreq ISO in a Hyper-V VM behaves the same as within WSL. So WSL would be a particular Hyper-V VM, just one that doesn't show up in the VM manager.

Many MSRs can be queried. Although they return zero, they don't crash. They are thus not passed through, but trapped.

But the APERF/MPERF counters, which are similar to MSRs, crash. I feel they are trapped by Hyper-V, but something is missing. Probably some settings are missing in my WSL configuration, the way Xen, KVM and VBOX let you define rules for when an MSR is accessed.

Why does Hyper-V make a virtualized TSC possible, but not APERF/MPERF? Especially when virtualizing the TSC, the hardest part, has been achieved.
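For context on why these two counters matter, the usual way to derive an effective frequency is to scale a reference frequency by the ratio of the APERF and MPERF deltas between two samples. The Python sketch below shows that arithmetic only. The counter values are made up, the function name `effective_mhz` is hypothetical, taking the P0 frequency as the reference is an assumption, and counter wraparound is ignored.

```python
def effective_mhz(reference_mhz, aperf0, mperf0, aperf1, mperf1):
    """Effective frequency = reference * delta(APERF) / delta(MPERF).

    ASSUMPTION: reference_mhz is the frequency at which MPERF counts
    (taken here to be the P0 frequency). Wraparound is not handled.
    """
    delta_aperf = aperf1 - aperf0
    delta_mperf = mperf1 - mperf0
    if delta_mperf <= 0:
        raise ValueError("MPERF did not advance between the two samples")
    return reference_mhz * delta_aperf / delta_mperf


# Made-up counter samples: APERF advanced 470 ticks while MPERF advanced 350.
print(effective_mhz(3500.0, 1000, 1000, 1470, 1350))
```

If a hypervisor returns zero for both counters, the MPERF delta is zero and no frequency can be derived, which is why the function refuses to divide in that case.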

olejon commented 4 years ago

Well one search and...

Does WSL 2 use Hyper-V? Will it be available on Windows 10 Home? WSL 2 will be available on all SKUs where WSL is currently available, including Windows 10 Home. The newest version of WSL uses Hyper-V architecture to enable its virtualization. This architecture will be available in the 'Virtual Machine Platform' optional component. This optional component will be available on all SKUs. You can expect to see more details about this experience soon as we get closer to the WSL 2 release.

FAQ: https://docs.microsoft.com/en-us/windows/wsl/wsl2-faq

cyring commented 4 years ago

Definitely it's the same hypervisor: CoreFreq is running in an ArchLinux VM and we're reading the same Vendor ID CoreFreq_HYPER-V
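As background, a hypervisor vendor ID is normally a 12-byte string returned in the EBX, ECX and EDX registers by CPUID leaf 0x40000000. The Python sketch below only decodes such register values; it cannot execute CPUID itself. The three constants are the values Hyper-V is documented to return, the function name `hypervisor_vendor` is made up, and how CoreFreq maps the string to its own CoreFreq_HYPER-V label is not shown in this thread.

```python
import struct


def hypervisor_vendor(ebx, ecx, edx):
    """Decode the 12-byte vendor ID carried in EBX, ECX, EDX (little-endian)."""
    raw = struct.pack("<III", ebx, ecx, edx)
    return raw.rstrip(b"\x00").decode("ascii")


# Register values as documented for Hyper-V; hard-coded here for illustration.
print(hypervisor_vendor(0x7263694D, 0x666F736F, 0x76482074))
```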

cyring commented 4 years ago

It's been so hot in Europe these last days, I tuned my CPUs to cool down: No OC, especially PBO

olejon commented 4 years ago

Introduction

Yeah, also very hot In Norway, for weeks :sun_with_face:

NOTE: I divide this into 2 Categories: AMD OC found under AMD Overclocking and Manual OC found under AI Tweaker, or for you Extreme Tweaker. From what I see they have the same Settings. One (fancy manual) setting was added in a later BIOS version than yours. I've never touched it.

  1. Windows Screenshots. Every screenshot has bullet points above = Which UEFI Settings were changed
  2. Linux Comparison Screenshots
  3. UEFI Screenshots of the Settings
  4. CONCLUSION: Temperatures, Watts and Other Observations

Windows - Run 1

Image

Windows - Run 2

Image

Windows - Run 3

Image

Windows - Run 4

Windows - Run 5

Image


Linux Clone of same Settings as in Run 3

Image

Linux Clone of same Settings as in Run 5

Image


UEFI AMD OC Built-In PBO Settings

Image

Image

UEFI AMD OC Built-In RAM Settings

Image

Image

UEFI Manual OC Settings

Image


CONCLUSION




cyring commented 4 years ago

Awesome.

PBO is indeed helping the performance score a lot. But did you mention which power plan is in use: the Windows default or the AMD policy? The latter keeps the frequency target at a high value, at the expense of temperature, whereas Windows Balanced mode keeps the frequency among the lowest P-State ratios.

Would be cool if CoreFreq could read PPT, TDC and EDC MAX values as Ryzen Master shows them. Well at least PPT since CoreFreq shows Watts. Possible?

One piece of software on GitHub gets those values by reverse engineering. I've not tried it yet. And the specs just give the basic information, not enough to program a safe algorithm.

olejon commented 4 years ago

Impossible to replicate Run 3 on Windows

But first the Voltages:

Frequencies after having run 1 minute:

ACTUAL RESULTS:

POSSIBLE REASONS:

TO YOUR QUESTION:

cyring commented 4 years ago

It depends on Windows' mood or time of day?

There are so many Windows service processes of all kinds running that it's hard to measure and reproduce a benchmark. Even RM has a 100 to 300 MHz CPU overhead for its own monitoring. Either Windows is stripped of any tasks to make it bare metal, or the Linux kernel is built with most drivers and security features removed, such as page randomization. Then you will set processor registers at will and barely get a reproducible and understandable score.

olejon commented 4 years ago

My Windows 10 is as Clean as it gets. All permanent or scheduled unnecessary services Disabled.

After boot:

I did try the Ryzen Performance Power Plan, and observed Ryzen Master for any changes:

BTW: Not sure if you're ever going to find that the Cores actually Sleep. I've been watching and never seen a Core below 130 MHz - once it's below 130 MHz it says "Sleeping". So maybe "Sleeping" is just indicating very low MHz...

cyring commented 4 years ago

But AMD says the Processor has C-States. Those are mentioned in the ASUS BIOS. The Software Manual specifies the x86 MONITOR/MWAIT instructions to enter idle states. They have to exist, but so far we don't have the counter addresses to monitor them.

I understand from you that RM may implement a minimum CPU load threshold to say: "it's sleeping". But why such a dummy algorithm? RM is AMD software: its authors have full access to the NDA specifications to make accurate software!