cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

Hardware Prefetch for Atom E-Cores #470

Closed cyring closed 10 months ago

cyring commented 11 months ago

Intel whitepaper 357930-001US describes new MSRs dedicated to the Atom sub-architecture within hybrid processors.

| Feature | Register |
| --- | --- |
| LLC Streamer | 0x1320 |
| L2 NLP | 0x1321 |
| SELECTION | 0x1323 |

The already known MSR 0x1A4, by contrast, is said to be available on both P-Cores and E-Cores.


I would appreciate it if someone could `rdmsr` the new MSRs on the E-Core CPU numbers.

Next, still on the E-Cores, toggle the Enable bit with a `wrmsr` followed by an `rdmsr` to confirm the modification.
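For anyone who prefers a script to msr-tools, the read, write, read-back sequence requested here can be sketched in Python against the Linux `msr` driver. This is a minimal sketch under assumptions: it presumes the `msr` kernel module is loaded and the script runs as root, the constant and function names are made up for illustration, and the position of the Enable bit is left as a caller-supplied parameter because it is not stated in this thread.

```python
import os
import struct

# MSR addresses as listed above; the constant names are illustrative only.
MSR_LLC_STREAMER = 0x1320
MSR_L2_NLP = 0x1321
MSR_SELECTION = 0x1323

# Path template of the Linux msr driver (assumes `modprobe msr` was done).
MSR_DEV = "/dev/cpu/{cpu}/msr"


def read_msr(cpu, reg, dev=MSR_DEV):
    """Read one 64-bit MSR on the given CPU."""
    fd = os.open(dev.format(cpu=cpu), os.O_RDONLY)
    try:
        # The msr device uses the file offset as the register address.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def write_msr(cpu, reg, value, dev=MSR_DEV):
    """Write one 64-bit MSR on the given CPU."""
    fd = os.open(dev.format(cpu=cpu), os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)


def toggle_bit(cpu, reg, bit, dev=MSR_DEV):
    """Flip one bit, then read the register back.

    Returns (value_before, value_after) so the caller can check whether
    the hardware accepted the change.
    """
    before = read_msr(cpu, reg, dev)
    write_msr(cpu, reg, before ^ (1 << bit), dev)
    return before, read_msr(cpu, reg, dev)
```

A call such as `toggle_bit(19, MSR_LLC_STREAMER, bit)` would target one E-Core. Reading an MSR the core does not implement raises `OSError`, which corresponds to the `cannot read MSR` message printed by `rdmsr`.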

It would also be interesting to check the specs:


Thanks for helping.

cyring commented 11 months ago
contributors @gundami @Technologicat @justanerd @BugReporterZ @svmlegacy @kocoman1 @jowa2021 @Betaminos @huajian628 @vitaly-zdanevich @rushvora

Could you please help investigate those registers on the Hybrid architecture?

BugReporterZ commented 11 months ago

I tried this.

#rdmsr -a 0x1321
CPU 19: 250122000001
CPU 18: 250122000001
CPU 17: 250122000001
CPU 16: 250122000001
rdmsr: CPU 15 cannot read MSR 0x00001321

#rdmsr -a 0x1323
CPU 19: 1f9cc00000000
CPU 18: 1f9cc00000000
CPU 17: 1f9cc00000000
CPU 16: 1f9cc00000000
rdmsr: CPU 15 cannot read MSR 0x00001323

===============================================

#rdmsr -a 0x1320
CPU 19: 100007e041000004
CPU 18: 100007e041000004
CPU 17: 100007e041000004
CPU 16: 100007e041000004
rdmsr: CPU 15 cannot read MSR 0x00001320

#wrmsr -p 19 0x1320 4
#rdmsr -a -x -0 0x1320
CPU 19: 0000000000000004
CPU 18: 0000000000000004
CPU 17: 0000000000000004
CPU 16: 0000000000000004
rdmsr: CPU 15 cannot read MSR 0x00001320

#wrmsr -p 19 0x1320 1152930164351434756
#rdmsr -a -x -0 0x1320
CPU 19: 100007e041000004
CPU 18: 100007e041000004
CPU 17: 100007e041000004
CPU 16: 100007e041000004
rdmsr: CPU 15 cannot read MSR 0x00001320
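
The output above can be checked with a few lines of Python. This is only a sketch: `parse_rdmsr` is a hypothetical helper written for this comment, not part of msr-tools. It shows two things: after writing CPU 19 alone, all four readable CPUs report the same value, which is consistent with (but does not prove) a register shared across the E-Core module, and the decimal argument of the second `wrmsr` is the original hexadecimal value.

```python
def parse_rdmsr(output):
    """Map CPU number to MSR value from the 'CPU N: <hex>' lines of rdmsr -a."""
    values = {}
    for line in output.splitlines():
        # Error lines such as 'rdmsr: CPU 15 cannot read ...' are skipped.
        if line.startswith("CPU ") and ":" in line:
            cpu, _, value = line[4:].partition(":")
            values[int(cpu)] = int(value.strip(), 16)
    return values


# Output of `rdmsr -a -x -0 0x1320` after `wrmsr -p 19 0x1320 4`.
after_write = """\
CPU 19: 0000000000000004
CPU 18: 0000000000000004
CPU 17: 0000000000000004
CPU 16: 0000000000000004
rdmsr: CPU 15 cannot read MSR 0x00001320
"""

values = parse_rdmsr(after_write)

# Only CPU 19 was written, yet every readable CPU reports the new value.
shared = len(set(values.values())) == 1

# The second wrmsr passes the original value back in decimal.
restored = 1152930164351434756

print(sorted(values), shared, hex(restored))
```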
cyring commented 11 months ago

> I tried this.

Excellent. I can code now. Thank you

cyring commented 11 months ago
Hardware Prefetcher
Allows you to enable or disable the MLC streamer prefetcher.
Configuration options: [Disabled] [Enabled]

Adjacent Cache Line Prefetch
Allows you to prefetch adjacent cache lines, reducing the DRAM loading time and improving the system performance.
Configuration options: [Disabled] [Enabled]

Adjacent Cache Line Prefetch Select Enabled for the CPU to prefetch both cache lines for 128 bytes as comprised. Select Disabled for the CPU to prefetch both cache lines for 64 bytes. The options are Disabled and Enabled.



* ASRock Intel Z790 Motherboard
`<same as above>`
cyring commented 11 months ago

Hardware Prefetcher Select whether to enable the speculative prefetch unit of the processor. Options available: Enable, Disable. Default setting is Enable.

L2 RF0 Prefetch Disable Options available: Enable, Disable. Default setting is Disable.

Adjacent Cache Prefetch When enabled, cache lines are fetched in pairs. When disabled, only the required cache line is fetched. Options available: Enable, Disable. Default setting is Enable.

DCU Streamer Prefetcher Enable/Disable DCU streamer prefetcher. Options available: Enable, Disable. Default setting is Enable.

DCU IP Prefetcher Enable/Disable DCU IP Prefetcher. Options available: Enable, Disable. Default setting is Enable.



* [BIOS for Dell PowerEdge Servers](https://infohub.delltechnologies.com/p/bios-settings-for-optimized-performance-on-next-generation-dell-poweredge-servers/)
* [Lenovo Processors UEFI_XEON_1ST_2ND](https://pubs.lenovo.com/uefi_xeon_1st_2nd/processors.html)
* [HPC Cluster Tuning on 3rd Generation Intel® Xeon](https://cdrdv2-public.intel.com/686419/HPC-Cluster-Tuning-Guide-on-3rd-Generation-Intel-Xeon-Scalable-Processors.pdf)
* [firmware-settings](https://github.com/microsoft/MSLab/blob/master/Docs/AzureStackHCI/02-AzSHCI-Deployment/01-hardware-configuration.md#firmware-settings)
* [Supermicro BIOS settings](https://github.com/cncf/cnf-testbed/issues/129#issuecomment-438191639)
cyring commented 11 months ago

The Intel Atom cores are placed in groups of four per module, with private L1 caches for each core.

Each core has a set of L1 hardware prefetchers.

The L2 prefetch block is shared by all cores in the module.

Trackers are shared between the cores.

DCU (Data Caching Unit) is the block that holds the L1 data cache.

cyring commented 11 months ago

+ @BugReporterZ : Hello,

[Screenshot: 2023-12-16-133435_720x425_scrot]

The screenshot above shows the latest commit, 0a0daea556fef89c6ff0249101cc51f794ab6e4c, which adds DCU L1 NLP, a bit of MSR 0x1A4 disclosed in the whitepaper.


Since commit cc5c3270c0ef699047b7e6d40dfe025fd71492c1, the following are added for E-Cores:

[Screenshot: 2023-12-17-092835_720x425_scrot]

Remark: the screenshots above were taken under virtualization.

cyring commented 11 months ago

"Hardware LLC prefetch feature on 4th Gen Intel® Xeon® Scalable Processor (Codename Sapphire Rapids)" describes an L3 prefetch disable at bit 42 of MSR 0x6D.

Does it work on Desktop or Mobile processors?
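
For testers, the bit arithmetic for that register can be sketched as below. The constant and function names are made up for this comment; only the register address 0x6D and bit 42 come from the Intel article. The code works on a value that has already been read and does not answer whether the bit has any effect outside Sapphire Rapids.

```python
# Names are illustrative; address and bit position are from the Intel article.
MSR_L3_PREFETCH = 0x6D
L3_PREFETCH_DISABLE_BIT = 42


def l3_prefetch_disabled(value):
    """Return True when bit 42 (L3 prefetch disable) is set in the value."""
    return bool((value >> L3_PREFETCH_DISABLE_BIT) & 1)


def with_l3_prefetch_disable(value, disable):
    """Return the value with bit 42 set or cleared, other bits untouched."""
    mask = 1 << L3_PREFETCH_DISABLE_BIT
    return (value | mask) if disable else (value & ~mask)
```

A tester would read MSR 0x6D, pass the value through `with_l3_prefetch_disable`, write it back, and read again to see whether the bit sticks.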

cyring commented 10 months ago

Pre-release in progress: https://github.com/cyring/CoreFreq/discussions/472. If anything is missing, please let me know.