Zero-Tang / NoirVisor

The Grimoire Hypervisor solution for x86 Processors with experimental nested virtualization support.
MIT License
462 stars, 80 forks

Splitting a 2MiB page into 512 4KiB pages degrades system performance on hybrid CPUs (Intel 12th Gen and higher) #35

Closed by papstuc 1 month ago

papstuc commented 2 months ago

As soon as nvc_ept_split_pde is called on a frequently accessed GPA (in this case NtProtectVirtualMemory in ntoskrnl), the system becomes noticeably slow on newer Intel CPUs (12th Gen and higher). On my old 8700K it worked completely fine, so I tried disabling all E-cores, which fixed the issue. The only problem is that the E-cores are then unusable.
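For readers unfamiliar with the operation, here is a minimal user-space sketch of what splitting a 2MiB EPT mapping into 512 4KiB mappings involves. It is not NoirVisor's nvc_ept_split_pde; the function name `ept_split_pde`, the macro names, and the `pt_hpa` parameter are hypothetical and chosen for illustration. Only the bit positions follow the EPT entry format documented in the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the Intel SDM EPT entry format; names are hypothetical. */
#define EPT_RWX          0x7ULL                 /* read/write/execute, bits 2:0 */
#define EPT_MEMTYPE_MASK (0x7ULL << 3)          /* memory type, bits 5:3 (leaf entries) */
#define EPT_IGNORE_PAT   (1ULL << 6)            /* ignore-PAT, bit 6 (leaf entries) */
#define EPT_LARGE        (1ULL << 7)            /* set when a PDE maps a 2MiB page */
#define EPT_ADDR_MASK_4K 0x000FFFFFFFFFF000ULL  /* 4KiB-aligned physical address */
#define EPT_ADDR_MASK_2M 0x000FFFFFFFE00000ULL  /* 2MiB-aligned physical address */
#define EPT_ENTRIES      512

/*
 * Split one 2MiB PDE into 512 4KiB PTEs.
 *   pde    - the large-page PDE to replace
 *   pt     - caller-provided page table (512 entries) to fill
 *   pt_hpa - host-physical address of that page table (assumed known to the caller)
 * Returns 0 on success, -1 if the PDE does not map a 2MiB page.
 */
static int ept_split_pde(uint64_t *pde, uint64_t *pt, uint64_t pt_hpa)
{
    assert((pt_hpa & 0xFFFULL) == 0);   /* a page table must be 4KiB aligned */

    if (!(*pde & EPT_LARGE))
        return -1;                      /* already points to a page table */

    uint64_t base  = *pde & EPT_ADDR_MASK_2M;
    uint64_t attrs = *pde & (EPT_RWX | EPT_MEMTYPE_MASK | EPT_IGNORE_PAT);

    /* Each PTE inherits the permissions and memory type of the large page. */
    for (unsigned i = 0; i < EPT_ENTRIES; i++)
        pt[i] = (base + ((uint64_t)i << 12)) | attrs;

    /* Non-leaf PDE: large bit clear, memory-type bits clear, full permissions. */
    *pde = (pt_hpa & EPT_ADDR_MASK_4K) | EPT_RWX;
    return 0;
}
```

On real hardware the change would also need an INVEPT on every logical processor that may have cached the old mapping. After the split, the 2MiB range can no longer be covered by a single 2MiB TLB entry, which is one plausible reason a hot page becomes more expensive to translate.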

Checking the CPU's specification, I found that it uses Intel Smart Cache Technology. According to the specification, 4 E-cores share the same L2 cache, which I think leads to a lot of cache misses when the page granularity is 4KiB.

The Intel® Smart Cache Technology is a shared Last Level Cache (LLC).

  1. The LLC is non-inclusive.
  2. The LLC may also be referred to as a 3rd level cache.
  3. The LLC is shared between all IA cores as well as the Processor Graphics.
  4. For P Cores The 1st and 2nd level caches are not shared between physical cores and each physical core has a separate set of caches.
  5. For E Cores The 1st level cache is not shared between physical cores and each physical core has a separate set of caches.
  6. For E Cores The 2nd level cache is shared between 4 physical cores.
  7. The size of the LLC is SKU specific with a maximum of 3MB per P physical core or 4 E cores and is a 12-way associative cache.

I tested it on an Intel Core i7-14700K on both Windows 10 and Windows 11 and got the same problem on each.

Any help or push in the right direction would be greatly appreciated. Your project is amazing and I have loads of fun tinkering around with it!

Zero-Tang commented 1 month ago

I'm not sure Intel Smart Cache really matters here. Your i7-8700K also uses Intel Smart Cache. [image]

To confirm, I think you should use perfmon /sys and compare the cache hit rate.

papstuc commented 1 month ago

Yes, I just noticed the 8700K also has Smart Cache. However, each individual core has its own L2 cache, which is not the case on hybrid CPUs with E-cores (4 E-cores share one L2 cache).

I will continue testing with perfmon, thank you!

papstuc commented 1 month ago

Finally fixed this issue. It works with one global EPT paging structure that is shared across all cores. My guess is that a separate structure per core is just not cache-friendly when several cores share a cache.
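A minimal sketch of the idea, assuming a simplified model: every vCPU's EPT pointer (EPTP) refers to the same PML4, so a split creates one page table rather than one per core. The types `ept_manager` and `vcpu` and the functions `make_eptp`, `bind_shared_ept`, and `split_table_bytes` are hypothetical and are not NoirVisor's actual structures. Only the EPTP bit layout is taken from the Intel SDM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef struct { uint64_t pml4_hpa; } ept_manager;  /* one EPT hierarchy */
typedef struct { uint64_t eptp; } vcpu;             /* per-core VMCS state (EPTP only) */

/* Build an EPTP: write-back memory type (6) in bits 2:0,
 * page-walk length minus one (3) in bits 5:3, PML4 address above bit 12. */
static uint64_t make_eptp(uint64_t pml4_hpa)
{
    return (pml4_hpa & ~0xFFFULL) | (3ULL << 3) | 6ULL;
}

/* Point every vCPU at the same hierarchy instead of a per-core copy. */
static void bind_shared_ept(vcpu *vcpus, size_t count, const ept_manager *shared)
{
    assert(vcpus != NULL && shared != NULL);
    for (size_t i = 0; i < count; i++)
        vcpus[i].eptp = make_eptp(shared->pml4_hpa);
}

/* Bytes of 4KiB page tables created by splitting `split_pdes` large pages
 * in each of `hierarchies` separate EPT hierarchies. */
static size_t split_table_bytes(size_t hierarchies, size_t split_pdes)
{
    return hierarchies * split_pdes * 4096;
}
```

With a shared hierarchy, `split_table_bytes(1, n)` applies regardless of core count, whereas per-core hierarchies multiply the footprint by the number of logical processors. The thread does not confirm that this is why the shared structure helped. Note also that a shared hierarchy still needs an INVEPT on every logical processor after each modification, since EPT-derived translations are cached per logical processor.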

Zero-Tang commented 1 month ago

Great feedback!