cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io

patch to support EPYC 9654P cpu #1319

Open Meandres opened 5 days ago

Meandres commented 5 days ago

We are running OSv on a server equipped with an EPYC 9654P CPU. In the process, we ran into several minor issues, which this patch tries to fix. To the best of our knowledge, the changes do not break anything on other systems. I am not sure they are worth merging, but at least the issues are now documented (should I open an issue instead?).

The first issue is that mmu::max_phys_bits is set to 51 (here: https://github.com/cloudius-systems/osv/blob/master/arch/x64/arch-mmu.hh#L14). The 9654P supports 52-bit physical addresses, and I guess cpuid reports that width. Removing the assert and simply setting mmu::phys_bits to mmu::max_phys_bits whenever the reported value exceeds it works on the 9654P as well as on older EPYCs and Xeons.
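As a rough illustration of the clamping described above (not the actual patch), the logic can be sketched as follows. The function names and the standalone max_phys_bits constant are hypothetical stand-ins for the mmu:: variables; the only hardware fact assumed is that CPUID leaf 0x80000008 reports the physical address width in EAX[7:0]. The EAX value 0x3034 is an illustrative input, not a reading taken from the 9654P.

```cpp
#include <cstdint>

// Hypothetical stand-in for mmu::max_phys_bits (value taken from the text above).
constexpr unsigned max_phys_bits = 51;

// CPUID leaf 0x80000008 returns the physical address width in EAX[7:0].
constexpr unsigned phys_bits_from_cpuid_eax(std::uint32_t eax)
{
    return eax & 0xffu;
}

// Instead of asserting reported <= max_bits, cap the value at max_bits.
constexpr unsigned clamp_phys_bits(unsigned reported, unsigned max_bits)
{
    return reported > max_bits ? max_bits : reported;
}

// Illustrative EAX value whose low byte is 0x34, i.e. 52 physical address bits.
static_assert(phys_bits_from_cpuid_eax(0x3034u) == 52, "low byte holds the width");
// A 52-bit CPU is capped at 51; a 48-bit CPU is left unchanged.
static_assert(clamp_phys_bits(52, max_phys_bits) == 51, "capped");
static_assert(clamp_phys_bits(48, max_phys_bits) == 48, "unchanged");
```

With this shape, CPUs that report a width at or below the limit behave exactly as before, which matches the observation that older EPYCs and Xeons are unaffected.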

The second issue is more subtle. When running a very memory-heavy benchmark, we noticed that the CPU was stalling in the front end instead of in the back end, as we expected. We narrowed this down to a strange behaviour where the micro-op cache appears to be disabled unless the MP bit of CR0 (https://wiki.osdev.org/CPU_Registers_x86#CR0) is set during boot. So far we have not been able to understand why this happens, but it is reproducible. We used the following program as a minimal example:

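#include <cstdint>

int main(){
    // Minimal loop: a single add instruction per iteration.
    for(uint64_t i=0; i<1e12; i++){
        // The add modifies rax and the flags, so declare both as clobbers.
        asm volatile("addq $42, %%rax": : : "rax", "cc", "memory");
    }
    return 0;
}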

We collected the performance counters (PMCs) with the following command: sudo perf kvm stat -M op_cache_fetch_miss_ratio -p $(pidof qemu-system-x86_64) -- sleep 1

With OSv running on a single vCPU pinned to a physical CPU (SMT disabled), the command above reports the following when the MP bit is not set:

Performance counter stats for process id '528565':

     3,381,772,998      op_cache_hit_miss.op_cache_miss  #    100.0 %  op_cache_fetch_miss_ratio
     3,381,772,998      op_cache_hit_miss.all_op_cache_accesses

       1.001941263 seconds time elapsed

and with the bit set:

Performance counter stats for process id '527893':

         3,646,646      op_cache_hit_miss.op_cache_miss  #      0.1 %  op_cache_fetch_miss_ratio
     3,339,928,235      op_cache_hit_miss.all_op_cache_accesses

       1.001967469 seconds time elapsed
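
For reference, the CR0 change discussed above amounts to setting bit 1 (MP, Monitor Coprocessor) of CR0. The sketch below only illustrates that bit manipulation. The helper names and the HYPOTHETICAL_KERNEL_BUILD guard are assumptions made for illustration, not OSv's actual accessors, and where the patch performs the write during boot is not shown here. The register accessors only work in ring 0, so they are compiled out by default.

```cpp
#include <cstdint>

// CR0.MP (Monitor Coprocessor) is bit 1 of CR0.
constexpr std::uint64_t cr0_mp = std::uint64_t{1} << 1;

// Return the given CR0 value with the MP bit set, leaving other bits untouched.
constexpr std::uint64_t with_mp_set(std::uint64_t cr0)
{
    return cr0 | cr0_mp;
}

// Check whether the MP bit is set in a CR0 value.
constexpr bool mp_is_set(std::uint64_t cr0)
{
    return (cr0 & cr0_mp) != 0;
}

#if defined(HYPOTHETICAL_KERNEL_BUILD) && defined(__x86_64__)
// Ring 0 only: these instructions fault in user space.
inline std::uint64_t read_cr0()
{
    std::uint64_t value;
    asm volatile("mov %%cr0, %0" : "=r"(value));
    return value;
}

inline void write_cr0(std::uint64_t value)
{
    asm volatile("mov %0, %%cr0" : : "r"(value) : "memory");
}

// Read-modify-write so that every other CR0 bit keeps its current value.
inline void enable_cr0_mp()
{
    write_cr0(with_mp_set(read_cr0()));
}
#endif
```

The read-modify-write form is used so that bits such as PE and PG, which the boot code has already configured, are preserved.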