dicksites / KUtrace

Low-overhead tracing of all Linux kernel-user transitions, for serious performance analysis. Includes kernel patches, loadable module, and post-processing software. Output is HTML/SVG per-CPU-core timeline that you can pan/zoom down to the nanosecond.

Misidentifies AMD as Intel Architecture and Fails to Handle LLC Properly on AMD #12

Open ChenZijiSubset opened 1 month ago

Hi,

I recently encountered a segmentation fault while running KUtrace on a workstation equipped with an AMD Ryzen Threadripper PRO 5955WX 16-Core processor. Upon inspecting the dmesg log, I discovered that KUtrace incorrectly identified the system as using Intel architecture, which led to this issue.

After further investigation, I modified the KUtrace code (in linux/module/kutrace_mod.c) to distinguish AMD from Intel processors correctly. I also changed the Last Level Cache (LLC) miss-counting code so that it is bypassed on AMD processors. With these modifications I was able to gather monitoring data successfully.

Here are the main changes I implemented:

  1. Updated the conditions to differentiate AMD and Intel architectures:
#define IsAmd_64        (Isx86_64 && (defined(__znver1) || defined(__znver2) || defined(__znver3)))
#define IsIntel_64      (Isx86_64 && !IsAmd_64)
  2. Enhanced the logic for handling LLC on AMD processors:
void ku_setup_llc_miss(void)
{
#if IsIntel_64
        u64 llc_miss_sel;
        u64 llc_miss_enable;
        /* Count LLC_MISS, both user and os; enable counting */
        /* llc_miss_sel = rdMSR(IA32_PERFEVTSEL1); */
        llc_miss_sel = (PMC_USR_EN | PMC_OS_EN | PMC_EN) |
          (C_LLC_MISS | U_LLC_MISS | E_LLC_MISS);
        wrMSR(IA32_PERFEVTSEL1, llc_miss_sel);

        /* Enable fixed llc_miss counter in IA32_PERF_GLOBAL_CTRL */
        llc_miss_enable = rdMSR(IA32_PERF_GLOBAL_CTRL);
        llc_miss_enable |= PMC1_EN;
        wrMSR(IA32_PERF_GLOBAL_CTRL, llc_miss_enable);
#elif IsAmd_64
        /* Chen 2024.9.13: no LLC miss setup for AMD yet; log and skip */
        printk(KERN_INFO "LLC miss counting not implemented for AMD processors.\n");
#else
        /* Not implemented for other architectures, e.g. RPi */
        #error Define ku_setup_llc_miss for your architecture
#endif
}

... ... ... ...

inline u64 ku_get_llc_miss(void)
{
#if IsIntel_64
        u32 a = 0, d = 0;
        int ecx = IA32_PMC1;            /* What counter it selects, Intel */
        __asm __volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(ecx));
        return ((u64)a) | (((u64)d) << 32);
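#elif IsAmd_64
        /* Chen 2024.09.13: no AMD LLC miss counter is read; report 0.
           printk_once keeps repeated calls from flooding dmesg. */
        printk_once(KERN_INFO "LLC miss counting not supported on AMD processors.\n");
        return 0;
#else
        /* Not implemented for other architectures, e.g. RPi */
        #error Define llc_miss for your architecture
        return 0;
#endif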
}
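
One thing to keep in mind about change 1: as far as I know, the `__znver*` macros are defined by the compiler from the `-march` setting, so they describe how the module was built and not the CPU it runs on. A runtime check of the CPUID vendor string is one possible alternative. The following is only a user-space sketch of that idea, not KUtrace code; `classify_vendor` and `detect_vendor` are hypothetical names. Inside the kernel module, the kernel's own CPU information (for example `boot_cpu_data.x86_vendor`) would presumably be the more natural source.

```c
#include <assert.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper for the CPUID instruction */
#endif

enum cpu_vendor { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD };

/* Hypothetical helper: map a CPUID vendor string to an enum value. */
enum cpu_vendor classify_vendor(const char *vendor)
{
    assert(vendor != NULL);
    if (strcmp(vendor, "GenuineIntel") == 0)
        return VENDOR_INTEL;
    if (strcmp(vendor, "AuthenticAMD") == 0)
        return VENDOR_AMD;
    return VENDOR_UNKNOWN;
}

/* Hypothetical helper: read the vendor string from CPUID leaf 0.
   Returns VENDOR_UNKNOWN when not built for x86. */
enum cpu_vendor detect_vendor(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return VENDOR_UNKNOWN;

    /* Leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX order. */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return classify_vendor(vendor);
#else
    return VENDOR_UNKNOWN;
#endif
}
```

Calling `detect_vendor()` on the Threadripper system described here should yield `VENDOR_AMD`, regardless of the `-march` flag used at build time.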

Maybe these changes could benefit others using similar systems.
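
A caveat for anyone adopting the AMD branch of ku_get_llc_miss: if that function is called for every trace entry (an assumption about its call site), any message on the unsupported path is best emitted at most once, otherwise it could flood the log and add overhead. The once-only pattern can be sketched in user-space C as follows; `get_llc_miss_stub` and `notice_count` are hypothetical names used only for illustration, and in kernel code `printk_once` plays the same role.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Counts how often the notice was actually printed (illustration only). */
int notice_count;

/* Hypothetical stand-in for a counter read on an unsupported CPU:
   always returns 0, but prints its notice only on the first call. */
uint64_t get_llc_miss_stub(void)
{
    static bool noticed;

    if (!noticed) {
        noticed = true;
        notice_count++;
        fprintf(stderr, "LLC miss counting not supported on this CPU.\n");
    }
    return 0;
}
```

However many times `get_llc_miss_stub()` is called, the notice appears a single time. This sketch is not thread-safe; it only illustrates the guard.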

Environment:

Processor: AMD Ryzen Threadripper PRO 5955WX 16-Cores
Operating System: linux-6.6.36