andreas-abel / nanoBench

A tool for running small microbenchmarks on recent Intel and AMD x86 CPUs.
http://www.uops.info
GNU Affero General Public License v3.0

CacheAnalyzer process killed, kernel module issues #27

Closed abe-f closed 1 year ago

abe-f commented 1 year ago

Hi, I'm trying to use the CacheAnalyzer tool. However, the process gets killed because of errors in the kernel module, and the PC usually becomes unresponsive over time and needs a restart. Here is a segment of the dmesg output after running 'sudo ./cacheSeq.py -level 2 -sets 10-14,20,35 -seq "A B C D A? C! B?"':

[  122.924677] nb: module verification failed: signature and/or required key missing - tainting kernel
[  122.925359] Initializing nanoBench kernel module...
[  123.037080] Vendor ID: GenuineIntel
[  123.037089] Brand: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
[  123.037092] DisplayFamily_DisplayModel: 06_9EH
[  123.037095] Stepping ID: 9
[  123.037097] Performance monitoring version: 4
[  123.037099] Number of fixed-function performance counters: 3
[  123.037101] Number of general-purpose performance counters: 4
[  123.037102] Bit widths of fixed-function performance counters: 48
[  123.037104] Bit widths of general-purpose performance counters: 48
[  133.965640] No physically contiguous memory area of the requested size found.
[  133.965644] Try rebooting your computer.
[  246.783643] msr_str: 0xe01
[  246.783646] msr_str: 0x700
[  246.783648] msr_str: 0xe01
[  246.783649] msr_str: 0x710
[  246.783650] msr_str: 0xe01
[  246.783651] msr_str: 0x720
[  246.783652] msr_str: 0xe01
[  246.783653] msr_str: 0x730
[  246.941670] BUG: unable to handle page fault for address: ffffafa028927e71
[  246.941674] #PF: supervisor instruction fetch in kernel mode
[  246.941676] #PF: error_code(0x0010) - not-present page
[  246.941677] PGD 100000067 P4D 100000067 PUD 0 
[  246.941680] Oops: 0010 [#1] SMP PTI
[  246.941682] CPU: 4 PID: 2321 Comm: python3 Tainted: G           OE     5.15.0-53-generic #59~20.04.1-Ubuntu
[  246.941685] Hardware name: Dell Inc. OptiPlex 7050/0NW6H5, BIOS 1.8.3 03/23/2018
[  246.941686] RIP: 0010:0xffffafa028927e71
[  246.941689] Code: Unable to access opcode bytes at RIP 0xffffafa028927e47.
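
To make excerpts like the one above easier to triage, here is a minimal Python sketch that scans dmesg text for the two messages that matter in this report: the allocation failure and the page fault. The function name `scan_dmesg` and the event labels are hypothetical and not part of nanoBench; the message strings are copied from the log above, and the sketch assumes the default `[ seconds.micros]` timestamp prefix.

```python
import re

# Message fragments copied verbatim from the dmesg excerpt in this issue.
ALLOC_FAILURE = "No physically contiguous memory area of the requested size found."
PAGE_FAULT = "BUG: unable to handle page fault"

# Assumes the default dmesg timestamp format, e.g. "[  133.965640] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def scan_dmesg(text):
    """Return a list of (timestamp, kind) tuples for the known messages.

    `kind` is "alloc_failure" or "page_fault" (hypothetical labels).
    Lines without a timestamp prefix are skipped.
    """
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp = float(match.group(1))
        message = match.group(2)
        if ALLOC_FAILURE in message:
            events.append((timestamp, "alloc_failure"))
        elif message.startswith(PAGE_FAULT):
            events.append((timestamp, "page_fault"))
    return events
```

One way to use it would be to pass in the captured output of `sudo dmesg` and check whether an allocation failure appears before running cacheSeq.py.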

I've attempted this with an Intel i7-9750H, i9-12900K, and now an i7-7700. On the i7-7700, I'm testing on a fresh install of Ubuntu 20 with kernel version 5.15. The set-R14-size.sh script almost always fails (even after a reboot) when I use 'sudo ./set-R14-size.sh 1G'. However, if I request more memory, the allocation sometimes succeeds. Before the dmesg output above, I tried 1G, then around 1200M. This seems a bit strange; could it be the issue? Or is there anything else that I'm obviously missing? Here is an example command sequence that I'm using after boot:

cd nanoBench
make kernel
sudo insmod kernel/nb.ko
sudo ./set-R14-size.sh 1200M
cd tools/CacheAnalyzer
sudo ./cacheSeq.py -level 2 -sets 10-14,20,35 -seq "A B C D A? C! B?"

andreas-abel commented 1 year ago

I am aware of a problem with 5.15 kernels >= 5.15.0-46; I don't know yet what the reason for this problem is. If you are using such a kernel, I would suggest either using a 5.17 kernel, or a 5.15 kernel that is older than 5.15.0-46.
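
As a rough illustration of the version range described above, the following Python sketch checks whether a kernel release string falls into it. The helper name `is_affected_kernel` is hypothetical and not part of nanoBench. It assumes Ubuntu-style release strings of the form `5.15.0-<N>-generic` (as in the `5.15.0-53-generic` shown in the dmesg output) and treats `<N> >= 46` as affected; other distributions number their kernels differently and would need a different check.

```python
import re


def is_affected_kernel(release):
    """Return True for Ubuntu-style 5.15.0-N release strings with N >= 46.

    Assumption: the release string looks like "5.15.0-53-generic".
    Anything that does not match this pattern is reported as not affected,
    which may be wrong for non-Ubuntu 5.15 kernels.
    """
    match = re.match(r"^5\.15\.0-(\d+)", release)
    return bool(match) and int(match.group(1)) >= 46
```

On a running system, the release string could be obtained with `platform.release()` from the Python standard library (the same value that `uname -r` prints) and passed to this helper before loading the kernel module.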

abe-f commented 1 year ago

Ah, running on an older kernel now and it's working. Thank you! Excited to use the tool.