andreas-abel / nanoBench

A tool for running small microbenchmarks on recent Intel and AMD x86 CPUs.
http://www.uops.info
GNU Affero General Public License v3.0

fix kernel mode crashing in Ubuntu 22.04 #28

Closed bjzhjing closed 1 year ago

bjzhjing commented 1 year ago

When building the nanoBench kernel module on Ubuntu 22.04, gcc is invoked with -mfunction-return=thunk-extern by default. According to chapter 6.1.1 JMP2RET in the following reference: https://www.amd.com/system/files/documents/technical-guidance-for-mitigating-branch-type-confusion.pdf all 'ret' instructions are consolidated into a single piece of code. Instead of ending with a 'ret' instruction, functions end with 'jmp __x86_return_thunk'.

Because each function now ends with a 'jmp' instruction instead of a 'ret', functions like create_runtime_code() copy much more assembler code into the runtime_code memory at runtime than they should. This eventually causes a memory protection fault.
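The failure mode described above can be illustrated with a toy Python model. This is not nanoBench's actual code: it assumes a copier that takes bytes from a function up to its first 'ret' opcode (0xC3). The helper `copy_until_ret` and all byte strings are hypothetical, and a real implementation would have to respect instruction boundaries rather than scan raw bytes.

```python
RET = 0xC3        # x86 near 'ret' opcode
JMP_REL32 = 0xE9  # x86 'jmp rel32' opcode, the form a jump to a return thunk would take
NOP = 0x90


def copy_until_ret(memory: bytes, start: int) -> bytes:
    """Copy bytes from memory[start:] up to and including the first RET byte.

    Raises ValueError if no RET byte is found.
    """
    end = memory.index(RET, start)
    return memory[start:end + 1]


# Hypothetical function body: three NOPs followed by the return sequence.
body = bytes([NOP, NOP, NOP])
func_with_ret = body + bytes([RET])                                # -mfunction-return=keep
func_with_thunk = body + bytes([JMP_REL32, 0x10, 0x00, 0x00, 0x00])  # thunk-extern

# Unrelated code that happens to follow the function in memory.
next_func = bytes([NOP] * 8 + [RET])

copied_keep = copy_until_ret(func_with_ret + next_func, 0)
copied_thunk = copy_until_ret(func_with_thunk + next_func, 0)

# With 'ret' the copy stops at the end of the function (4 bytes);
# with the thunk jump it runs on into the next function (17 bytes).
print(len(copied_keep), len(copied_thunk))
```

In this model, the copy only stays within the function when the function really ends in 'ret', which is what -mfunction-return=keep is meant to guarantee.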

To address this, the option -mfunction-return=keep is passed for the kernel-mode build, overriding the default that gcc is given on Ubuntu 22.04. This ensures that functions are generated with a 'ret' instruction.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>

bjzhjing commented 1 year ago

Greetings @andreas-abel, could you please help review this?

andreas-abel commented 1 year ago

Hi @bjzhjing, which version of gcc are you using? With gcc 11.3.0 (the default on Ubuntu 22.04), the problem does not seem to occur.

bjzhjing commented 1 year ago

@andreas-abel From my experiments, it seems to be related to the kernel .config file. If 'CONFIG_CC_HAS_RETURN_THUNK=y' is set in .config, gcc is passed -mfunction-return=thunk-extern by default when compiling the kernel module. To check, look at the file nanoBench/common/.nanoBench.o.cmd after make kernel; it should include -mfunction-return=thunk-extern. The option is not decided by gcc itself.
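The check described above can be scripted. The following is a minimal sketch: the helper name `function_return_mode` and the sample command line are made up for illustration and are not real build output. It also assumes that when the option appears more than once, the last occurrence takes effect, as is usual for gcc options.

```python
import re


def function_return_mode(cmd_text: str):
    """Return the value of the last -mfunction-return= option in cmd_text, or None."""
    modes = re.findall(r"-mfunction-return=([\w-]+)", cmd_text)
    return modes[-1] if modes else None


# Made-up, shortened command line for illustration only.
sample = "gcc -O2 -mfunction-return=thunk-extern -c nanoBench.c"
print(function_return_mode(sample))                            # thunk-extern
print(function_return_mode(sample + " -mfunction-return=keep"))  # keep

# To inspect a real build, read the file named in the comment above, e.g.:
#   with open("nanoBench/common/.nanoBench.o.cmd") as f:
#       print(function_return_mode(f.read()))
```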

bjzhjing commented 1 year ago

On Ubuntu 22.04 with 'CONFIG_CC_HAS_RETURN_THUNK=y', run a command like sudo ./kernel-nanoBench.sh -f and then run dmesg; you will see the following messages:

[157342.424547] BUG: unable to handle page fault for address: ff464367d50d6374
[157342.424604] #PF: supervisor instruction fetch in kernel mode
[157342.424649] #PF: error_code(0x0011) - permissions violation

After that, running sudo rmmod nb hangs the console.
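For reference, the error_code(0x0011) value in the dmesg output above can be decoded by hand. The sketch below uses the low bits of the x86 page-fault error code (bit 0: protection violation, bit 1: write, bit 2: user mode, bit 3: reserved bit set, bit 4: instruction fetch). The function name `decode_pf_error_code` is hypothetical, and higher bits are ignored.

```python
# (description if bit is set, description if bit is clear or None to stay silent)
PF_FLAGS = {
    0: ("protection violation", "page not present"),
    1: ("write access", None),
    2: ("user mode", "supervisor mode"),
    3: ("reserved bit set", None),
    4: ("instruction fetch", None),
}


def decode_pf_error_code(code: int) -> list:
    """Translate the low five bits of an x86 page-fault error code into text."""
    result = []
    for bit, (if_set, if_clear) in PF_FLAGS.items():
        text = if_set if code & (1 << bit) else if_clear
        if text is not None:
            result.append(text)
    return result


print(decode_pf_error_code(0x0011))
# ['protection violation', 'supervisor mode', 'instruction fetch']
```

This matches the two #PF lines in the log: a supervisor-mode instruction fetch that failed on permissions, which is consistent with the kernel trying to execute from a page it is not allowed to execute.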