FEX-Emu / FEX

A fast usermode x86 and x86-64 emulator for Arm64 Linux
https://fex-emu.com
MIT License

Can we optimize non-locking RMW atomic operations? #1729

Open Sonicadvance1 opened 2 years ago

Sonicadvance1 commented 2 years ago

Currently we convert all lock RMW ops to acquire-release semantics.

A couple of weird things to investigate here:

  1. Basic ALU ops without LOCK
    • Non-LOCK ops get turned into load + ALU + store.
    • These could potentially be converted into an atomic memory operation without acquire-release semantics.
    • This should only be generated on ARMv8.1+ hardware that supports atomic memory ops.
    • Might need hardware TSO support?
  2. RMW ops that don't imply LOCK but really should, used without LOCK
    • CMPXCHG, CMPXCHG8B, CMPXCHG16B, XADD
    • These instructions don't imply a LOCK prefix, but they are almost universally used with one.
    • The Linux kernel has some optimization where it backpatches lock cmpxchg into nop cmpxchg on uniprocessors? Citation needed.
    • These might be able to be converted to operations with... release semantics?
    • Needs investigation.
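
To make the options above concrete, here is a rough C++ sketch of the orderings being discussed. The helper names are hypothetical and are not FEX APIs. The release-only CMPXCHG variant is purely speculative, and a relaxed RMW gives no TSO ordering on its own, so this only illustrates the instruction choice, not a proven-correct lowering. With ARMv8.1 LSE enabled, compilers typically lower the relaxed `fetch_add` to LDADD and the acq_rel one to LDADDAL.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these are FEX functions.

// Roughly what a non-LOCK `add [mem], reg` becomes today: three separate steps.
// The sequence as a whole is not atomic; another thread can write in between.
inline void AddLoadAluStore(std::atomic<uint32_t>& mem, uint32_t value) {
  uint32_t tmp = mem.load(std::memory_order_relaxed);
  tmp += value;
  mem.store(tmp, std::memory_order_relaxed);
}

// Candidate replacement: one atomic RMW with no acquire/release ordering.
inline uint32_t AddRelaxedAtomic(std::atomic<uint32_t>& mem, uint32_t value) {
  return mem.fetch_add(value, std::memory_order_relaxed);
}

// What a `lock add` is converted to: acquire-release RMW.
inline uint32_t AddLocked(std::atomic<uint32_t>& mem, uint32_t value) {
  return mem.fetch_add(value, std::memory_order_acq_rel);
}

// Speculative: a non-LOCK CMPXCHG modelled as release on success and relaxed
// on failure. On failure `expected` is updated with the observed value,
// mirroring how x86 CMPXCHG loads the memory operand into the accumulator.
inline bool CmpXchgRelease(std::atomic<uint32_t>& mem, uint32_t& expected,
                           uint32_t desired) {
  return mem.compare_exchange_strong(expected, desired,
                                     std::memory_order_release,
                                     std::memory_order_relaxed);
}
```

Whether the relaxed or release forms are actually safe for x86 guest code is exactly the open question in this issue.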

dnadlinger commented 2 years ago

Citation needed.

LOCK_PREFIX is defined here

https://github.com/torvalds/linux/blob/8291eaafed36f575f23951f3ce18407f480e9ecf/arch/x86/include/asm/alternative.h#L16-L50

and the patching mechanism is

https://github.com/torvalds/linux/blob/8291eaafed36f575f23951f3ce18407f480e9ecf/arch/x86/kernel/alternative.c#L872-L886
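
As far as I understand the mechanism, LOCK_PREFIX records the address of each lock prefix byte in a `.smp_locks` section, and on a uniprocessor the kernel overwrites the 0xF0 LOCK byte with a 0x3E DS segment override (a harmless prefix), reversing it if SMP is needed again. Below is a toy C++ model of that byte patching on a plain buffer. The function names and the offset list are made up for illustration, and the real kernel goes through its own text-patching machinery rather than writing bytes directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kLockPrefix = 0xF0;  // x86 LOCK prefix
constexpr uint8_t kDsPrefix = 0x3E;    // DS segment override, used as filler

// Hypothetical model: `lock_offsets` plays the role of the recorded
// lock-prefix locations. Returns the number of bytes patched.
inline std::size_t UnlockForUniprocessor(
    std::vector<uint8_t>& code, const std::vector<std::size_t>& lock_offsets) {
  std::size_t patched = 0;
  for (std::size_t off : lock_offsets) {
    assert(off < code.size());
    if (code[off] == kLockPrefix) {  // only touch bytes that are still LOCK
      code[off] = kDsPrefix;
      ++patched;
    }
  }
  return patched;
}

// Reverse direction: restore the LOCK prefix when going back to SMP.
inline std::size_t LockForSmp(std::vector<uint8_t>& code,
                              const std::vector<std::size_t>& lock_offsets) {
  std::size_t patched = 0;
  for (std::size_t off : lock_offsets) {
    assert(off < code.size());
    if (code[off] == kDsPrefix) {
      code[off] = kLockPrefix;
      ++patched;
    }
  }
  return patched;
}
```

For example, running `UnlockForUniprocessor` over the bytes `F0 0F B1 0F` (`lock cmpxchg [rdi], ecx`) with offset 0 leaves `3E 0F B1 0F`, which is the same CMPXCHG without the LOCK.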