Can we optimize non-locking RMW atomic operations?

FEX-Emu / FEX

A fast usermode x86 and x86-64 emulator for Arm64 Linux

https://fex-emu.com

MIT License


Can we optimize non-locking RMW atomic operations? #1729

Open Sonicadvance1 opened 2 years ago

Sonicadvance1 commented 2 years ago

Currently we convert all LOCK-prefixed RMW ops to atomic operations with acquire-release semantics.
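A minimal C++ sketch of that mapping, for illustration only. It assumes guest memory is modeled as `std::atomic<uint32_t>`, and the helper names (`EmulateLockAdd`, `EmulateLockCmpxchg`) are hypothetical, not FEX's JIT code. Each LOCK-prefixed RMW becomes one atomic operation with `std::memory_order_acq_rel`. On ARMv8.1+ with LSE atomics, these typically compile to `LDADDAL` and `CASAL`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers: guest memory modeled as std::atomic<uint32_t>.
// An illustration of the semantics, not FEX's actual code generation.

// `lock add [mem], src`: returns the old value so a caller could
// compute the x86 flags from it.
inline uint32_t EmulateLockAdd(std::atomic<uint32_t>& mem, uint32_t src) {
  // One atomic RMW with acquire-release ordering.
  return mem.fetch_add(src, std::memory_order_acq_rel);
}

// `lock cmpxchg [mem], src` with EAX holding the expected value.
// Returns true (ZF=1) on success. On failure, eax receives the current
// memory value, matching x86 CMPXCHG behaviour.
inline bool EmulateLockCmpxchg(std::atomic<uint32_t>& mem, uint32_t& eax,
                               uint32_t src) {
  return mem.compare_exchange_strong(eax, src, std::memory_order_acq_rel,
                                     std::memory_order_acquire);
}
```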

A couple of odd cases to investigate here:

1. Basic ALU ops without LOCK
   - Non-LOCK ops get turned into load + ALU + store.
   - These could potentially be converted into an atomic memory operation without acquire-release semantics.
   - This should only be generated on ARMv8.1+, when the CPU supports atomic memory operations.
   - Might need hardware TSO support?
2. RMW ops that don't imply LOCK but really should, used without LOCK
   - CMPXCHG, CMPXCHG8B, CMPXCHG16B, XADD
   - These instructions don't imply a LOCK prefix, but they are almost universally used with one.
   - The Linux kernel has some optimization where it backpatches lock cmpxchg into a plain (non-locked) cmpxchg on uniprocessors? Citation needed.
   - These might be convertible to operations with... release? semantics?
   - Needs investigation.
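A hedged C++ sketch of the options in the list, again modeling guest memory as `std::atomic<uint32_t>` with hypothetical helper names. `EmulateAddLoadStore` is the current load + ALU + store lowering of a non-LOCK `add [mem], src`; the acquire-load/release-store pair is an assumption standing in for TSO-style ordering of plain accesses. `EmulateAddRelaxedAmo` is the proposed single atomic memory operation with no ordering, which typically compiles to `LDADD` on ARMv8.1+ with LSE. `EmulateCmpxchgRelease` is a purely speculative take on item 2. Whether the relaxed forms are safe without hardware TSO is the open question.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers modeling guest memory as std::atomic<uint32_t>.

// Current lowering of non-LOCK `add [mem], src`: load + ALU + store.
// Another thread may write `mem` between the load and the store, which a
// non-LOCK x86 add allows. The acquire/release orders are an assumption
// standing in for TSO-style emulation of plain loads and stores.
inline uint32_t EmulateAddLoadStore(std::atomic<uint32_t>& mem, uint32_t src) {
  uint32_t old = mem.load(std::memory_order_acquire);
  mem.store(old + src, std::memory_order_release);
  return old;
}

// Candidate optimization: one atomic memory operation with no ordering.
// Atomicity is stronger than a non-LOCK x86 add requires, but the ordering
// is weaker than TSO, hence the question about hardware TSO support.
inline uint32_t EmulateAddRelaxedAmo(std::atomic<uint32_t>& mem, uint32_t src) {
  return mem.fetch_add(src, std::memory_order_relaxed);
}

// Item 2, speculative: a non-LOCK `cmpxchg` with release ordering on
// success instead of acquire-release. Unverified; needs investigation.
inline bool EmulateCmpxchgRelease(std::atomic<uint32_t>& mem, uint32_t& eax,
                                  uint32_t src) {
  return mem.compare_exchange_strong(eax, src, std::memory_order_release,
                                     std::memory_order_relaxed);
}
```

Single-threaded, all three variants produce the same values; they differ only in what other threads can observe.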

dnadlinger commented 2 years ago

> Citation needed.

LOCK_PREFIX is defined here

https://github.com/torvalds/linux/blob/8291eaafed36f575f23951f3ce18407f480e9ecf/arch/x86/include/asm/alternative.h#L16-L50

and the patching mechanism is here:

https://github.com/torvalds/linux/blob/8291eaafed36f575f23951f3ce18407f480e9ecf/arch/x86/kernel/alternative.c#L872-L886
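Roughly, as I read the linked code: on SMP builds, LOCK_PREFIX emits the lock prefix and records its location in the .smp_locks section. When running on a single CPU, alternatives_smp_unlock() walks those records and overwrites each lock byte (0xF0) with a DS segment-override prefix (0x3E), leaving a same-length instruction that is no longer locked. The C++ sketch below replays that on a byte buffer. It simplifies the kernel's relative s32 offsets and text_poke() into absolute indices and direct writes, and the name `SmpUnlock` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the kernel's uniprocessor "unlock" patching.
// `text` stands in for kernel code; `lock_sites` stands in for the
// .smp_locks records (the kernel stores relative s32 offsets instead).
constexpr uint8_t kLockPrefix = 0xF0;
constexpr uint8_t kDsPrefix = 0x3E;  // DS override: same length, no lock

void SmpUnlock(std::vector<uint8_t>& text,
               const std::vector<std::size_t>& lock_sites) {
  for (std::size_t off : lock_sites) {
    // Skip records that point outside the text.
    if (off >= text.size()) continue;
    // Only rewrite bytes that really are LOCK prefixes.
    if (text[off] == kLockPrefix) text[off] = kDsPrefix;
  }
}
```

For example, `lock cmpxchg [rdi], esi` encodes as F0 0F B1 37; after patching it becomes 3E 0F B1 37, so instruction boundaries stay unchanged.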