Closed ysbaddaden closed 5 months ago
This time it should really be the last time we deal with this topic :sweat_smile:
The initial error was that we used a lazy set to clear atomics when we should have used explicit memory orders. Only on x86, in very specific scenarios, can we get away with an lazy set.
Reading the LLVM atomics documentation, the notes for code generation state that weak architectures should generate memory barriers, which implies that all memory orders, except for relaxed, should always generate the necessary memory barriers when asked to.
Compiling and disassembling programs for ARMv6 and ARMv7 (you must specify a CPU or CPU feature to target these, otherwise LLVM defaults to ARMv4 or ARMv5), we can indeed notice that memory barriers are properly emitted by the LLVM code generation. For example the assembly contains
dmb ish
for ARMv7 andmcr 15, 0, r1, cr7, cr10, {5}
for ARMv6.Compiling for older ARMv4 or ARMv5 the atomics call the
__sync_*
symbols fromlibgcc
where memory ordering is a noop (not supported by the micro architecture).This patch thus removes the explicit memory barriers. They duplicate what LLVM backends are already doing (optimized away in some cases, but there may be edge cases where it's not), and this noticeably simplifies our code.