crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.39k stars 1.62k forks source link

Remove unnecessary explicit memory barriers on ARM #14567

Closed ysbaddaden closed 5 months ago

ysbaddaden commented 5 months ago

Reading the LLVM atomics documentation, the notes for code generation state that weak architectures should generate memory barriers, which implies that all memory orders, except for relaxed, should always generate the necessary memory barriers when asked to.

Compiling and disassembling programs for ARMv6 and ARMv7 (you must specify a CPU or CPU feature to target these, otherwise LLVM defaults to ARMv4 or ARMv5), we can indeed notice that memory barriers are properly emitted by the LLVM code generation. For example the assembly contains dmb ish for ARMv7 and mcr 15, 0, r1, cr7, cr10, {5} for ARMv6.

Compiling for older ARMv4 or ARMv5 the atomics call the __sync_* symbols from libgcc where memory ordering is a noop (not supported by the micro architecture).

This patch thus removes the explicit memory barriers. They duplicate what LLVM backends are already doing (optimized away in some cases, but there may be edge cases where it's not), and this noticeably simplifies our code.

ysbaddaden commented 5 months ago

This time it should really be the last time we deal with this topic :sweat_smile:

The initial error was that we used a lazy set to clear atomics when we should have used explicit memory orders. Only on x86, in very specific scenarios, can we get away with an lazy set.