Open liamwhite opened 1 year ago
The issue is not the memory barrier. The decompiler just doesn't know that the `version` and `value` variables should be treated as volatile. You can set this on a variable from the Ghidra listing view via the right-click popup menu, selecting Data -> Settings ..., and then changing the mutability setting to "volatile".
This should give the desired effect of seeing the loop in the decompiler output.
I think the `asm` directive itself is what prevents the compiler from optimizing the loop away in the source code example.
This is a partial solution, and I can see the loop represented this way, so I will probably do this. But I don't think it should be necessary.
In C you can, of course, mark a variable as volatile to ensure the compiler does not reorder its loads, but that only affects the compiler's output, not the processor's behavior. Take this example, modified from the original comment by adding the volatile keyword to the variables:
```c
typedef unsigned int uint32_t;

volatile extern uint32_t version; // now volatile
volatile extern uint32_t value;   // now volatile

uint32_t test() {
    while (1) {
        uint32_t current_version = version;
        uint32_t current_value = value;
        asm volatile("dmb ishld" ::: "memory");
        if (version == current_version) {
            return current_value;
        }
    }
}
```
This code produces assembly identical to the version without the volatile qualifiers. From Compiler Explorer, ARM64 GCC 13.2.0:
```asm
test:
        adrp    x1, version
        adrp    x2, value
        add     x1, x1, :lo12:version
        add     x2, x2, :lo12:value
.L2:
        ldr     w3, [x1]
        ldr     w0, [x2]
        dmb     ishld
        ldr     w4, [x1]
        cmp     w4, w3
        bne     .L2
        ret
```
However, a barrier serves an entirely different purpose than volatility. Volatility ensures that the compiler treats every access to the variable as having side effects, and so prevents the compiler from reordering or eliding those accesses. Barriers are used in the ARM64 memory model to disallow reordering by the *processor* across the barrier. This is a different concept from volatility, because reordering the loads of `version` and `value` is fine, as long as both are loaded before the barrier. There would be no need for them to be declared `volatile` in a C program.
Without the barrier in the example above, the processor is free to reorder the load of `value` to after both loads of `version`, and to merge the two loads of `version` together, which breaks the intended mechanics of the lock-free data structure. This seems to be what Ghidra's P-code optimizer does as well, as it ignores the barrier.
Describe the bug
Ghidra's decompiler rearranges loads and stores around memory barriers without respect to their ordering in the input assembly.
To Reproduce
Consider the following C program, mbtest.c. When compiled with `aarch64-linux-gnu-gcc -Os mbtest.c -c -o mbtest.o`, we get the following assembly (a precompiled zipped object file is attached if needed), which correctly encodes the loop and memory barrier. However, Ghidra's decompilation is very incorrect: it elides the accesses to the counter and value, and rearranges the access to the global value to after the memory barrier.
Expected behavior
If the decompiler did not rearrange these accesses across the barrier, I would expect the output to look more like this:
Environment (please complete the following information):