Decompiler control flow doesn't consider memory barriers

Describe the bug Ghidra's decompiler rearranges loads and stores around memory barriers without respect to their ordering in the input assembly.

To Reproduce Consider the following C program mbtest.c:

typedef unsigned int uint32_t;

extern uint32_t version;
extern uint32_t value;

uint32_t test() {
     while (1) {
         uint32_t current_version = version;
         uint32_t current_value = value;

         asm volatile("dmb ishld" ::: "memory");

         if (version == current_version) {
             return current_value;
         }
     }
}

When compiled with aarch64-linux-gnu-gcc -Os mbtest.c -c -o mbtest.o, we get the following assembly: (precompiled zipped object file if needed)

    adrp    x1, 0 <version>
    adrp    x2, 0 <value>
    add     x1, x1, #0x0
    add     x2, x2, #0x0
    ldr     w3, [x1]
    ldr     w0, [x2]
    dmb     ishld
    ldr     w4, [x1]
    cmp     w4, w3
    b.ne    10 <test+0x10>
    ret

Which correctly encodes the loop and memory barrier. However, ghidra's decompile produces this:

undefined4 test(void)
{
  DataMemoryBarrier(2,1);
  return value;
}

Which is very incorrect, as it elides the accesses to the counter and value, and rearranges the access to the global value after the memory barrier.

Expected behavior If the decompiler did not rearrange these accesses across the barrier, I would expect to see the output look more like this:

undefined4 test(void)
{
  uint uVar1;
  uint uVar2;

  do {
    uVar1 = version;
    uVar2 = value;
    DataMemoryBarrier(2,1);
    if (uVar1 == version) {
      return uVar2;
    }
  } while (1);
}

Environment (please complete the following information):

OS: Fedora 38, Linux 6.2.13
Java Version: 17.0.6
Ghidra Version: 10.2.2
Ghidra Origin: GitHub release

The issue is not the memory barrier. The decompiler just doesn't know the "version" and the "value" variables are considered volatile. You can set this on a variable from the Ghidra listing view via the right-click popup menu and selecting Data -> Settings ... and then changing the mutability setting to "volatile"

This should give the desired effect of seeing the loop in decompiler output.

The "asm" directive itself I think is preventing the compiler from optimizing the loop away in the source code example.

This is a partial solution, and I can see the loop represented this way, so I will probably do this. But I don't think it should be necessary.

In C you can, of course, mark a variable as volatile to ensure the compiler does not reorder its load, but that only affects the compiler's output. Take this example modified from the original comment, which adds the volatile keywords to the variables:

typedef unsigned int uint32_t;

volatile extern uint32_t version; // now volatile
volatile extern uint32_t value;   // now volatile

uint32_t test() {
     while (1) {
         uint32_t current_version = version;
         uint32_t current_value = value;

         asm volatile("dmb ishld" ::: "memory");

         if (version == current_version) {
             return current_value;
         }
     }
}

This code produces identical assembly to the version without marking those variables as volatile. From compiler explorer, ARM64 GCC 13.2.0:

test:
        adrp    x1, version
        adrp    x2, value
        add     x1, x1, :lo12:version
        add     x2, x2, :lo12:value
.L2:
        ldr     w3, [x1]
        ldr     w0, [x2]
        dmb ishld
        ldr     w4, [x1]
        cmp     w4, w3
        bne     .L2
        ret

However, a barrier serves an entirely different purpose than volatility. Volatility ensures that the compiler treats all pointer operations as having side effects, and so prevents the compiler from reordering the access. Barriers are used in the ARM64 memory model to disallow reordering by the processor across the barrier.

This is a different concept from volatility, because reordering the loads of version and value is fine, as long as both are loaded before the barrier. There would be no need for them to be declared volatile in a C program.

Without the barrier in the example above, the processor is free to reorder the load of value after both loads of version, and merge the loads of version together, which breaks the intended mechanics of the lock-free data structure.

This seems to be what Ghidra's P-Code optimizer does as well, as it ignores the barrier.

NationalSecurityAgency / ghidra

Decompiler control flow doesn't consider memory barriers #5301