Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License

Decompiler doesn't split loads that read from multiple struct fields #3731

Open comex opened 1 year ago

comex commented 1 year ago

Version and Platform (required):

Bug Description: Related to #2805 from last year. The example binary from there still decompiles correctly, but the following similar example does not.

This AArch64 code (example binary here) performs a 64-bit load, then adds the top 32 bits to the low 32 bits:

ldr x1, [x0]
mov w0, w1
add x0, x0, x1, lsr #32
ret

Binary Ninja's initial decompilation is fine:

00000000      int64_t x1 = *arg1
0000000c      return zx.q(x1.d) + (x1 u>> 0x20)

However, I happen to know that arg1 actually points to two consecutive 32-bit integers; the 64-bit load is produced by the compiler merging two 32-bit loads. (The example is oversimplified, but the basic instruction sequence comes from a binary I built myself using Clang.) So I add a type:

struct two_u32s { uint32_t a, b; };

...and set arg1's type to struct two_u32s *. I expect to get something like

return arg1->a + arg1->b

Instead I just get

00000000      int64_t x1
00000000      x1.d = arg1->a
00000000      x1:4.d = arg1->b
0000000c      return zx.q(x1.d) + (x1 u>> 0x20)
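
For reference, the kind of source that's behind this (a simplified reconstruction; sum_fields is a made-up name and the real code is more involved):

#include <stdint.h>

struct two_u32s { uint32_t a, b; };

/* With the loads merged, as in the binary above, this becomes a single
   64-bit ldr followed by an add of the high half (b) to the low half (a). */
uint64_t sum_fields(const struct two_u32s *arg1)
{
    return (uint64_t)arg1->a + arg1->b;
}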

rssor commented 1 year ago

So the load/store splitting side of things is actually working as intended; where it's going wrong is that right now we can't support multiple disjoint variables living in the same register. The load is being split appropriately, but on the left-hand side we can't do better than treating each write as a partial write into the same var.

There are a few ways we might be able to handle patterns like this, probably just in terms of inlining, so we don't have to tackle that fundamental variable issue.

One partial approach might also be to more completely support non-primitive types in registers, which is an area where we have room for improvement in general; there is a range of general correctness problems to tackle on that front as well.

The deeper improvements necessary here in the long term are mostly around byte ordering: because Variables can live in registers or stack locations, determining which bytes a partial access refers to depends on which kind of var is involved (for a register, offset 0 is conceptually the LSB; for a stack var, the offset is from the start address, so the target endianness matters). As of now, this is not robustly or consistently handled.
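
To make that concrete with the struct from this issue, say a = 1 and b = 2 at the load address; which field a "bytes 0..3 of the variable" partial access names then depends on the storage class once the target is big-endian (a worked sketch in plain C, not Binary Ninja code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The 64-bit load puts a in the low half on a little-endian target
       and in the high half on a big-endian one. */
    uint64_t le = 0x0000000200000001;   /* little-endian register value */
    uint64_t be = 0x0000000100000002;   /* big-endian register value    */

    /* Register convention: "bytes 0..3" of the variable = the LSB half. */
    printf("LE register, bytes 0..3 -> %u (field a)\n", (uint32_t)le);
    printf("BE register, bytes 0..3 -> %u (field b)\n", (uint32_t)be);

    /* Stack convention: "bytes 0..3" = the lowest addresses of the slot.
       A spill store puts the LSB at the lowest address on LE but the MSB
       there on BE, so the same offset names field a on both targets,
       unlike the register case. */
    printf("LE stack,    bytes 0..3 -> %u (field a)\n", (uint32_t)le);
    printf("BE stack,    bytes 0..3 -> %u (field a)\n", (uint32_t)(be >> 32));
    return 0;
}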

For arm64 though, I think it might just happen to work if you do give a struct type to x1.

Even longer term, we probably also need to look into one var being split across several registers as part of some kind of copy.

comex commented 1 year ago

Thanks. FWIW, it doesn't quite work if I give a struct type to x1:

00000000      struct two_u32s x1
00000000      x1.a = arg1->a
00000000      x1.b = arg1->b
0000000c      return zx.q(x1.a) + (x1 u>> 0x20)

The load itself is correct albeit suboptimal (assigning each field rather than the struct as a whole), but I still get x1 u>> 0x20 instead of x1.b.

rssor commented 1 year ago

Assigning to each field rather than the struct as a whole is intentional; the long-term intended behavior is for MLIL ops to be as narrow as possible (e.g. consider scripts that want to find places that touch particular fields: by splitting each field out as its own op, we ensure xrefs get created for every field touched, even when the underlying store/load hits several at once).

x1 u>> 0x20 getting transformed into a field access seems doable though.
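
For the little-endian case the mapping is exact; a quick host-side sanity check (plain C, nothing Binary Ninja specific, reusing the two_u32s struct from above):

#include <assert.h>
#include <stdint.h>
#include <string.h>

struct two_u32s { uint32_t a, b; };

int main(void)
{
    /* Holds on a little-endian host, matching the AArch64 target above. */
    struct two_u32s v = { 1, 2 };
    uint64_t raw;
    memcpy(&raw, &v, sizeof raw);           /* the register image of the ldr */
    assert((uint32_t)raw == v.a);           /* x1.d         <-> arg1->a      */
    assert((uint32_t)(raw >> 32) == v.b);   /* x1 u>> 0x20  <-> arg1->b      */
    return 0;
}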