Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
946 stars 214 forks source link

Cannot change constants to pointers in the code (using O) #4339

Open joelreymont opened 1 year ago

joelreymont commented 1 year ago

Version and Platform (required):

I would like to change 0x2004a7a8, 0x2004a698, 0x2004a69c to pointers but I can't.

Note also that 0x2004a69c is a pointer at 0x1348a.

00013428  int32_t sub_13428(int32_t arg1, int32_t arg2, int32_t* arg3)

🚫00013434      unimplemented  {vpush {d8, d9}}
0001343c      int32_t r0_2
0001343c      void* r6_1
0001343c      if (arg2 u<= 1 && arg3 != 0)
0001344c          r6_1 = arg2 * 0x208
00013456          r0_2 = (*(sub_6cdd4() + 0x80))(r6_1 + 0x2004a7a8)
00013458          if (r0_2 != 0)
00013468              (*(setup_log() + 0xc))(0, "[ASSERT]%s @ %s @ %d", "get_fmu_dm()->lock_mutex(&barome…", "write_data_to_barometer", 0x386)
00013458      int32_t r4_2
00013458      if ((arg2 u<= 1 && arg3 != 0 && r0_2 != 0) || arg2 u> 1 || (arg2 u<= 1 && arg3 == 0))
00013470          r4_2 = 0xffffffff
00013458      if (arg2 u<= 1 && arg3 != 0 && r0_2 == 0)
0001347a          if (arg1 != 0x20)
0001370c              r4_2 = 0xffffffff
00013482          else
00013482              int32_t r1 = arg3[1]
00013482              int32_t r2 = arg3[2]
0001348a              *(r6_1 + 0x2004a698) = *arg3
0001348a              *(r6_1 + &data_2004a69c) = r1
0001348a              *(r6_1 + &data_2004a6a0) = r2
0001349c              int128_t q0_1
0001349c              int128_t q1_1
0001349c              int64_t q2_1
0001349c              q0_1, q1_1, q2_1 = sub_7a76c(arg2, r6_1 + 0x2004a698, r6_1 + 0x2004a69c)
000134a0              uint32_t r3_3 = zx.d(*(r6_1 + &data_2004a855))

My first guess was to type r6_1 as an int pointer. This results in weird array references based on r6_1, though.

🚫00013434      unimplemented  {vpush {d8, d9}}
0001343c      int32_t r0_2
0001343c      int32_t* r6_1
0001343c      if (arg2 u<= 1 && arg3 != 0)
0001344c          r6_1 = arg2 * 0x208
00013456          r0_2 = (*(sub_6cdd4() + 0x80))(&r6_1[0x80129ea])
00013458          if (r0_2 != 0)
00013468              (*(setup_log() + 0xc))(0, "[ASSERT]%s @ %s @ %d", "get_fmu_dm()->lock_mutex(&barome…", "write_data_to_barometer", 0x386)
00013458      int32_t r4_2
00013458      if ((arg2 u<= 1 && arg3 != 0 && r0_2 != 0) || arg2 u> 1 || (arg2 u<= 1 && arg3 == 0))
00013470          r4_2 = 0xffffffff
00013458      if (arg2 u<= 1 && arg3 != 0 && r0_2 == 0)
0001347a          if (arg1 != 0x20)
0001370c              r4_2 = 0xffffffff
00013482          else
00013482              int32_t r1 = arg3[1]
00013482              int32_t r2 = arg3[2]
0001347e              r6_1[0x80129a6] = *arg3
0001347e              r6_1[0x80129a7] = r1
0001347e              r6_1[0x80129a8] = r2
0001349c              int128_t q0_1
0001349c              int128_t q1_1
0001349c              int64_t q2_1
0001349c              q0_1, q1_1, q2_1 = sub_7a76c(arg2, &r6_1[0x80129a6], &r6_1[0x80129a7])
joelreymont commented 1 year ago

Same issue here

000137d0                      if (zx.d(*((r2_1 << 3) + 0xf49d8)) == zx.d(var_7c))

and here where not all the values in the 0x200xxxxx range can be changed to pointers

00014ce8  void* sub_14ce8(uint32_t arg1, int32_t arg2, int32_t arg3, int16_t arg4, int64_t arg5 @ q0)

e.g.

00014d44          if (zx.d(*(0x438 * r6 + 0x2004b596)) << 0x1f s< 0)

Anyone from V35 should search for "Encourage Salesman Prompt Delay" to find the database.

joelreymont commented 1 year ago

Also, see the arbitrary treatment in sub_17d38 below. All the vars are integers.

00018358                  *(r5_1 + 0x200500a0) = var_128
00018358                  int32_t var_124
00018358                  *(r5_1 + &data_200500a4) = var_124
00018358                  *(r5_1 + &data_200500a8) = var_120
00018358                  *(r5_1 + &data_200500ac) = var_11c
00018360                  *(r5_1 + 0x200500b0) = var_118
00018360                  *(r5_1 + &data_200500b4) = var_114
00018360                  *(r5_1 + &data_200500b8) = var_110
00018360                  *(r5_1 + &data_200500bc) = var_10c
00018368                  *(r5_1 + 0x200500c0) = var_108
00018368                  int32_t var_104
00018368                  *(r5_1 + &data_200500c4) = var_104
00018368                  *(r5_1 + &data_200500c8) = var_100
00018368                  int32_t var_fc
00018368                  *(r5_1 + &data_200500cc) = var_fc
00018370                  *(r5_1 + 0x200500d0) = var_f8

also a whole bunch of other places in the same function.

fuzyll commented 1 year ago

I've been trying to track this down, and there are a few things I'd like to document at this point:

  1. The o key does not make something a pointer. It makes the display type a pointer. It's the equivalent of right-clicking and saying Display As -> Pointer, but as a hotkey.
  2. When you hit the o key on these values, and they aren't already a data variable, it will create a void one for you. This is accurate and what we expect to happen.
  3. If you are not in HLIL, this also changes the pointer display type correctly and displays you the thing you expect.
  4. If, however, you are in HLIL, it may work, or may not, and this is actually "the bug".

I don't actually know what "the bug" is, but here are my observations:

  1. If you hit o on 0x1348a in HLIL, it goes into the map of integer display types (in BNSetIntegerConstantDisplayType) with that address.
  2. This does not change the HLIL display.
  3. If you switch to MLIL and hit o on 0x1347e, it goes into the map of integer display types with that address.
  4. This does change the HLIL display.

Also, consider the following:

>>> current_function.get_int_display_type(0x1348a, 0x2004a698, 0xffffffff)
<IntegerDisplayType.DefaultIntegerDisplayType: 0>
>>> current_function.get_int_display_type(0x1347e, 0x2004a698, 0xffffffff)
<IntegerDisplayType.DefaultIntegerDisplayType: 0>
>>> current_function.get_int_display_type(0x1347e, 0x2004a658, 0xffffffff)
<IntegerDisplayType.PointerDisplayType: 9>
>>> current_function.get_int_display_type(0x1348a, 0x2004a658, 0xffffffff)
<IntegerDisplayType.DefaultIntegerDisplayType: 0>

I thought "the bug" was that, when HLIL is determining how to display this value, it looks up the address and value of the thing it came from in MLIL, and uses that display type. But, when you hit o and try to set this, it uses the address from HLIL instead of the one it will actually look up. But, the problem with that theory is that I would have expected the last line above to also show PointerDisplayType, and it does not.

So, in conclusion: There is a mismatch between what goes into the integer display type map, and what we look up in that map when displaying values in HLIL. This sometimes works out, and sometimes doesn't, and is likely the root cause of a whole host of HLIL display issues. I just don't entirely understand where this mismatch is occurring or how to fix it.

image

(Note: The reason HLIL shows 98 and MLIL shows 58 is because of the addition of 0x40 at 0x13486. So, although these two locations appear to be unrelated, HLIL is showing the calculated value of [tmp0_2].d.)

plafosse commented 1 year ago

I have a potential fix for this but its kinda wide reaching and needs a bunch of testing to be sure. Essentially we need a case within our MLIL translator that is able to coerce constants to pointers if they point to Functions (or potentially DataVariables) Though it is important that we ensure that the constant not be a simple intermediate, as this can cause problems in RISK like architectures where you have a load/store that requires multiple instructions to accomplish.

joelreymont commented 1 year ago

Also, see #4336

emesare commented 2 weeks ago

This was partially added with 4.2.6059-dev, where any MLIL_CONST will be promoted to an MLIL_CONST_PTR if the constant value is the address of a data variable.

At 0x1348a this is the new behavior (me hitting o on each constant).

https://github.com/user-attachments/assets/2f3195da-ee78-4742-b4fb-1c7cc8143871

At 0x137d0 this is the new behavior (no user interaction).

image

While the above cases seem to be fixed, I am unsure of the scope of the change specifically for user interaction such as hitting o. I am going to keep this issue open for the time being until there is more consensus.