Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
942 stars 213 forks source link

Common pattern should be simplified in HLIL #2596

Open toshipiazza opened 3 years ago

toshipiazza commented 3 years ago

Binary Ninja Version

2.4.2950-dev Personal

Describe the bug

Binja can't optimize IL that looks like this:

00000084      char temp211 = 0
00000084      if (arg1)
00000084          temp211 = 1
00000090      if (temp211 == 1)
00000090          return 1
0000008c      return 0

To Reproduce Steps to reproduce the behavior:

  1. Install https://github.com/google/binja-hexagon/ plugin
  2. Open https://github.com/google/binja-hexagon/blob/main/test_binaries/prebuilt/bn_llil_test_app in binja
  3. Navigate to test_dualjump_cond_jump function

Expected behavior

I want HLIL to look like:

00000084      if (arg1)
00000090          return 1
0000008c      return 0

Version and Platform (required):

Additional context

Split from https://github.com/google/binja-hexagon/issues/4

galenbwill commented 3 years ago

I would need to take a closer look at the lifted IL, LLIL, and MLIL to be sure, but it appears this case would require additional support from the architecture plugin to simplify further. My suspicion is that the lifting to LLIL is using temporary registers universally to handle the old/.new register value "issue" and this may be disconnecting some of the data flow relationships.

But I would have to see where does the p0 register get set in the original hexagon code?

toshipiazza commented 3 years ago

Here's the lifted IL for that function

   0 @ 00000084  temp211.b = 0
   1 @ 00000084  temp1.d = R1 + R1
   2 @ 00000084  if (P0) then 3 else 5

   3 @ 00000084  temp211.b = 1
   4 @ 00000084  goto 5

   5 @ 00000084  R1 = temp1.d
   6 @ 00000084  if (temp211.b == 1) then 7 else 8

   7 @ 00000084  jump(data_90 => 9 @ data_90)

   8 @ 00000084  goto 13 @ 0x8c

   9 @ 00000090  temp0.d = 1
  10 @ 00000090  R0 = temp0.d
  11 @ 00000090  <return> jump(LR)

  13 @ 0000008c  temp0.d = 0
  14 @ 0000008c  R0 = temp0.d
  15 @ 0000008c  <return> jump(LR)

LLIL is identical.

Here's the MLIL:

   0 @ 00000084  temp211 = 0
   1 @ 00000084  if (arg1) then 2 else 4

   2 @ 00000084  temp211 = 1
   3 @ 00000084  goto 4

   4 @ 00000084  if (temp211 == 1) then 5 else 6

   5 @ 00000084  jump(data_90 => 7 @ data_90)

   6 @ 00000084  goto 9 @ 0x8c

   7 @ 00000090  R0_1 = 1
   8 @ 00000090  return 1

   9 @ 0000008c  R0 = 0
  10 @ 0000008c  return 0

HLIL is shown in the first comment.

The whole binary seems to be just a test suite for the lifter, so P0 is unset, and is inferred as an argument

toshipiazza commented 3 years ago

My suspicion is that the lifting to LLIL is using temporary registers universally to handle the old/.new register value "issue" and this may be disconnecting some of the data flow relationships

I think the specific issue here is that the binja-hexagon plugin defers all jumps to the end of the packet. temp211 is a temporary variable introduced to indicate the jump at 00000088 is taken (if (P0) jump:t data_90). When the packet ends the plugin inserts if guards for each "jump variable" that it created in the packet

plafosse commented 1 month ago

This is not related to hexagon at all this is simply a missing HLIL simplification. This can be demonstrated on x86 as well: image

Example test bndb test_2596.bndb.zip