Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License

Add the ability to interpret data as code #4686

Closed imethod closed 1 year ago

imethod commented 1 year ago

What is the feature you'd like to have? Add the ability to interpret data as code.

Is your feature request related to a problem? In certain situations Binary Ninja has no way to know that some data is code. When I know that it is code, I don't seem to have a way to tell Binary Ninja so.

When I analyzed the sample, the program used only "blx lr" to jump indirectly to the next function, so Binary Ninja decided that the data following it did not need to be analyzed.


fuzyll commented 1 year ago

You should be able to hit 'p' on 0x0000cfdc to turn the code after the blx lr into a function. I understand this isn't exactly what you were looking for, but it'll at least show the disassembly.
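
For scripting, the same step can be sketched with the Python API. This is a minimal sketch, assuming `bv` is an open `BinaryView` such as the one in the scripting console; the helper name `make_function_at` is hypothetical and not part of Binary Ninja.

```python
def make_function_at(bv, addr):
    """Define a user function at addr (roughly what pressing 'p' does in the UI).

    `bv` is assumed to be a binaryninja.BinaryView. Nothing here imports
    Binary Ninja, so the helper only relies on the three methods it calls.
    """
    bv.create_user_function(addr)    # mark addr as the start of a function
    bv.update_analysis_and_wait()    # block until analysis has caught up
    return bv.get_function_at(addr)  # None if no function exists at addr


# In the scripting console, for the address from this thread:
# func = make_function_at(bv, 0xcfdc)
```

As with the hotkey, this only makes the bytes show up as disassembly in a separate function; it does not connect them to the control flow of the function that contains the blx lr.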

The reason we didn't analyze that code is likely because we couldn't identify anything that would lead to it being executed. I have no idea what the blx lr leads to in this case, but I'm going to guess the function it branches to will return here and execute the rest of the instructions below.

Are you able to share this executable? We should investigate why it wasn't able to identify that data as code.

imethod commented 1 year ago

demo.zip (rename it to demo.so)

Create a function at 0xcb54, then go to 0xcfd8.

imethod commented 1 year ago

"I'm going to guess the function it branches to will return here and execute the rest of the instructions below."

Yes. If you debug it dynamically, it does return.

That's why I wanted this feature: it's the kind of trick that sample authors use when they don't want you to do static analysis, and it is a limitation of static analysis.

xusheng6 commented 1 year ago

Here is a way to fix up the analysis: go to 0xcfd8, and switch to MLIL. You should see the following code:

 284 @ 0000cfd4  int32_t lr_2 = r11_4
 285 @ 0000cfd8  jump(lr_2)

Then select the lr_2 at line 284 (the first line), right-click, and choose Set User Variable Value. In the dialog that shows up, select Constant Pointer Value and enter the desired address (0xcfdc or whatever you want) below it.

[Screenshot 2023-10-18 at 6:47:08 PM]

Click Accept and you should see the control flow is fixed:

[Screenshot 2023-10-18 at 6:48:22 PM]
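
The same fix-up can probably be scripted. The sketch below assumes that the Set User Variable Value dialog corresponds to `Function.set_user_var_value` and that a constant pointer is expressed as `PossibleValueSet.constant_ptr`. The helper name `pin_variable_to_pointer` and its `make_value` parameter are hypothetical, added only so the logic can be exercised without Binary Ninja installed.

```python
def pin_variable_to_pointer(func, var, def_addr, target, make_value=None):
    """Tell analysis that `var`, as defined at `def_addr`, holds `target`.

    func       -- assumed to be a binaryninja Function
    var        -- the Variable to pin (lr_2 in this thread)
    def_addr   -- address of the defining instruction (0xcfd4 here)
    target     -- the pointer value to assume (0xcfdc here)
    make_value -- hypothetical hook for building the value object
    """
    if make_value is None:
        # Imported lazily so the sketch loads outside Binary Ninja as well.
        from binaryninja import PossibleValueSet
        make_value = PossibleValueSet.constant_ptr
    value = make_value(target)
    func.set_user_var_value(var, def_addr, value)
    return value


# In the scripting console, once the Variable behind lr_2 has been looked up
# (how to look it up depends on the API version, so it is left out here):
# pin_variable_to_pointer(current_function, lr_2_var, 0xcfd4, 0xcfdc)
```

Like the dialog, this only changes what the analysis assumes about the variable. It is up to you to pick a target that matches what the program does at run time.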

xusheng6 commented 1 year ago

Also, I think IDA got this correct simply because the indirect branch jumps to the byte immediately after the branch. I do not think it actually figures out the value of lr and acts upon it. It just continues the disassembly and gets lucky this time.

Binary Ninja takes a slightly different approach. When it sees an indirect branch like this, it stops the disassembly by default. If it can figure out the target of the branch, e.g., via constant propagation, it continues the disassembly at the correct target. This works better than IDA's behavior in many cases. However, in this case we are unable to figure out the target automatically, so the disassembly just stops. I will see whether there is something we can do to improve it.
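
The difference between the two strategies can be illustrated with a toy model. This is only a sketch of the idea described above, not how either tool is implemented. The instruction kinds, the 4-byte sizes, and the contents of `toy` are made up for illustration; only the addresses 0xcfd4, 0xcfd8 and 0xcfdc come from this thread.

```python
def reachable(program, entry, resolved=None, fall_through_indirect=False):
    """Return the sorted addresses a toy disassembler would visit.

    program  -- {addr: (kind, size)} with kind in {"op", "ijmp", "ret"}
    resolved -- {addr_of_ijmp: [targets]} for indirect jumps whose
                targets are known (e.g. from constant propagation)
    fall_through_indirect -- if True, keep decoding right after an
                unresolved indirect jump instead of stopping
    """
    resolved = resolved or {}
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in seen or addr not in program:
            continue
        seen.add(addr)
        kind, size = program[addr]
        if kind == "op":
            work.append(addr + size)          # ordinary fall-through
        elif kind == "ijmp":
            targets = resolved.get(addr)
            if targets:
                work.extend(targets)          # follow the known targets
            elif fall_through_indirect:
                work.append(addr + size)      # just keep going and hope
        # "ret" ends this path
    return sorted(seen)


toy = {
    0xCFD4: ("op", 4),
    0xCFD8: ("ijmp", 4),   # stands in for the blx lr
    0xCFDC: ("op", 4),
    0xCFE0: ("ret", 4),
}

stop_at_unknown = reachable(toy, 0xCFD4)
with_known_target = reachable(toy, 0xCFD4, resolved={0xCFD8: [0xCFDC]})
keep_going = reachable(toy, 0xCFD4, fall_through_indirect=True)
```

In this model `stop_at_unknown` covers only 0xcfd4 and 0xcfd8, while the other two runs cover all four addresses. The keep-going strategy reaches the right code here only because the real target happens to be the next address.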

xusheng6 commented 1 year ago

An alternative fix is to use this API https://api.binary.ninja/binaryninja.function-module.html#binaryninja.function.Function.set_user_indirect_branches:

current_function.set_user_indirect_branches(0xcfd8, [(bv.arch, 0xcfdc)])
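
The one-liner above can be wrapped in a small helper when several targets or several branches need fixing. This is a minimal sketch, assuming `func` is a `Function` and `arch` is its `Architecture` (`bv.arch` in the line above); the helper name `force_branch_targets` is hypothetical.

```python
def force_branch_targets(func, arch, branch_addr, targets):
    """Declare the destinations of the indirect branch at branch_addr.

    Builds the (architecture, address) pairs that
    Function.set_user_indirect_branches expects and returns them.
    """
    pairs = [(arch, target) for target in targets]
    func.set_user_indirect_branches(branch_addr, pairs)
    return pairs


# Equivalent to the call above, in the scripting console:
# force_branch_targets(current_function, bv.arch, 0xcfd8, [0xcfdc])
```

As noted later in the thread, pointing the branch at the next address is a workaround to keep the disassembly going. It is not what the blx lr actually does during execution.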

fuzyll commented 1 year ago

I'm going to close this issue as a duplicate of the following:

As @xusheng6 pointed out above, us not disassembling past the blx lr is intentional and multiple different solutions exist (with UIDF being the one we'd recommend in this case).

Additionally, we've created a plugin that, I believe, does what you want. We don't really recommend using it over the other options available, though. You can check out the discussion in the other issues above for more of our rationale on why.

We'll also continue looking into whether we can improve situations like this going forward with better dataflow and/or heuristics. Thanks again for the report and the binary sample!

imethod commented 1 year ago

Replying to @xusheng6's comment above on how Binary Ninja handles indirect branches:

For the "blx lr", the address in lr is 0x9e24. If Binary Ninja uses this heuristic for disassembly, then there must be a way to set the return address of a function, because the way the function at 0x9e24 returns is very special and Binary Ninja does not recognize that it returns to 0xcfdc. It should also support setting different return addresses for a function.

imethod commented 1 year ago

#4688

Well, it looks like a blx problem: Binary Ninja believes the program has returned at that point. My previous conclusions were wrong.

fuzyll commented 1 year ago

Yeah, @xusheng6's answer is not ideal because he has suggested making the branch target the next address down, which isn't what actually happens during execution. It was just a hack he suggested to get Binary Ninja to keep disassembling past the instruction.

And yes, it does appear there's an actual issue with blx handling at play here. Thanks again for the useful test case!

fuzyll commented 1 year ago

One last thing: If you need this to work now, you can build your own version of the ARMv7 architecture module with this commit (the one I linked to in #4688) removed, and you'll get the desired behavior. Unfortunately, we can't make that the default behavior because it breaks other binaries, but we are looking into a better solution that handles both cases.

bpotchik commented 1 year ago

Fixed in 3.6.4594-dev.