Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
921 stars 208 forks

Change start and end address of function #1024

Closed saruman9 closed 5 years ago

saruman9 commented 6 years ago

How can I change the end of a function? Binary Ninja doesn't correctly determine the end of an ARM function. I tried to undefine the function so that I could recreate it manually, but this function cannot be undefined.

rssor commented 6 years ago

We don't require functions to be contiguous in memory, so we don't have the concept of a function 'end'.

Are you seeing a situation where disassembly occurs after a function that does not return (in which case applying __noreturn to the target of the function call will solve your problem) or where jumptable targets have not all been identified correctly (in which case you can provide the correct target information through the API)?

saruman9 commented 6 years ago

Yes, you are right: I'm seeing a situation where the lr register is modified and then moved to the pc register, so the next instruction executed will not be the instruction that follows the function. I added the __noreturn property to the function to resolve the jump problem, but I think Binary Ninja should analyse cases where the pc register is modified.

rssor commented 6 years ago

That sounds a lot like a jumptable that we're failing to identify, which we should actually be handling -- can you share some of the code around the problem?

Also, the ability to undefine functions that were automatically discovered should land in the very near future.

saruman9 commented 6 years ago

Yes, of course. This is the function epilogue:

0x41200190: ldr     r0, data_412001ac
0x41200194: adr     r1, data_41200338
0x41200198: add     lr, r0, r1
0x4120019c: add     lr, lr, r9
0x412001a0: mov     r0, r5
0x412001a4: mov     r1, r6
0x412001a8: mov     pc, lr

I'm facing another problem when working with this binary file: the values of registers cannot be determined. Maybe this is the reason for the analysis error? For example, for the code above:

>>> current_function.get_reg_value_at(0x41200190, 'r0')
<undetermined>
>>> current_function.get_reg_value_at(0x41200194, 'r1')
<undetermined>
>>> current_function.get_reg_value_at(0x412001a8, 'lr')
<undetermined>

> Also, the ability to undefine functions that were automatically discovered should land in the very near future.

Nice.

joshwatson commented 6 years ago

Use get_reg_value_after instead

rssor commented 6 years ago

You would want to be using get_reg_value_after (_at is for the value before the instruction executes).
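A minimal sketch of re-issuing the three queries from the earlier comment with the post-instruction variant. It assumes `func` is a Binary Ninja `Function` (for example `current_function` in the scripting console); the helper name `reg_values_after` and the `EPILOGUE_QUERIES` list are made up for illustration.

```python
def reg_values_after(func, queries):
    """Return {(addr, reg): value} using the post-instruction query.

    `func` is assumed to be a binaryninja Function; only its
    get_reg_value_after(addr, reg) method is used here.
    """
    return {
        (addr, reg): func.get_reg_value_after(addr, reg)
        for addr, reg in queries
    }


# The (address, register) pairs queried earlier in this thread.
EPILOGUE_QUERIES = [
    (0x41200190, "r0"),
    (0x41200194, "r1"),
    (0x412001a8, "lr"),
]
```

In the console this would be called as `reg_values_after(current_function, EPILOGUE_QUERIES)`. Whether the values come back determined still depends on the analysis, as discussed below.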

There are several situations that could cause this: is the memory at data_412001ac read-only? If not, we don't allow the value to be consumed when calculating possible values, and we'd be unable to solve for this.

If it's read-only, then chances are we weren't able to extract the possible values of r9.

function.set_user_indirect_branches would allow you to manually set the correct set of targets yourself and then you wouldn't need to use __noreturn.
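As a rough sketch of how that could look for the epilogue posted above: the branch target is the word loaded from data_412001ac, plus the address produced by the adr (0x41200338), plus r9. The helper names below are hypothetical, `func` is assumed to be a Binary Ninja `Function`, and the literal value is a placeholder because the actual word at data_412001ac is not shown in this thread.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def epilogue_target(literal, table_base, r9):
    """Compute lr = literal + table_base + r9, wrapped to 32 bits."""
    return (literal + table_base + r9) & MASK32


def set_branch_targets(func, source, targets):
    """Declare `targets` as the destinations of the branch at `source`.

    set_user_indirect_branches expects (architecture, address) pairs;
    this sketch assumes every target uses the function's own architecture.
    """
    func.set_user_indirect_branches(source, [(func.arch, t) for t in targets])


# Console usage, with a placeholder literal (NOT the real value):
# target = epilogue_target(literal=0x10, table_base=0x41200338, r9=0)
# set_branch_targets(current_function, 0x412001a8, [target])
```

Only targets you can actually justify should be listed, since the analysis will follow them as real control flow.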

saruman9 commented 6 years ago

> data_412001ac read-only?
> we weren't able to extract the possible values of r9.

You were absolutely right! I added a section with ReadOnlyCodeSectionSemantics and the value of the r0 register was resolved. The value of the r9 register is computed and depends on the function arguments. Can I somehow set indirect_branches to undetermined if I don't know the correct set of targets?

So I computed one possible target, which is independent of the function arguments, and applied set_user_indirect_branches. I should mention that I'm researching firmware that has to be loaded at an offset. I looked for information about loading at an offset but found nothing except #38. I created a big file that contains the target firmware at the required offset, and when I set the branch, reanalysis started. Binary Ninja ate all my memory because the loaded file is too big to analyse. Can I somehow load firmware at an offset in Binary Ninja?

rssor commented 6 years ago

We should still be able to solve r9 in most cases, so I'm curious as to why that jump table is not being solved for automatically. What are the reported values in get_possible_reg_values_after for r9? As an example, a solved jump table will usually look something like this:

Calling set_user_indirect_branches with an empty list should clear the outgoing branches, which I assume is what you mean by undetermined?

For loading the file at a given offset the answer is to implement a BinaryView for now. That would allow you to set up the correct sections and mapping when the file is initially loaded and before any analysis occurs. The '#api-help' channel on our slack is a good place to get help with that, and there are some good examples available in public repositories:

https://github.com/Vector35/binaryninja-api/blob/dev/python/examples/nes.py#L520
https://github.com/joshwatson/binaryninja-microcorruption/blob/master/__init__.py
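For orientation, the part of such a BinaryView that handles the load offset usually comes down to mapping the raw file at the desired address before analysis runs. The sketch below shows only that step. It assumes `view` is your BinaryView subclass instance (inside its `init()`), and that `segment_flags` and `section_semantics` come from `binaryninja.enums` (`SegmentFlag`, `SectionSemantics`). The helper name, the default section name, and the load address in the usage comment are hypothetical.

```python
def map_flat_image(view, load_address, image_length,
                   segment_flags, section_semantics, name=".firmware"):
    """Map file bytes [0, image_length) at `load_address` in `view`.

    Assumes `view` provides add_auto_segment and add_auto_section,
    as a binaryninja BinaryView does.
    """
    # Segment: where the raw file bytes appear in the address space.
    view.add_auto_segment(load_address, image_length,
                          0, image_length, segment_flags)
    # Section: tells analysis how to treat the range (for example read-only
    # code, which lets constants loaded from it be used in value analysis).
    view.add_auto_section(name, load_address, image_length, section_semantics)


# Inside a BinaryView subclass's init(), roughly (hypothetical base address):
# map_flat_image(self, 0x41200000, len(self.parent_view),
#                SegmentFlag.SegmentReadable | SegmentFlag.SegmentExecutable,
#                SectionSemantics.ReadOnlyCodeSectionSemantics)
```

This avoids padding the file up to the load address, so only the real firmware bytes are analysed. The linked examples show the rest of the plumbing (registration, `is_valid_for_data`, entry point).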

bpotchik commented 6 years ago

You can remove any function now as of build 1.1.1175-dev.

saruman9 commented 6 years ago

>>> current_function.get_low_level_il_at(0x412001a8).get_possible_reg_values_after('r9')
<undetermined>

But somewhere in the middle of function:

>>> current_function.get_low_level_il_at(0x41200120).get_possible_reg_values_after('r9')
<not in set([0x82400150])>

One of the possible paths for computing the r9 register (r9 = 0):

<0x41200090: sub_41200090:>
<0x41200090: mov     r4, r0>
<0x41200094: mov     r5, r1>
<0x41200098: mov     r6, r2>
<0x4120009c: mov     sp, r4>
<0x412000a0: adr     r0, data_41200150>
<0x412000a4: cmp     r0, r6>
<0x412000a8: moveq   r9, #0>
<0x412000ac: beq     0x4120014c>

<0x4120014c: ldr     r0, data_41200044>
<0x41200150: ldr     r1, data_4120004c>
<0x41200154: mov     r4, r6>
<0x41200158: add     r0, r0, r4>
<0x4120015c: add     r1, r1, r4>
<0x41200160: mov     r2, #0>

<0x41200164: cmp     r0, r1>
<0x41200168: bhs     0x41200178>

<0x4120016c: str     r2, [r0]  {0x0}>
<0x41200170: add     r0, r0, #0x4>
<0x41200174: b       0x41200164>

<0x41200178: mcr     p15, #0, r0, c7, c5>
<0x4120017c: mcr     p15, #0, r0, c7, c10, #0x4>
<0x41200180: mcr     p15, #0, r0, c7, c5, #0x4>
<0x41200184: ldr     r0, data_41200488>
<0x41200188: add     r0, r0, r9>
<0x4120018c: mcr     p15, #0, r0, c12, c0>
<0x41200190: ldr     r0, data_412001ac>
<0x41200194: adr     r1, data_41200338>
<0x41200198: add     lr, r0, r1>
<0x4120019c: add     lr, lr, r9>
<0x412001a0: mov     r0, r5>
<0x412001a4: mov     r1, r6>
<0x412001a8: mov     pc, lr>

> Calling set_user_indirect_branches ... is what you mean by undetermined?

No, I mean a situation where I know that an indirect branch exists but the jump address is unknown, for example because the value of the pc register depends on the function arguments.

Thank you for the detailed info about loading the file at an offset.

saruman9 commented 6 years ago

By the way, IDA does not recognize the possible values of pc (and so most likely not those of r9 either), and it doesn't recognize the function to which control is transferred either.

plafosse commented 5 years ago

The issue title of setting the "end" or "start" of a function will not be changed. It seems as though the underlying issue here has been fixed though.

flowswitch commented 2 years ago

Got here via the issue title. I've run into a situation where BN treats main() as a continuation of the startup code that jumps there, but I want to declare it as a separate function. What are my options? Pressing P doesn't create anything; it just takes me to a graph view of the concatenated startup and main. Undefining the current function makes both main and startup continuations of another piece of startup code with a jump. In IDA I would just set the end of startup at the jump, making the continuation "code not belonging to any function" (or unexplored bytes), then define a new function there.

psifertex commented 2 years ago

Try patching the jump to main into a call instruction, a nop, or anything else. If it's a call, then main should become a function; if it's something else, then you can hit p on main. I think this is an edge case because of how the functions are created directly from the entry point, but I could be wrong.

flowswitch commented 2 years ago

Uff, is it a limitation of BN's architecture or just of the UI? I'm totally ok with the idea of writing a script for a corner case; the question is what is possible and what is not.

psifertex commented 2 years ago

Right now it's an architectural limitation that you don't tell a function where to start or stop. That's handled by an architecture plug-in's analysis and lifting.

That said, the exact situation you're describing seems off and related to just an entry point as I don't believe that is expected behavior.

plafosse commented 2 years ago

I think there might be a simple solution to this.

  1. Undefine start. Right-click "Undefine Current Function"
  2. Navigate to the address of main. Hit p
  3. Now go back to the address of start and hit p

plafosse commented 2 years ago

The issue is that if it's a direct jump, then we don't see this as a separate function. If, however, main is created first, then we determine that it's a tail call and everything will probably look right.
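For anyone who prefers scripting it, the three steps above can be sketched with the Python API roughly as follows. This assumes `bv` is the open BinaryView; the helper name and the two address parameters are hypothetical, and whether the tail call is detected still depends on the analysis.

```python
def split_jump_target(bv, start_addr, main_addr):
    """Recreate two functions so the jump target is defined first.

    Mirrors the manual steps: undefine start, define main, redefine start.
    """
    start = bv.get_function_at(start_addr)
    if start is not None:
        bv.remove_user_function(start)   # step 1: undefine start
    bv.create_user_function(main_addr)   # step 2: define main first
    bv.create_user_function(start_addr)  # step 3: define start again
    bv.update_analysis_and_wait()        # let analysis settle before reading
    return bv.get_function_at(main_addr)
```

The order matters: creating main before start is what gives the analysis a chance to treat the direct jump as a tail call instead of folding main into start.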