Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
899 stars 203 forks source link

Weird bug: Undefining a function renamed reference #4311

Closed joelreymont closed 1 year ago

joelreymont commented 1 year ago

Version and Platform (required):

Consider the following BN disassembly

00049344  int32_t* const sub_49344(int32_t arg1)

00049344  0928       cmp     r0, #9
00049346  0ad8       bhi     #0x4935e

00049348  064b       ldr     r3, [pc, #0x18]  {data_49364}
0004934a  0c22       movs    r2, #0xc
0004934c  02fb0030   mla     r0, r2, r0, r3
00049350  8368       ldr     r3, [r0, #8]
00049352  23b1       cbz     r3, #0x4935e

00049354  4368       ldr     r3, [r0, #4]
00049356  002b       cmp     r3, #0
00049358  08bf       it      eq
0004935a  0020       movs    r0, #0  {0x0}
0004935c  7047       bx      lr

0004935e  0020       movs    r0, #0
00049360  7047       bx      lr

00049362        00 bf                                                                                        ..

00049364  void* data_49364 = data_20056a40

The last line used to be

00049364  void* data_49364 = SYS_TASKS

before I undefined the function.

I cannot reproduces this now but I also remember that undefining the function didn't work from HLIL where I first tried it. Also, it created a weird function on top of my existing one which I had to undefine first. Ultimately, undefining did work from the disassembly view but it renamed my SYS_TASKS which is defined thusly

struct __data_var_refs sys_task_t __packed
{
    char* name;
    void* proc;
    void* ctx;
};
fuzyll commented 1 year ago

In the database we have from you, there are actually two functions defined at 0x49344. Undefining the first one removes the reference entirely because its address is included in the second function.

I think part of the problem is that this database appears to have been created with standalone as the platform, rather than linux-thumb2. This means that every time you're hitting p to create a function, it's not using our linux platform's calling conventions and other things in analysis. It looks like some parts of this database have had that run on it, but it's just resulted in a ton of duplicate functions where some are quite wrong.

Changing this results in making some of the duplications pretty blatantly obvious, like this very function:

image

This is very probably what your "weird function on top of your existing one" that you had to undefine was in the first place.

joelreymont commented 1 year ago

How can I clean up my database?

plafosse commented 1 year ago

Unfortunately there isn't many good options other than manually reverse engineering and figuring things out for yourself. One thing that could help out is to identify overlapping basic blocks. This is frequently a sign that we've screwed up somewhere in our linearsweep algorithm.

For instance I wrote this code which detects all overlapping basic blocks:

>>> overlapping_blocks = set()
... overlapping_functions = set()
... for bb in bv.basic_blocks:
...  if len(bv.get_basic_blocks_at(bb.start)) > 1:
...    overlapping_blocks.add(bb.start)
...    overlapping_functions.add(bb.function.start)
... 
... 
>>> 
>>> len(overlapping_blocks)
2048
>>> len(overlapping_functions)
187
>>> [hex(s) for s in overlapping_functions]
['0x12402', '0xefc02', '0x23208', '0x1dc10', '0x71a10', '0x13412', '0x13414', '0x62416', '0x30618', '0x4a1a', '0x51e1c', '0x12c20', '0x21420', '0x9c28', '0x13428', '0x6fc28', '0xef828', '0xa9a2c', '0xab22c', '0xef82a', '0x10c30', '0x30634', '0x83034', '0x24836', '0xaba34', '0x409fc', '0x1c03e', '0x8440', '0x10840', '0x1e640', '0x71a44', '0x2d248', '0x1ae4e', '0x4c50', '0x3ee50', '0x10ff8', '0x2865a', '0x2d260', '0xef862', '0x4a66', '0xc866a', '0x8a70', '0x4c74', '0x1a278', '0x2727c', '0x4a80', '0x1f880', '0x4c84', '0xa95f8', '0xae688', '0x4a90', '0x72694', '0x13de94', '0x40898', '0x4089a', '0xefa9a', '0x1a2a0', '0x13e4a0', '0xabca4', '0x13c2a6', '0x368a8', '0x5bcac', '0x304b0', '0x6feb0', '0x984b6', '0x108bc', '0x1fac0', '0x244c0', '0x24ac2', '0x304c0', '0x71ac2', '0x8bec0', '0xc34c6', '0xd2ac4', '0x5df8', '0x54cc8', '0x400cc', '0x400ce', '0x78cd0', '0x62ed2', '0x9d8d2', '0x9d8d6', '0xa7ed6', '0x500dc', '0x6fedc', '0x53ee0', '0x718e0', '0x4ae4', '0xc0e4', '0xc6ae6', '0x4aec', '0xbd6ec', '0x310f0', '0x75cf0', '0x310f2', '0xefef4', '0x13eef8', '0x77500', '0x13c300', '0x104704', '0x1bd08', '0x1ed0a', '0x13df0e', '0xa112', '0x4718', '0x11b18', '0x3f718', '0x7191a', '0x3f71c', '0xa8718', '0x2991e', '0xeff1e', '0x4b22', '0x9d322', '0x3ef28', '0x62b28', '0x472c', '0xefb36', '0x7938', '0x1ff38', '0x853a', '0x21938', '0x5a738', '0x729fc', '0x1b53e', '0xae338', '0x62740', '0x9d540', '0x11742', '0x11346', '0x7bd4c', '0xc114c', '0x98154', '0x77156', '0x8bb58', '0xbe558', '0x4fd5c', '0x7895c', '0xc0f5c', '0x1a360', '0x72360', '0xef962', '0x2df64', '0x3f568', '0x3f56c', '0xa896e', '0x12b70', '0x4778', '0x617a', '0x77180', '0x5a784', '0x8dd84', '0xbe584', '0x26788', '0x2c988', '0x9194', '0x2d198', '0x1efaa', '0x9efac', '0x1a3b0', '0x1ddb0', '0x75b8', '0x6fbb8', '0xab5ba', '0x127be', '0x5bfc2', '0x1e9c4', '0x969c8', '0x1c5ca', '0x2d1d0', '0xab5d2', '0xc19d6', '0xbdd8', '0x633dc', '0x1b1e8', '0x8d9e8', '0xefdea', '0x2dbec', '0x1fbf0', '0x56bf0', '0x9cff0', '0x9f7f2', '0xd1df0', '0x1e9f8', '0x49fa', '0x8dfc', '0x409fe']

All the the above are potential areas where linearsweep made a bad decision and may need to be corrected.