Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License

Data xrefs behave strangely on thumb2 #2731

Open ntheis-anvilsecure opened 2 years ago

ntheis-anvilsecure commented 2 years ago

Binary Ninja Version

2.4.3076-dev

Describe the bug

Data xrefs to thumb2 functions are not displayed when the function's name is highlighted in the UI.

To Reproduce

1. Acquire a thumb2 binary.
2. Find a function that has a function pointer pointing to it.
3. In linear view, go to the function and highlight its name. The function pointer will not be shown as a data xref.
4. Now highlight the first instruction of the function. The function pointer should now appear as an xref.
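The same comparison can be approximated from the Python console. This is only a sketch: `data_refs_by_selection` is a hypothetical helper name, it assumes `bv` is a BinaryView exposing `get_data_refs(addr, length)`, and it assumes (per the description above) that highlighting the name amounts to a one-address query while highlighting the instruction covers all of its bytes.

```python
def data_refs_by_selection(bv, func_start, first_insn_len=2):
    """Compare data xrefs for a one-address query and a whole-instruction query.

    bv:             a BinaryView (or any object with get_data_refs(addr, length))
    func_start:     the function's start address, with LSB=0
    first_insn_len: assumed length of the first instruction (thumb2: 2 or 4)
    """
    # Query only the start address, as highlighting the function name seems to do.
    name_refs = set(bv.get_data_refs(func_start))
    # Query every byte of the first instruction, as highlighting it seems to do.
    insn_refs = set(bv.get_data_refs(func_start, first_insn_len))
    return {
        "name": sorted(name_refs),
        "first_instruction": sorted(insn_refs),
        "missed_by_name": sorted(insn_refs - name_refs),
    }
```

If the behavior described here is present, `missed_by_name` should list the addresses of the function pointers that only show up when the instruction is highlighted.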

Expected behavior

When I highlight the name of a function, I should see all xrefs to it, both code and data.

Version and Platform (required):

More details

This seems to be the result of annoying emergent behavior from the interaction of three features of BN and one property of thumb2:

In BN, a thumb2 function is considered to start at the address of its first instruction (not at that address + 1); for example, function symbols point to the actual address with LSB=0. Highlighting a function's name (e.g. in the function "header" in linear view) therefore displays xrefs to the address with LSB=0.

Data xrefs seem to point to whatever value the pointer actually holds. A pointer to a thumb2 function holds address + 1, so its data xref points to the address with LSB=1.

Code xrefs point to the address of the function, i.e. to the address with LSB=0.

and

thumb2 instructions are all at least 2 bytes long.

As a result, highlighting the name of a thumb2 function generally shows code xrefs to it but not data xrefs, which can be quite misleading.

Confusingly, highlighting the first instruction of the function shows both data and code xrefs. The selection covers the entire instruction, which is at least two bytes long, so xrefs to both the LSB=0 and the LSB=1 address show up.
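The interaction can be reproduced with a toy model. This is not Binary Ninja's implementation: `XrefModel` and the addresses are made up for illustration, and the only assumption carried over from the description above is that xrefs are looked up by exact target address over the selected byte range.

```python
from collections import defaultdict


class XrefModel:
    """Toy xref store keyed by exact target address (illustrative only)."""

    def __init__(self):
        self.code = defaultdict(list)
        self.data = defaultdict(list)

    def add_call(self, src, func_addr):
        # Code xrefs target the function address, which has LSB=0.
        self.code[func_addr].append(src)

    def add_pointer(self, src, pointer_value):
        # Data xrefs target the raw pointer value, which has LSB=1 for thumb2.
        self.data[pointer_value].append(src)

    def refs_to(self, start, length=1):
        # Collect xrefs whose target falls inside [start, start + length).
        code, data = [], []
        for addr in range(start, start + length):
            code += self.code.get(addr, [])
            data += self.data.get(addr, [])
        return code, data


FUNC = 0x8001234  # hypothetical thumb2 function address
model = XrefModel()
model.add_call(0x8000100, FUNC)          # a call from elsewhere in code
model.add_pointer(0x20000000, FUNC | 1)  # a function pointer stored in data

name_selection = model.refs_to(FUNC)     # function name: one address
insn_selection = model.refs_to(FUNC, 2)  # first instruction: two bytes
```

In this model `name_selection` contains the call but no data xref, while `insn_selection` contains both, matching what the UI shows.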

ntheis-anvilsecure commented 2 years ago

This may be relevant to #1617, since fixing both issues properly seems to require letting xrefs refer to higher-level things than bare addresses. There is no inverse of Architecture.get_associated_arch_by_address: you can't ask "given that there's a function of this architecture at this address, what pointer values could possibly point to it?" (e.g. for a thumb2 function at 0x8001234, the answer would be [0x8001235]). Without that inverse, you can't generically fix this when querying for xrefs.

It seems you'd currently need to fix it when creating xrefs instead: when creating a data xref, you'd have to look at the new address returned by get_associated_arch_by_address. But you'd have to treat function pointers and data pointers differently, because a pointer to data at 0x8001235 doesn't point to 0x8001234. This quickly becomes a big mess.
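To make the missing inverse concrete, here is a minimal sketch of both directions. Neither function is part of the Binary Ninja API: `normalize_pointer` only loosely mimics what get_associated_arch_by_address does for ARM/thumb2 (clear the LSB and switch mode), and `candidate_pointer_values` is the hypothetical inverse described above. The architecture names are plain strings chosen for the example.

```python
def normalize_pointer(value):
    """Forward direction: raw pointer value -> (arch name, instruction address)."""
    if value & 1:
        # An odd pointer is treated as thumb2 code at the even address below it.
        return ("thumb2", value & ~1)
    return ("armv7", value)


def candidate_pointer_values(func_addr, arch_name):
    """Hypothetical inverse: which raw pointer values could reach this function?"""
    if arch_name == "thumb2":
        return [func_addr | 1]
    return [func_addr]


func_ptrs = candidate_pointer_values(0x8001234, "thumb2")

# The ambiguity: a pointer to a data byte at 0x8001235 is normalized the same
# way as a function pointer, although it does not refer to 0x8001234.
data_byte_ptr = 0x8001235
mistaken = normalize_pointer(data_byte_ptr)
```

Here `func_ptrs` is `[0x8001235]`, and `mistaken` shows why applying the forward mapping blindly at xref-creation time would misattribute pointers to odd data addresses.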