Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License

Need heuristics to determine when a high-confidence type should not be completely trusted #5922

Open fuzyll opened 2 months ago

fuzyll commented 2 months ago

What is the feature you'd like to have? There are a number of situations where we receive high-confidence types from sources like debug info (e.g. DWARF) or the demangler, and those types are obviously wrong. For example:

  1. DWARF information for something like glibc will not be able to "see through" GNU indirect functions and will report the wrong name, overwriting the correct name (which probably came from the demangler).
  2. The demangler might represent a function type that has a hidden initial argument without all of its required parameters. For example, in #5920, QString __cdecl QString::fromUtf8(const char *str, qsizetype size) actually takes three arguments, with the initial argument being a hidden pointer for the structure return value (described here: https://blog.aaronballman.com/2012/02/describing-the-msvc-abi-for-structure-return-types/).

In these cases, type information from a different source (or from our own analysis) might actually be more accurate than what we're being fed from external sources. It would be nice to have a "double-check" step that applies some adjustments in these situations by either combining information from multiple sources or overriding higher-confidence data.

As a first step, it might also be useful to just detect these cases and hand their resolution over to the user (e.g. by tagging them all and having some indication of what the potentially detected problem was).
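The detection-only first step could look roughly like the sketch below. It is plain Python and does not use the Binary Ninja API. The TypeCandidate record, its fields, the confidence values, and the resolver name are all hypothetical and exist only to illustrate comparing what several sources say about one function and reporting why the highest-confidence answer looks suspicious.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TypeCandidate:
    """One source's opinion about a function (hypothetical record, not a real API type)."""
    source: str       # e.g. "dwarf", "demangler", "analysis"
    confidence: int   # higher means more trusted; the scale is an assumption
    name: str
    param_count: int


def find_conflicts(candidates: List[TypeCandidate]) -> List[str]:
    """Return reasons why the highest-confidence candidate disagrees with the others."""
    if len(candidates) < 2:
        return []
    best = max(candidates, key=lambda c: c.confidence)
    reasons = []
    for other in candidates:
        if other is best:
            continue
        if other.name != best.name:
            reasons.append(
                f"name mismatch: {best.source}={best.name!r} vs {other.source}={other.name!r}"
            )
        if other.param_count != best.param_count:
            reasons.append(
                f"parameter count mismatch: {best.source}={best.param_count} "
                f"vs {other.source}={other.param_count}"
            )
    return reasons


# Case 1 (illustrative names): debug info names the function differently than the demangler.
ifunc_case = [
    TypeCandidate("dwarf", 255, "hypothetical_resolver_name", 3),
    TypeCandidate("demangler", 200, "memcpy", 3),
]

# Case 2: the demangled signature has two parameters, analysis sees three.
qt_case = [
    TypeCandidate("demangler", 255, "QString::fromUtf8", 2),
    TypeCandidate("analysis", 128, "QString::fromUtf8", 3),
]

for label, case in (("ifunc", ifunc_case), ("qt", qt_case)):
    for reason in find_conflicts(case):
        print(f"[{label}] {reason}")
```

Each returned reason string could become the text of a tag on the function, so the user sees both that a conflict exists and what kind it is.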

Is your feature request related to a problem? Yes, see above.

Are any alternative solutions acceptable? Anything that arrives at the 'correct' solution in these cases should be acceptable.

Additional Information: Binaries that exhibit both of these cases are available upon request.

emesare commented 1 month ago

Case 2 is also fundamentally indescribable with our current calling convention API. We should identify structure/memory returns as described by the ABI and add the hidden return argument. For imports with demangled names where we don't have any backing function, we would have to assume this behavior when the return type is over some calling-convention-specific size (in this case we assume QString is not a bare type). When we can actually analyze the function, we should be able to figure it out based on the register specified.
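A minimal sketch of the size-based assumption for imports follows. It is again plain Python rather than the Binary Ninja API. The Param record, the lower_params helper, the 8-byte threshold, the "__return" parameter name, and the 24-byte size used for QString are all assumptions chosen for illustration. It also only covers free or static functions, where the hidden pointer is the first argument.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Param:
    """A declared or synthesized parameter (hypothetical record)."""
    name: str
    type_name: str


def lower_params(
    return_type: str,
    return_size: int,
    params: List[Param],
    max_register_return_size: int = 8,  # assumed per-calling-convention limit, in bytes
) -> List[Param]:
    """Prepend a hidden return pointer when the return value is too large for registers."""
    if return_size <= max_register_return_size:
        return list(params)
    hidden = Param("__return", f"{return_type} *")
    return [hidden] + list(params)


# Demangled signature from the issue: two declared parameters.
declared = [Param("str", "const char *"), Param("size", "qsizetype")]

# 24 bytes is an illustrative size for QString, not a value taken from the issue.
lowered = lower_params("QString", 24, declared)
for p in lowered:
    print(f"{p.type_name} {p.name}")
```

A size check alone is likely not enough in practice. ABIs such as MSVC's also return non-trivial class types through memory regardless of their size, so a real heuristic would need some notion of whether the type is trivially copyable.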

Here's a good resource: https://blog.aaronballman.com/2012/02/describing-the-msvc-abi-for-structure-return-types/. I also added it to the initial comment to limit confusion.

fuzyll commented 4 weeks ago

Issue #2275 may be required before this one.