Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
919 stars, 207 forks

Analysis is non-deterministic #3852

Open resrever opened 1 year ago

resrever commented 1 year ago

Version and Platform (required):

Bug Description: Running analysis repeatedly on the same binary produces different results, so Binary Ninja's analysis is non-deterministic. There appear to be two primary sources of the differences:

  1. The same comparison shows up as ">" in some runs and as "f>" in others in MLIL.
  2. Phi variable numbers are inconsistent in MLIL SSA.

These differences carry through to HLIL analysis.

Steps To Reproduce: Run my test_binja_consistency.py script, which re-runs analysis until it sees a difference from the first run. The script displays the differences at several IL levels and also writes {il_form}.baseline and {il_form}.other files for use with a better diff utility.
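For readers who do not have the attachment, the general shape of such a run-until-different loop can be sketched as follows. This is not the attached script. The names find_first_divergence and render_with_binja are hypothetical, and the Binary Ninja half assumes a recent Python API where binaryninja.load() is available and each function exposes mlil, mlil.ssa_form and hlil.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def find_first_divergence(
    render: Callable[[], Dict[str, str]],
    max_runs: int = 20,
    out_dir: Path = Path("."),
) -> Optional[str]:
    """Call render() repeatedly; stop at the first IL form whose text changes.

    render() should perform one fresh analysis and return a mapping of
    IL form name (e.g. "mlil") to the full text dump of that form.
    Returns the name of the diverging IL form, or None if all runs agree.
    """
    baseline = render()
    for _ in range(max_runs - 1):
        other = render()
        for il_form, base_text in baseline.items():
            other_text = other.get(il_form, "")
            if other_text != base_text:
                # Save both versions so they can be inspected with any diff tool.
                (out_dir / f"{il_form}.baseline").write_text(base_text, encoding="utf-8")
                (out_dir / f"{il_form}.other").write_text(other_text, encoding="utf-8")
                return il_form
    return None


def render_with_binja(path: str) -> Dict[str, str]:
    """Hypothetical render() for Binary Ninja (assumes binaryninja.load exists)."""
    import binaryninja  # imported lazily so the helper above runs without it

    bv = binaryninja.load(path)
    try:
        dumps: Dict[str, list] = {"mlil": [], "mlil_ssa": [], "hlil": []}
        # Sort by address so function ordering cannot cause spurious diffs.
        for func in sorted(bv.functions, key=lambda f: f.start):
            header = f"; {func.name} @ {func.start:#x}"
            forms = {
                "mlil": func.mlil,
                "mlil_ssa": func.mlil.ssa_form,
                "hlil": func.hlil,
            }
            for il_form, il_func in forms.items():
                dumps[il_form].append(header)
                dumps[il_form].extend(str(insn) for insn in il_func.instructions)
        return {il_form: "\n".join(lines) for il_form, lines in dumps.items()}
    finally:
        bv.file.close()
```

Under those assumptions, a call such as find_first_divergence(lambda: render_with_binja("df")) would either return the first IL form that differed and leave the matching .baseline and .other files in the current directory, or return None if every run agreed.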

Expected Behavior: Consistent results when running analysis repeatedly.

Additional Information:

I am attaching 3 files:

  1. test_binja_consistency.py - The test script I have been using to narrow down non-determinism.
  2. il_diffs.zip - output from some of my runs of the script. The .baseline and .other files can be diffed using any diff utility (e.g. diff mlil.*, colordiff mlil.*, or vim -d mlil.*)
  3. df - the standard Linux "df" binary that I have been testing against.
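If no diff utility is at hand, the .baseline and .other pairs can also be compared with Python's standard difflib module. The helper below is only a sketch. The name diff_il_dumps is hypothetical, and it assumes the files are UTF-8 text named {il_form}.baseline and {il_form}.other as described above.

```python
import difflib
from pathlib import Path


def diff_il_dumps(il_form: str, directory: Path = Path(".")) -> str:
    """Return a unified diff of {il_form}.baseline against {il_form}.other."""
    base_path = directory / f"{il_form}.baseline"
    other_path = directory / f"{il_form}.other"
    baseline = base_path.read_text(encoding="utf-8").splitlines()
    other = other_path.read_text(encoding="utf-8").splitlines()
    diff = difflib.unified_diff(
        baseline,
        other,
        fromfile=base_path.name,
        tofile=other_path.name,
        lineterm="",  # lines were split without newlines, so add none back
    )
    # An empty string means the two dumps are identical.
    return "\n".join(diff)
```

For example, print(diff_il_dumps("mlil")) would show only the MLIL lines that changed between the two runs.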

test_binja_consistency.zip il_diffs.zip df.zip

resrever commented 1 year ago

When working from a .bndb, the analysis results are cached, which hides the HLIL non-determinism. This works very well for my use case, as I can generate the .bndb with multi-threaded analysis and then test faster against the cached results.

One minor thing to note is that even with the .bndb the MLIL SSA still non-deterministically assigns variable names to the phi functions.
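To check whether two MLIL SSA dumps differ only in how versions were numbered, one option is to renumber the versions canonically before comparing. The sketch below assumes SSA variables are rendered as name#N (for example x#3), which is an assumption about the text format rather than something confirmed in this thread. The function name normalize_ssa_versions is hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed rendering of an SSA variable: a base name, "#", then a version number.
_SSA_VAR = re.compile(r"([A-Za-z_][A-Za-z0-9_.:]*)#(\d+)")


def normalize_ssa_versions(text: str) -> str:
    """Renumber each variable's SSA versions in order of first appearance.

    Two dumps that differ only by a consistent renumbering of versions
    normalize to the same string; any other difference is preserved.
    """
    assigned: Dict[Tuple[str, str], int] = {}  # (name, old version) -> new version
    next_version: Dict[str, int] = {}          # name -> next unused new version

    def renumber(match: "re.Match[str]") -> str:
        name, version = match.group(1), match.group(2)
        key = (name, version)
        if key not in assigned:
            assigned[key] = next_version.get(name, 0)
            next_version[name] = assigned[key] + 1
        return f"{name}#{assigned[key]}"

    return _SSA_VAR.sub(renumber, text)
```

If normalize_ssa_versions(baseline) == normalize_ssa_versions(other) holds for a pair of dumps, the remaining difference is purely in version numbering. If the normalized texts still differ, something else changed between the runs.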