Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License

LLIL/MLIL to HLIL instruction translations result in HLIL_NOP with an invalid instruction index #2492

Closed (cetfor closed this issue 2 years ago)

cetfor commented 3 years ago

Describe the bug: When referencing an LLIL or MLIL instruction, you can use .llil or .mlil to switch between the IL levels. I've never seen these fail. However, when attempting to do the same for HLIL using .hlil, you will sometimes be given a reference to an HLIL_NOP instruction with a correct address but an invalid instruction index.

To Reproduce: Here's a headless script I use to find these issues. In this example I'm just using /bin/true, which produces far fewer of these cases than the x86 targets I'm interested in.

import binaryninja

with binaryninja.open_view("/bin/true") as bv:
    funcs = bv.functions
    for func in funcs:
        callees = func.callees
        natv_call_sites = func.call_sites # native call sites
        llil_call_sites = [func.get_llil_at(x.address) for x in func.call_sites]
        mlil_call_sites = [func.get_llil_at(x.address).mlil for x in func.call_sites]
        hlil_call_sites = [func.get_llil_at(x.address).hlil for x in func.call_sites]
        if len(callees) > 0:
            print("\nFunction {} calls {} function(s).".format(func.name, len(callees)))
            for i, callee in enumerate(callees):
                print("  Callee {}: {}".format(i+1, callees[i].name))
                (natv_index, natv_addr) = ("N/A", hex(natv_call_sites[i].address)) if natv_call_sites[i] else ("None", "None")
                (llil_index, llil_addr) = (llil_call_sites[i].instr_index, hex(llil_call_sites[i].address)) if llil_call_sites[i] else ("None", "None")
                (mlil_index, mlil_addr) = (mlil_call_sites[i].instr_index, hex(mlil_call_sites[i].address)) if mlil_call_sites[i] else ("None", "None")
                (hlil_index, hlil_addr) = (hlil_call_sites[i].instr_index, hex(hlil_call_sites[i].address)) if hlil_call_sites[i] else ("None", "None")
                print("    - [Natv] Call site @ {}:{}.".format(natv_index, natv_addr))
                print("    - [LLIL] Call site @ {}:{}.".format(llil_index, llil_addr))
                print("    - [MLIL] Call site @ {}:{}.".format(mlil_index, mlil_addr))
                print("    - [HLIL] Call site @ {}:{}.".format(hlil_index, hlil_addr))

Example output showing issue:

...snip...
Function sub_48e0 calls 4 function(s).
  Callee 1: __freading
    - [Natv] Call site @ N/A:0x48e9.
    - [LLIL] Call site @ 7:0x48e9.
    - [MLIL] Call site @ 5:0x48e9.
    - [HLIL] Call site @ 18446744073709551615:0x48e9.
  Callee 2: fflush
    - [Natv] Call site @ N/A:0x48fe.
    - [LLIL] Call site @ 6:0x48fe.
    - [MLIL] Call site @ 4:0x48fe.
    - [HLIL] Call site @ 3:0x48fe.
  Callee 3: sub_4920
    - [Natv] Call site @ N/A:0x4912.
    - [LLIL] Call site @ 13:0x4912.
    - [MLIL] Call site @ 9:0x4912.
    - [HLIL] Call site @ 1:0x4912.
  Callee 4: fflush
    - [Natv] Call site @ N/A:0x491b.
    - [LLIL] Call site @ 16:0x491b.
    - [MLIL] Call site @ 11:0x491b.
    - [HLIL] Call site @ 2:0x491b.
...snip...

The output shows that function sub_48e0 calls four functions, one of which is __freading. When we reference the call to __freading in the disassembly, LLIL, and MLIL, the correct instruction information is returned. For HLIL, however, the returned instruction has the correct address but an index of 18446744073709551615 (2^64 - 1, i.e. 0xFFFFFFFFFFFFFFFF), and no instruction with that index appears in the function's HLIL.

If we inspect this instruction, it is an HLIL_NOP operation with a valid address but an invalid instr_index.
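To flag these cases in a script, one option is to compare the index against the value seen in the output above. This is a minimal sketch: the name INVALID_INSTR_INDEX and the helper has_valid_index are made up for illustration, and treating 2^64 - 1 as the "no instruction" sentinel is an assumption based only on the observed output, not on documented API behavior. The SimpleNamespace objects stand in for real IL instructions so the snippet runs without Binary Ninja.

```python
from types import SimpleNamespace

# Assumption: the index observed above (18446744073709551615 == 2**64 - 1)
# acts as a "no such instruction" sentinel. This name is hypothetical.
INVALID_INSTR_INDEX = 0xFFFFFFFFFFFFFFFF


def has_valid_index(instr):
    """Return True if instr is not None and its instr_index is not the sentinel."""
    return instr is not None and instr.instr_index != INVALID_INSTR_INDEX


# Stand-ins for IL instructions, using the indices from the example output.
bad = SimpleNamespace(instr_index=18446744073709551615, address=0x48E9)
good = SimpleNamespace(instr_index=3, address=0x48FE)

print(has_valid_index(bad))   # False
print(has_valid_index(good))  # True
```

In the reproduction script, the same check could be applied to each entry of hlil_call_sites before reading its instr_index.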

Expected behavior: I would expect this LLIL/MLIL to HLIL conversion to work every time, rather than for some instructions but not others.

Screenshots: The attached screenshot shows my use case for this (using a different binary than the true binary from above). I'm attempting to locate all call sites in HLIL. This works flawlessly in LLIL and MLIL, but breaks for some call sites in HLIL. Without this ability, I'm not sure how to locate HLIL call sites other than by manually parsing HighLevelILInstruction types.

[Screenshot: hlil_conversion]


bpotchik commented 2 years ago

Fixed in 3.1.3638-dev.