BinaryAnalysisPlatform / bap

Binary Analysis Platform
MIT License

adds the `--print-missing` option to print unlifted instructions #1409

Closed ivg closed 2 years ago

ivg commented 2 years ago

This is a quality-of-life feature intended for lifter writers. Here is a sample output:

$ bap powerpc32-linux-gnu-echo --print-missing
0x10000c64: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c68: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c6c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c70: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c74: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c78: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c7c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c20: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c24: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c28: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c2c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c30: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c34: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c38: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000c3c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000488: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x1000048c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x100004ac: 7c 84 01 95 ; addze. 4, 4 ; (llvm-powerpc32:ADDZE_rec R4 R4)
0x100004cc: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x1000059c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x100005bc: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x100005f4: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x100005f8: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x100005fc: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x1000052c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000a54: 7f 9c 16 71 ; srawi. 28, 28, 2 ; (llvm-powerpc32:SRAWI_rec R28 R28 0x2)
0x10000a64: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000a68: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000a6c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b48: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b4c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b08: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b0c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b10: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b14: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b18: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)
0x10000b1c: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)

Histogram:
1    SRAWI_rec
1    ADDZE_rec
35   NOP

Lifted:  516
Failed:  0
Missed:  37

The output is designed so that the most important information is printed at the end, so you don't need to scroll. It first prints every instruction that lacks a Core Theory representation (semantics). Each instruction is printed with its address, bytes, assembly string (if present), and the Primus Lisp function call that will be made to obtain the semantics of this instruction.
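For readers who want to post-process this listing, here is a minimal Python sketch that splits one line into the fields described above. It is not part of BAP: the names `LINE_RE`, `Missing`, and `parse_missing` are hypothetical, and the line shape is an assumption inferred only from the sample output (in particular, it assumes the two `;` separators are still printed when the assembly string is absent).

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line shape, inferred from the sample output in this PR:
#   <address>: <bytes> ; <assembly> ; (<target>:<OPCODE> <operands...>)
LINE_RE = re.compile(
    r"^(?P<addr>0x[0-9a-fA-F]+):\s+"
    r"(?P<bytes>[0-9a-fA-F ]+?)\s*;\s*"
    r"(?P<asm>.*?)\s*;\s*"
    r"\((?P<call>.*)\)\s*$"
)


class Missing(NamedTuple):
    """One unlifted instruction (hypothetical helper type)."""
    address: int
    raw: bytes
    asm: str
    target: str        # e.g. "llvm-powerpc32"
    opcode: str        # e.g. "ADDZE_rec"
    operands: List[str]


def parse_missing(line: str) -> Optional[Missing]:
    """Parse one listing line; return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The parenthesized part is "<target>:<OPCODE>" followed by operands.
    name, *operands = m.group("call").split()
    target, _, opcode = name.partition(":")
    return Missing(
        address=int(m.group("addr"), 16),
        raw=bytes.fromhex(m.group("bytes")),
        asm=m.group("asm"),
        target=target,
        opcode=opcode,
        operands=operands,
    )


if __name__ == "__main__":
    sample = "0x100004ac: 7c 84 01 95 ; addze. 4, 4 ; (llvm-powerpc32:ADDZE_rec R4 R4)"
    print(parse_missing(sample))
```

Lines that do not match, such as the histogram and the summary counts, yield `None`, so they can be filtered out when iterating over the whole output.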

Next, it prints a histogram of the missed instructions, sorted by the number of occurrences of each opcode, so that you can focus on the most frequently missed instructions first. In our case, most of the missed instructions are just nops, but there are also two non-trivial instructions that we missed.
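The histogram can be reproduced from the listing lines alone. The sketch below is an illustration, not BAP's implementation: `opcode_of` and `histogram` are hypothetical names, and the ascending order (most frequent opcode last) is an assumption based on the sample output, where `NOP` appears at the bottom.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def opcode_of(line: str) -> Optional[str]:
    """Extract the opcode from the trailing "(<target>:<OPCODE> ...)" part."""
    start, end = line.rfind("("), line.rfind(")")
    if start == -1 or end < start:
        return None                      # not an instruction line
    call = line[start + 1:end].split()
    if not call:
        return None
    # Drop the "<target>:" prefix, e.g. "llvm-powerpc32:NOP" -> "NOP".
    _, _, opcode = call[0].partition(":")
    return opcode or None


def histogram(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count missed opcodes, least frequent first (as in the sample)."""
    counts = Counter(op for op in map(opcode_of, lines) if op is not None)
    return sorted(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    listing = [
        "0x10000488: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)",
        "0x100004ac: 7c 84 01 95 ; addze. 4, 4 ; (llvm-powerpc32:ADDZE_rec R4 R4)",
        "0x100004cc: 60 00 00 00 ; nop ; (llvm-powerpc32:NOP)",
        "0x10000a54: 7f 9c 16 71 ; srawi. 28, 28, 2 ; (llvm-powerpc32:SRAWI_rec R28 R28 0x2)",
    ]
    for opcode, count in histogram(listing):
        print(f"{count:<4} {opcode}")
```

The order among opcodes with equal counts is not specified here; the sketch simply keeps the order in which they were first seen.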

The histogram is followed by the counts of lifted (have semantics), failed (no disassembly at all), and missed (no semantics) instructions. The sum of these three counts gives the total number of instructions in the Knowledge Base (KB), 553 in the sample above, which may differ from the number of instructions that you see in the output of -dasm, as the former includes unreachable instructions as well.

When this option is specified, the cache is not used to load the program and the binary is disassembled from scratch.