airbus-seclab / bincat

Binary code static analyser, with IDA integration. Performs value and taint analysis, type reconstruction, use-after-free and double-free detection

How to interpret the output ini? #121

Closed mzhang28 closed 3 years ago

mzhang28 commented 3 years ago

I'm not sure how to interpret the output INI file without using IDA and I can't seem to find anything in the documentation that explains the meaning behind the fields under each of the node sections. Can you please add some info about this? Thank you!

DarkaMaul commented 3 years ago

Hello,

The output (e.g. out.ini) is structured as follows: for every instruction in the function, a new node (and a matching section in the INI file) is created.

[node = 0]
address = 0x3eb0
bytes = 55
final =false
tainted=?
statements = esp <- (esp - 0x4);
 (32)[esp] <- ebp;

The node ID (here: 0) only serves to link nodes together; it is the first ID that was available when BinCAT planned to analyze this instruction. The first two fields (address and bytes) are self-explanatory. The final boolean is true whenever a widening operator has been applied (direct quote from the code in the cfa.ml file). Tainted represents the possible taints on the node (separated by commas). The taint IDs are listed in a later section of the file. Statements are the BinCAT IR statements created for this node/instruction.
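If you want to read these node sections without IDA, here is a minimal sketch using Python's standard configparser. It assumes the file parses as a regular INI file and that node sections are always named "node = <id>" as in the excerpt above; read_nodes and SAMPLE are names made up for this example, not part of BinCAT.

```python
import configparser

# Sample copied from the node excerpt above; the indented line is a
# continuation of the "statements" value.
SAMPLE = (
    "[node = 0]\n"
    "address = 0x3eb0\n"
    "bytes = 55\n"
    "final =false\n"
    "tainted=?\n"
    "statements = esp <- (esp - 0x4);\n"
    " (32)[esp] <- ebp;\n"
)


def read_nodes(text):
    """Hypothetical helper: return {node_id: fields} for "node = N" sections."""
    # Only "=" is a delimiter; interpolation is off so "%" is left alone.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    nodes = {}
    for name in parser.sections():
        # Skip anything that is not a plain node section (assumed naming).
        if not name.startswith("node = "):
            continue
        section = parser[name]
        node_id = int(name.split("=", 1)[1])
        nodes[node_id] = {
            "address": int(section["address"], 16),
            "bytes": section["bytes"],
            "final": section["final"].strip() == "true",
            "tainted": section["tainted"],
            # One IR statement per line of the multi-line value.
            "statements": [
                line.strip()
                for line in section["statements"].splitlines()
                if line.strip()
            ],
        }
    return nodes


nodes = read_nodes(SAMPLE)
print(nodes[0]["statements"])
```

The tainted value is kept as a raw string here, since I am only sure of the "comma-separated IDs" form described above. A real out.ini may contain lines that configparser rejects, so treat this as a starting point.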

After such a node, there is one new section for each statement in the node:

[node 0 - unrel 0]
description =  
mem[0xb8000000*8193] = 0b????????!0b????????
reg[cf] = 0b?!0b?
...
reg[...] = ...

Those sections describe the state of the CPU (memory/registers) after a statement has been applied to the previous state.

Finally, a few sections are found at the end, as shown below:

[program]
null = 0x00
mem_sz=32
stack_width=32
architecture = x86

[taint sources]
1 = r-eax

[edges]
e0_1 = 0 -> 1
e1_2 = 1 -> 2
e2_3 = 2 -> 3

The program section displays some broad information about the program. The taint sources section lists every taint ID (needed to understand the tainted fields in the node output) together with the source of the taint. Finally, the edges section lists every dependency between nodes: see it as a dot representation of the CFG.
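To make the edges and taint sources sections concrete, here is a small sketch that turns them into Python dictionaries. It assumes every edge value has the form "<src> -> <dst>" as in the excerpt above; read_graph and SAMPLE are names invented for this example, not BinCAT identifiers.

```python
import configparser

# Sample copied from the trailing sections shown above.
SAMPLE = (
    "[taint sources]\n"
    "1 = r-eax\n"
    "\n"
    "[edges]\n"
    "e0_1 = 0 -> 1\n"
    "e1_2 = 1 -> 2\n"
    "e2_3 = 2 -> 3\n"
)


def read_graph(text):
    """Hypothetical helper: return (successors, taint_sources)."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)

    # Map each node ID to the list of node IDs it points to.
    successors = {}
    for edge in parser["edges"].values():
        src, dst = (int(part) for part in edge.split("->"))
        successors.setdefault(src, []).append(dst)

    # Map each taint ID to its source description, e.g. 1 -> "r-eax".
    taint_sources = {
        int(taint_id): source
        for taint_id, source in parser["taint sources"].items()
    }
    return successors, taint_sources


successors, taint_sources = read_graph(SAMPLE)
print(successors)
print(taint_sources)
```

With the successors map, a node's tainted field can be cross-referenced against taint_sources while walking the CFG from node 0.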

As I'm not a developer of the tool, I may be wrong on some points but I guess @szennou will correct me if needed.

szennou commented 3 years ago

We've just added this in doc/output_content.md. We may have forgotten some items; let us know if that is the case.