B2R2-org / B2R2

B2R2 is a collection of useful algorithms, functions, and tools for binary analysis.
https://b2r2.org
MIT License

Does it take three to four seconds for each instruction to be decompiled? Is there any way to improve the efficiency? #48

Closed yechenn closed 1 year ago

yechenn commented 1 year ago

I have deployed B2R2 successfully and used the following example to process instructions and output their IR.

open B2R2
open B2R2.FrontEnd
[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "amd64"
  let bytes = [| 0x65uy; 0xffuy; 0x15uy; 0x10uy; 0x00uy; 0x00uy; 0x00uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  ins.Translate handler.TranslationContext |> printfn "%A"
  0

However, I have observed that it takes around three to four seconds to process each instruction.

Considering that an average ELF file typically contains tens of thousands of instructions, and if we need to process hundreds of thousands of such ELF files, the computational time required becomes substantial.

Even with the utilization of multi-threading, it remains difficult to effectively address this issue.

Therefore, I would like to inquire if there are any recommended methods to solve this problem.

sangkilc commented 1 year ago

It seems that you are creating a BinHandle for every instruction. We recommend instantiating it only once.

For more information on how to use it, see the source of our BinDump tool. When I use BinDump to disassemble the C library (/lib/x86_64-linux-gnu/libc.so.6), which contains 400K instructions in its .text section, it takes less than a second to disassemble them all.

First, run the following:

$ cd B2R2/src/RearEnd/BinDump
$ time dotnet run -c Release -- /lib/x86_64-linux-gnu/libc.so.6  -S .text > /dev/null

real    0m2.641s
user    0m2.565s
sys 0m0.295s

Of course, the dotnet command incurs some start-up cost, so you can directly invoke the compiled binary as follows:

$ time ./bin/Release/net6.0/B2R2.RearEnd.BinDump /lib/x86_64-linux-gnu/libc.so.6 -S .text > /dev/null

real    0m0.888s
user    0m0.820s
sys 0m0.068

So on my machine, parsing and disassembling 400K instructions took about 0.888 seconds. Note that if you write your own code using our API to parse a binary stream, it will be much faster, because BinDump does additional work such as parsing ELF headers.

yechenn commented 1 year ago

We have tested the example you provided, but we found that it only reads the instructions from the executable file and outputs the disassembled opcodes. Here is the example and its output:

The command is:

C:\Program Files\dotnet\B2R2\src\RearEnd\BinDump> dotnet run -c Release -- C:\Windows\hh.exe -S .text >C:\Users\YC\Desktop\keyan\iotmalware\1.txt

The output is:

[C:\Windows\hh.exe]

# (.text)

0000000140001000: CC                                   int3
0000000140001001: CC                                   int3
0000000140001002: CC                                   int3
0000000140001003: CC                                   int3
0000000140001004: CC                                   int3
0000000140001005: CC                                   int3
0000000140001006: CC                                   int3
0000000140001007: CC                                   int3
0000000140001008: 4C 89 44 24 18                       mov qword ptr [RSP+0x18], R8
000000014000100d: 4C 89 4C 24 20                       mov qword ptr [RSP+0x20], R9
0000000140001012: 53                                   push RBX
0000000140001013: 56                                   push RSI
0000000140001014: 57                                   push RDI
0000000140001015: 48 83 EC 20                          sub RSP, 0x20
0000000140001019: 33 FF                                xor EDI, EDI
000000014000101b: 48 8D 42 FF                          lea RAX, qword ptr [RDX-0x1]
000000014000101f: 48 3D FE FF FF 7F                    cmp RAX, 0x7ffffffe
0000000140001025: 48 8B F1                             mov RSI, RCX
0000000140001028: B9 57 00 07 80                       mov ECX, 0x80070057
000000014000102d: 0F 47 F9                             cmova EDI, ECX
0000000140001030: 85 FF                                test EDI, EDI
0000000140001032: 78 3B                                js +0x3d ; 0x14000106f
0000000140001034: 48 8D 5A FF                          lea RBX, qword ptr [RDX-0x1]
0000000140001038: 48 8B CE                             mov RCX, RSI
000000014000103b: 48 8B D3                             mov RDX, RBX
000000014000103e: 4C 8D 4C 24 58                       lea R9, qword ptr [RSP+0x58]
0000000140001043: 33 FF                                xor EDI, EDI
0000000140001045: 48 FF 15 24 22 00 00                 call qword ptr [RIP+0x2224] ; <_vsnprintf>
000000014000104c: 0F 1F 44 00 00                       nop dword ptr [RAX+RAX+0x0]
0000000140001051: 85 C0                                test EAX, EAX
0000000140001053: 78 0F                                js +0x11 ; 0x140001064
0000000140001055: 48 98                                cdqe
0000000140001057: 48 3B C3                             cmp RAX, RBX

Our objective is to convert each instruction of a binary program into LowUIR and produce a txt file that encompasses all instructions of the binary program in LowUIR format.

Following your example provided on GitHub, we initially utilized Python to extract the instructions from the binary program and generated "Program.fs". Like this:

open B2R2
open B2R2.FrontEnd
open System.IO
let appendFile (content: string) (filePath: string) = File.AppendAllText(filePath, content)
[<EntryPoint>]
let main argv =
    let isa = ISA.OfString "aarch64"
    let bytes = [| 0xf0uy; 0x7buy; 0xbfuy; 0xa9uy |]
    let handler = BinHandler.Init (isa, bytes)
    let ins = BinHandler.ParseInstr handler 0UL
    let translation = ins.Translate handler.TranslationContext
    appendFile (sprintf "%A" translation) "result.txt"

    let bytes = [| 0x90uy; 0x00uy; 0x00uy; 0xf0uy |]
    let handler = BinHandler.Init (isa, bytes)
    let ins = BinHandler.ParseInstr handler 0UL
    let translation = ins.Translate handler.TranslationContext
    appendFile (sprintf "%A" translation) "result.txt"

    let bytes = [| 0x11uy; 0xcauy; 0x47uy; 0xf9uy |]
    let handler = BinHandler.Init (isa, bytes)
    let ins = BinHandler.ParseInstr handler 0UL
    let translation = ins.Translate handler.TranslationContext
    appendFile (sprintf "%A" translation) "result.txt"

    let bytes = [| 0x10uy; 0x42uy; 0x3euy; 0x91uy |]
    let handler = BinHandler.Init (isa, bytes)
    let ins = BinHandler.ParseInstr handler 0UL
    let translation = ins.Translate handler.TranslationContext
    appendFile (sprintf "%A" translation) "result.txt"
    .....................................................
    .....................................................
    .....................................................
    let bytes = [| 0x20uy; 0x02uy; 0x1fuy; 0xd6uy |]
    let handler = BinHandler.Init (isa, bytes)
    let ins = BinHandler.ParseInstr handler 0UL
    let translation = ins.Translate handler.TranslationContext
    appendFile (sprintf "%A" translation) "result.txt"
    0

Subsequently, we executed "dotnet run" in the command prompt to generate the "result.txt" file. After conducting our tests, we observed that it takes approximately 9 minutes to obtain "result.txt" for a binary program of size 22k.

Therefore, I would like to ask if B2R2 has pre-existing interfaces that can fulfill the desired functionality, where I input a binary program and receive the desired "result.txt" document containing all LowUIR instructions. If such interfaces are available, how should I use them?

If not, do you have any alternative solutions that might be more effective?

Best Regards!

sangkilc commented 1 year ago

As I said earlier, you are instantiating BinHandle over and over again. You can use our BinaryPointer interface to efficiently read through a binary file: https://b2r2.org/APIDoc/reference/b2r2-frontend-binfile-binarypointer.html

Here is an example code snippet that takes a raw binary file (no ELF, no PE, just a raw sequence of instruction bytes) and outputs the lifted instructions to the console.

open System
open B2R2
open B2R2.FrontEnd.BinInterface
open B2R2.FrontEnd.BinFile

let rec lift hdl errShift bp (addr: Addr) cnt =
  if BinaryPointer.IsValid bp then
    match BinHandle.TryParseInstr (hdl, bp=bp) with
    | Ok (ins) ->
      BinHandle.LiftInstr hdl ins
      |> BinIR.LowUIR.Pp.stmtsToString
      |> Console.WriteLine
      let len = ins.Length
      let bp' = BinaryPointer.Advance bp (int len)
      lift hdl errShift bp' (addr + uint64 len) (cnt + 1)
    | Error _ ->
      let bp' = BinaryPointer.Advance bp errShift
      lift hdl errShift bp' (addr + uint64 errShift) cnt
  else cnt

[<EntryPoint>]
let main argv =
  let filePath = argv[0] // raw binary file path
  let isa = ISA.Init Architecture.IntelX64 Endian.Little
  let hdl =
    BinHandle.Init (isa,
                    ArchOperationMode.NoMode,
                    false,
                    None,
                    fileName=filePath)
  let bp = hdl.FileInfo.ToBinaryPointer hdl.FileInfo.BaseAddress
  let errShift = 1
  let cnt = lift hdl errShift bp 0UL 0
  0

yechenn commented 1 year ago

OK! Following the code you gave, I have solved this, like so:

open System
open B2R2
open B2R2.FrontEnd.BinInterface
open B2R2.FrontEnd.BinFile
open System.IO

let appendFile (content: string) (filePath: string) =
  File.AppendAllText(filePath, content)

let rec lift hdl errShift bp (addr: Addr) cnt =
  if BinaryPointer.IsValid bp then
    match BinHandle.TryParseInstr (hdl, bp=bp) with
    | Ok (ins) ->
      let translation = ins.Translate hdl.TranslationContext
      appendFile (sprintf "%A" translation) "1.txt"
      let len = ins.Length
      let bp' = BinaryPointer.Advance bp (int len)
      lift hdl errShift bp' (addr + uint64 len) (cnt + 1)
    | Error _ ->
      let bp' = BinaryPointer.Advance bp errShift
      lift hdl errShift bp' (addr + uint64 errShift) cnt
  else cnt

[<EntryPoint>]
let main argv =
  let filePath = "C:\\Windows\\hh.exe" // raw binary file path
  let isa = ISA.Init Architecture.IntelX64 Endian.Little
  let hdl =
    BinHandle.Init (isa,
                    ArchOperationMode.NoMode,
                    false,
                    None,
                    fileName=filePath)
  let bp = hdl.FileInfo.ToBinaryPointer hdl.FileInfo.BaseAddress
  let errShift = 1
  let cnt = lift hdl errShift bp 0UL 0
  0

sangkilc commented 1 year ago

(sprintf "%A" translation) should be avoided because it is too slow. See my code above.
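
For readers wondering why this matters: `%A` is F#'s generic structured formatter, which inspects each value at run time, whereas a dedicated printer (such as the `Pp.stmtsToString` call in the earlier snippet) does a single pattern match per statement. The following is only a minimal sketch of that idea; the `Stmt` type and both printer functions are hypothetical stand-ins and are not B2R2's LowUIR or its API.

```fsharp
open System.Text

/// Hypothetical miniature statement type (an assumption for illustration);
/// B2R2's real LowUIR statements are much richer.
type Stmt =
  | Put of dst: string * src: string
  | Jmp of target: uint64

/// Dedicated printer: one pattern match and direct appends to a shared buffer.
let stmtToString (sb: StringBuilder) (stmt: Stmt) =
  match stmt with
  | Put (dst, src) -> sb.Append(dst).Append(" := ").Append(src) |> ignore
  | Jmp target -> sb.Append("jmp ").Append(target.ToString("x")) |> ignore

/// Prints a whole statement array, one statement per line.
let stmtsToString (stmts: Stmt[]) =
  let sb = StringBuilder ()
  for s in stmts do
    stmtToString sb s
    sb.AppendLine () |> ignore
  sb.ToString ()

let stmts = [| Put ("RAX", "RBX"); Jmp 0x401000UL |]
let viaPercentA = sprintf "%A" stmts   // generic structured formatting
let dedicated = stmtsToString stmts    // hand-written printer
```

Both strings describe the same statements; the difference is how much work is done per call, which adds up when the loop runs once per instruction.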

sangkilc commented 1 year ago

Also, running AppendAllText on every iteration doesn't make sense, because you are opening and closing the file over and over again.
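
One way to act on this, sketched under assumptions: open a single `StreamWriter` before the loop, write every line through it, and let `use` close it at the end. The `render` function and the output file name below are hypothetical placeholders for the per-instruction lifting step and are not part of B2R2.

```fsharp
open System.IO

/// Hypothetical stand-in for the per-instruction lifting step; in the real
/// program this string would come from the lifter's pretty-printer.
let render (i: int) = sprintf "stmt_%d" i

/// Opens the output file once (overwriting any existing file), writes every
/// line through the same buffered writer, and disposes the writer when the
/// function returns.
let writeAll (path: string) (count: int) =
  use writer = new StreamWriter (path, false)
  for i in 0 .. count - 1 do
    writer.WriteLine (render i)

// Example run: 1000 lines, one open and one close.
writeAll "lifted.txt" 1000
```

In the recursive `lift` function from the thread, the same effect could be had by passing the writer in as an extra argument instead of a file path, so the file is opened in `main` and closed only after the last instruction.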

yechenn commented 1 year ago

OK! Thanks for your suggestion.