Closed yechenn closed 1 year ago
It seems that you are creating a BinHandle for every instruction. We recommend instantiating it only once.
For more information on how to use it, see the source of our BinDump tool. When I use BinDump to disassemble the C library (/lib/x86_64-linux-gnu/libc.so.6), which contains 400K instructions in its .text section, it takes less than a second to disassemble them all.
First, do the following:
$ cd B2R2/src/RearEnd/BinDump
$ time dotnet run -c Release -- /lib/x86_64-linux-gnu/libc.so.6 -S .text > /dev/null
real 0m2.641s
user 0m2.565s
sys 0m0.295s
Of course, the dotnet command incurs some start-up cost, so you can directly invoke the compiled binary as follows:
$ time ./bin/Release/net6.0/B2R2.RearEnd.BinDump /lib/x86_64-linux-gnu/libc.so.6 -S .text > /dev/null
real 0m0.888s
user 0m0.820s
sys 0m0.068s
So on my machine, parsing and disassembling 400K instructions took about 0.888 seconds. Note that if you write your own code using our API to parse a binary stream, it will be much faster, because BinDump does additional work such as parsing ELF headers.
We have tested the example you provided, but we found that it only reads the instructions from the executable file and outputs the disassembly. Here is the command and its output.
The command is:
C:\Program Files\dotnet\B2R2\src\RearEnd\BinDump> dotnet run -c Release -- C:\Windows\hh.exe -S .text >C:\Users\YC\Desktop\keyan\iotmalware\1.txt
The output is:
[C:\Windows\hh.exe]
# (.text)
0000000140001000: CC int3
0000000140001001: CC int3
0000000140001002: CC int3
0000000140001003: CC int3
0000000140001004: CC int3
0000000140001005: CC int3
0000000140001006: CC int3
0000000140001007: CC int3
0000000140001008: 4C 89 44 24 18 mov qword ptr [RSP+0x18], R8
000000014000100d: 4C 89 4C 24 20 mov qword ptr [RSP+0x20], R9
0000000140001012: 53 push RBX
0000000140001013: 56 push RSI
0000000140001014: 57 push RDI
0000000140001015: 48 83 EC 20 sub RSP, 0x20
0000000140001019: 33 FF xor EDI, EDI
000000014000101b: 48 8D 42 FF lea RAX, qword ptr [RDX-0x1]
000000014000101f: 48 3D FE FF FF 7F cmp RAX, 0x7ffffffe
0000000140001025: 48 8B F1 mov RSI, RCX
0000000140001028: B9 57 00 07 80 mov ECX, 0x80070057
000000014000102d: 0F 47 F9 cmova EDI, ECX
0000000140001030: 85 FF test EDI, EDI
0000000140001032: 78 3B js +0x3d ; 0x14000106f
0000000140001034: 48 8D 5A FF lea RBX, qword ptr [RDX-0x1]
0000000140001038: 48 8B CE mov RCX, RSI
000000014000103b: 48 8B D3 mov RDX, RBX
000000014000103e: 4C 8D 4C 24 58 lea R9, qword ptr [RSP+0x58]
0000000140001043: 33 FF xor EDI, EDI
0000000140001045: 48 FF 15 24 22 00 00 call qword ptr [RIP+0x2224] ; <_vsnprintf>
000000014000104c: 0F 1F 44 00 00 nop dword ptr [RAX+RAX+0x0]
0000000140001051: 85 C0 test EAX, EAX
0000000140001053: 78 0F js +0x11 ; 0x140001064
0000000140001055: 48 98 cdqe
0000000140001057: 48 3B C3 cmp RAX, RBX
Our objective is to convert each instruction of a binary program into LowUIR and produce a txt file that encompasses all instructions of the binary program in LowUIR format.
Following your example provided on GitHub, we initially utilized Python to extract the instructions from the binary program and generated "Program.fs". Like this:
open B2R2
open B2R2.FrontEnd
open System.IO

let appendFile (content: string) (filePath: string) = File.AppendAllText(filePath, content)

[<EntryPoint>]
let main argv =
  let isa = ISA.OfString "aarch64"
  let bytes = [| 0xf0uy; 0x7buy; 0xbfuy; 0xa9uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  let translation = ins.Translate handler.TranslationContext
  appendFile (sprintf "%A" translation) "result.txt"
  let bytes = [| 0x90uy; 0x00uy; 0x00uy; 0xf0uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  let translation = ins.Translate handler.TranslationContext
  appendFile (sprintf "%A" translation) "result.txt"
  let bytes = [| 0x11uy; 0xcauy; 0x47uy; 0xf9uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  let translation = ins.Translate handler.TranslationContext
  appendFile (sprintf "%A" translation) "result.txt"
  let bytes = [| 0x10uy; 0x42uy; 0x3euy; 0x91uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  let translation = ins.Translate handler.TranslationContext
  appendFile (sprintf "%A" translation) "result.txt"
  .....................................................
  .....................................................
  .....................................................
  let bytes = [| 0x20uy; 0x02uy; 0x1fuy; 0xd6uy |]
  let handler = BinHandler.Init (isa, bytes)
  let ins = BinHandler.ParseInstr handler 0UL
  let translation = ins.Translate handler.TranslationContext
  appendFile (sprintf "%A" translation) "result.txt"
  0
Subsequently, we executed "dotnet run" in the command prompt to generate the "result.txt" file. After conducting our tests, we observed that it takes approximately 9 minutes to obtain "result.txt" for a binary program of size 22k.
Therefore, I would like to ask if B2R2 has pre-existing interfaces that can fulfill the desired functionality, where I input a binary program and receive the desired "result.txt" document containing all LowUIR instructions. If such interfaces are available, how should I use them?
If not, do you have any alternative solutions that might be more effective?
Best Regards!
As I said earlier, you are instantiating BinHandle over and over again. You can use our BinaryPointer interface to efficiently read through a binary file: https://b2r2.org/APIDoc/reference/b2r2-frontend-binfile-binarypointer.html
Here is an example code snippet that takes in a raw binary file (no ELF, no PE, just a raw sequence of instruction bytes) and outputs the lifted instructions to the console.
open System
open B2R2
open B2R2.FrontEnd.BinInterface
open B2R2.FrontEnd.BinFile

let rec lift hdl errShift bp (addr: Addr) cnt =
  if BinaryPointer.IsValid bp then
    match BinHandle.TryParseInstr (hdl, bp=bp) with
    | Ok (ins) ->
      BinHandle.LiftInstr hdl ins
      |> BinIR.LowUIR.Pp.stmtsToString
      |> Console.WriteLine
      let len = ins.Length
      let bp' = BinaryPointer.Advance bp (int len)
      lift hdl errShift bp' (addr + uint64 len) (cnt + 1)
    | Error _ ->
      let bp' = BinaryPointer.Advance bp errShift
      lift hdl errShift bp' (addr + uint64 errShift) cnt
  else cnt

[<EntryPoint>]
let main argv =
  let filePath = argv[0] // raw binary file path
  let isa = ISA.Init Architecture.IntelX64 Endian.Little
  let hdl =
    BinHandle.Init (isa,
                    ArchOperationMode.NoMode,
                    false,
                    None,
                    fileName=filePath)
  let bp = hdl.FileInfo.ToBinaryPointer hdl.FileInfo.BaseAddress
  let errShift = 1
  let cnt = lift hdl errShift bp 0UL 0
  0
OK! Following the code you gave, I have solved this, like this:
open System
open B2R2
open B2R2.FrontEnd.BinInterface
open B2R2.FrontEnd.BinFile
open System.IO

let appendFile (content: string) (filePath: string) =
  File.AppendAllText(filePath, content)

let rec lift hdl errShift bp (addr: Addr) cnt =
  if BinaryPointer.IsValid bp then
    match BinHandle.TryParseInstr (hdl, bp=bp) with
    | Ok (ins) ->
      let translation = ins.Translate hdl.TranslationContext
      appendFile (sprintf "%A" translation) "1.txt"
      let len = ins.Length
      let bp' = BinaryPointer.Advance bp (int len)
      lift hdl errShift bp' (addr + uint64 len) (cnt + 1)
    | Error _ ->
      let bp' = BinaryPointer.Advance bp errShift
      lift hdl errShift bp' (addr + uint64 errShift) cnt
  else cnt

[<EntryPoint>]
let main argv =
  let filePath = "C:\\Windows\\hh.exe" // raw binary file path
  let isa = ISA.Init Architecture.IntelX64 Endian.Little
  let hdl =
    BinHandle.Init (isa,
                    ArchOperationMode.NoMode,
                    false,
                    None,
                    fileName=filePath)
  let bp = hdl.FileInfo.ToBinaryPointer hdl.FileInfo.BaseAddress
  let errShift = 1
  let cnt = lift hdl errShift bp 0UL 0
  0
(sprintf "%A" translation)
This should be avoided because it is too slow. See my code above.
Also, running AppendAllText for every iteration doesn't make sense because you are opening and closing the file over and over again.
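To illustrate the point about file I/O, here is a minimal sketch that opens the output file once and reuses a single writer for every line. It uses only System.IO and assumes each lifted instruction has already been rendered to a string (for example with BinIR.LowUIR.Pp.stmtsToString, as in the snippet above). The helper name writeAllOnce is hypothetical and not part of B2R2.

```fsharp
open System.IO

/// Hypothetical helper (not a B2R2 API): writes every string in `lines`
/// to `path` through one StreamWriter, so the file is opened and closed
/// exactly once instead of once per instruction.
/// Returns the number of lines written.
let writeAllOnce (path: string) (lines: seq<string>) : int =
  // `use` disposes (flushes and closes) the writer when the function returns.
  // The second argument `false` means "overwrite", not "append".
  use writer = new StreamWriter(path, false)
  let mutable count = 0
  for line in lines do
    writer.WriteLine(line)
    count <- count + 1
  count
```

In a lifting loop, the same idea would mean creating the writer in main and passing it down to lift, rather than calling File.AppendAllText inside each iteration.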
OK! Thanks for your suggestion.
I have deployed B2R2 successfully and used the following example to process instructions and output their IR.
However, I have observed that it takes around three to four seconds to process each instruction.
Considering that an average ELF file typically contains tens of thousands of instructions, and if we need to process hundreds of thousands of such ELF files, the computational time required becomes substantial.
Even with the utilization of multi-threading, it remains difficult to effectively address this issue.
Therefore, I would like to inquire if there are any recommended methods to solve this problem.