cea-sec / miasm

Reverse engineering framework in Python
https://miasm.re/
GNU General Public License v2.0

Does miasm support building a CFG and running symbolic execution on an execution trace? #1385

Closed frozenkp closed 3 years ago

frozenkp commented 3 years ago

Hi,

I want to do path exploration on Windows malware, but emulating it is quite complicated because of packers, API calls, and obfuscation. What I want to do is use miasm to build a CFG directly from its execution trace dump (not from shellcode or a PE), then run symbolic execution on it to find a condition that triggers an undiscovered path. After generating a new trace, I will combine it with the original one to generate a new CFG, and repeat.

My execution trace dump is like the following, and I can generate more runtime information if needed:

0x402460:push ebp
0x402461:mov ebp, esp
0x402463:sub esp, 0xc
0x402466:mov dword ptr [ebp-0x4], 0x0
0x40246d:mov dword ptr [ebp-0x8], 0x0
0x402474:cmp dword ptr [ebp-0x4], 0x0
0x402478:jz 0x4024a0
0x4024a0:call 0x402300
0x402300:push ebp
0x402301:mov ebp, esp
0x402303:sub esp, 0x2c
0x402306:mov dword ptr [ebp-0x18], 0x1
0x40230d:mov dword ptr [ebp-0x20], 0x1
0x402314:mov dword ptr [ebp-0x2c], 0x1
0x40231b:mov dword ptr [ebp-0x8], 0x1
0x402322:mov dword ptr [ebp-0x14], 0x1
0x402329:mov dword ptr [ebp-0x1c], 0x1
0x402330:mov dword ptr [ebp-0x28], 0x1

The trace dump is not contiguous because it follows control transfer instructions, so I can't treat it directly as shellcode. Also, because of the unpacking routine, some instructions will be in dynamically allocated memory.

Is there any possible way to read the execution trace and generate a CFG?

serpilliere commented 3 years ago

Hi @frozenkp !

I think a nice way to do this would be to implement a custom bin_stream. The bin_stream is a Miasm object that provides bytes to the higher Miasm layers (disassembler, jitter, ...).

So the trick is to dump the instruction bytes in addition to the instruction text you already have. Once you have them, you can implement a custom bin_stream that reads this information (address + bytes) into a dictionary, which then serves bytes to the upper layers.
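A minimal sketch of that "database", assuming each trace line is extended to a hypothetical `address:hexbytes:mnemonic` format. The hex-bytes field, the sample encodings, and the `load_trace` name are illustrative assumptions, not something miasm defines:

```python
def load_trace(lines):
    """Map every traced byte address to its byte value.

    Assumes a hypothetical 'address:hexbytes:mnemonic' line format.
    """
    db = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split only twice so a ':' inside the mnemonic text is preserved.
        addr_text, hex_text, _mnemonic = line.split(":", 2)
        addr = int(addr_text, 16)
        # Store one entry per byte so gaps between instructions stay visible.
        for offset, value in enumerate(bytes.fromhex(hex_text)):
            db[addr + offset] = value
    return db


# Illustrative records; the byte values are assumed encodings.
trace = [
    "0x402460:55:push ebp",
    "0x402461:8bec:mov ebp, esp",
    "0x402463:83ec0c:sub esp, 0xc",
]
db = load_trace(trace)
```

Keying the dictionary by byte address rather than by instruction address means a read that spans two recorded instructions still works, and any address that was never traced is simply absent.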

After this, just launch a disassembler, and boom, it should generate the CFG. The trick is that when the disassembler requests bytes from your bin_stream that you don't have in your "database", you just raise an IOError; Miasm handles it and generates a missing branch in the CFG.

Am I clear @frozenkp? Do you want to have a try at implementing it?

frozenkp commented 3 years ago

Hi @serpilliere,

Thanks for your clear reply!

So, the bin_stream object is the one that parses the given shellcode and binary, and dis_engine and its member function dis_multiblock are the ones that read bytes from bin_stream and build the CFG, right?

Therefore, as you said, I can implement a custom bin_stream with my own dictionary, and everything should be fine since the latter part just asks for bytes from the dictionary.

This solution sounds good and makes sense. I will give it a try. Thank you!

serpilliere commented 3 years ago

Correct @frozenkp .

frozenkp commented 3 years ago

For those who may also want to do the same thing, here are some tips!

First, you can copy the bin_stream class from the miasm source; the other classes that inherit from it are not necessary. Be careful: your class should be declared like class trace_stream(bin_stream), since miasm checks whether the given stream is an instance of bin_stream.

Then, you have to modify the functions in your bin stream. In my case, I removed the implementation of the atomic-related functions (just replaced their bodies with return), put my parsing routine in __init__, and modified the get* functions. Actually, only _getbytes has to be replaced with your own implementation; the other get* functions are fine, since they all end up calling _getbytes.

Finally, you have to add a __len__ function, since miasm checks the size of your binary to prevent overflow. In my case, I return the max address of my trace, because the trace is not always contiguous. It will fail if you return the size of your dictionary.
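The three tips can be put together in a rough sketch like the one below. It is not the code used in this thread: the `db` attribute (a dictionary from address to byte value) is an assumption, and the fallback base class exists only so the snippet runs without miasm installed. It is a stand-in, not the real bin_stream.

```python
try:
    from miasm.core.bin_stream import bin_stream
except ImportError:
    # Stand-in so the sketch runs without miasm; NOT the real class.
    class bin_stream(object):
        def __init__(self, *args, **kwargs):
            pass


class trace_stream(bin_stream):
    """Serve bytes recorded in a trace; untraced addresses raise IOError."""

    def __init__(self, db):
        super().__init__()
        self.db = db  # assumed layout: address -> byte value (0..255)

    def _getbytes(self, start, length=1):
        try:
            return bytes(self.db[a] for a in range(start, start + length))
        except KeyError:
            # Per the discussion above, the disassembler treats IOError
            # as "bytes unavailable" and records a missing branch.
            raise IOError("no traced bytes at 0x%x" % start)

    def __len__(self):
        # Based on the highest traced address, not len(self.db): the
        # trace has gaps, so the entry count would be far too small.
        # The +1 keeps the last byte in range (an assumption about how
        # the bound is checked).
        return max(self.db) + 1


stream = trace_stream({0x402460: 0x55, 0x402461: 0x8B, 0x402462: 0xEC})
```

With miasm installed, an instance like `stream` would be handed to the disassembler in place of a regular bin_stream (for example through dis_engine and dis_multiblock, as mentioned earlier in the thread).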

Hope this helps anyone who has the same question!