Closed frozenkp closed 3 years ago
Hi @frozenkp !
I think a nice way to do this would be to implement a custom bin_stream. The bin_stream is a Miasm object that provides bytes to the higher Miasm layers (disassembler, jitter, ...).
So the trick will be to dump the instruction bytes in addition to the instruction name you already have.
Once you have this, you can implement your custom bin_stream, which reads this information (address + bytes) into a dictionary and serves bytes from that dictionary.
After this, just launch a disassembler, and boom, it should generate the CFG. The trick is that when the disassembler requests bytes from your bin_stream that you don't have in your "database", you just raise an IOError. Miasm handles it and generates a missing branch in the CFG.
Am I clear @frozenkp? Do you want to have a try at implementing it?
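A minimal sketch of the idea described above, covering only the byte lookup. The dictionary layout (instruction address mapped to raw bytes) and the name trace_stream are assumptions for illustration, not something Miasm prescribes. The fallback base class is a stand-in so the snippet runs without Miasm installed; it is not the real bin_stream.

```python
try:
    # Real base class, if Miasm is installed.
    from miasm.core.bin_stream import bin_stream
except ImportError:
    # Stand-in so this sketch runs without Miasm (NOT the real class).
    class bin_stream(object):
        pass


class trace_stream(bin_stream):
    """Serve only the bytes recorded in an execution trace (hypothetical)."""

    def __init__(self, trace):
        # trace: assumed layout {instruction address: raw instruction bytes}
        super(trace_stream, self).__init__()
        self.mem = {}
        for addr, raw in trace.items():
            # Flatten to one entry per byte so reads can span instructions.
            for i, byte in enumerate(bytearray(raw)):
                self.mem[addr + i] = byte

    def _getbytes(self, start, length):
        try:
            data = bytearray(self.mem[start + i] for i in range(length))
        except KeyError:
            # Bytes never seen in the trace: signal it with IOError,
            # as suggested above.
            raise IOError("address not covered by the trace")
        return bytes(data)
```

Storing one entry per byte keeps the lookup simple when a read starts in the middle of a recorded instruction or spans two adjacent ones.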
Hi @serpilliere,
Thanks for your clear reply!
So, the bin_stream object is the one that parses the given shellcode and binary, and dis_engine and its member function dis_multiblock are the ones that read bytes from bin_stream and build the CFG, right?
Therefore, as you said, I can implement a custom bin_stream with my own dictionary, and everything should be fine since the latter part just asks for bytes from the dictionary.
This solution sounds good and makes sense. I will give it a try. Thank you!
Correct @frozenkp .
For those who may also want to do the same thing, here are some tips!
First, you can copy the bin_stream class from here; the other derived classes are not necessary. Be careful: your class should be declared like class trace_stream(bin_stream), since Miasm checks whether the given stream is an instance of bin_stream.
Then, you have to modify the functions in your bin stream. In my case, I removed the implementation of the atomic-related functions (just replaced their bodies with return), put my parsing routine in __init__, and modified the get* functions. Actually, only _getbytes has to be replaced with your own implementation; the other get* functions are fine as they are, since they all end up calling _getbytes.
Finally, you have to add a __len__
function since it would check the size of your binary to prevent overflow. In my case, I give it the max address of my trace because the trace is not always continuous. It will be failed if you give it the size of your dictionary.
Hope this helps anyone who has the same question!
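To illustrate the __len__ tip, here is a small sketch of why the number of dictionary entries is the wrong size for a sparse trace. The helper trace_extent, the class sparse_trace and the dictionary layout (address mapped to raw bytes) are hypothetical names used only for this example, and the bounds check shown is a simplified stand-in for whatever check Miasm performs.

```python
def trace_extent(trace):
    """Return one past the highest address covered by the trace."""
    if not trace:
        return 0
    return max(addr + len(raw) for addr, raw in trace.items())


class sparse_trace(object):
    """Hypothetical container whose length is its address extent."""

    def __init__(self, trace):
        # trace: assumed layout {instruction address: raw instruction bytes}
        self.trace = dict(trace)

    def __len__(self):
        # Report the address extent, not the number of recorded entries.
        return trace_extent(self.trace)


# Two instructions far apart: 2 entries, but addresses go up to 0x2000.
trace = {0x1000: b"\x90", 0x2000: b"\xc3"}
stream = sparse_trace(trace)

# A simplified bounds check of the form "offset < len(stream)".
in_bounds_with_extent = 0x2000 < len(stream)       # True
in_bounds_with_entry_count = 0x2000 < len(trace)   # False
```

One caveat: in CPython, len() raises OverflowError when __len__ returns a value larger than sys.maxsize, so this approach needs care for traces that contain very high 64-bit addresses.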
Hi,
I want to do path exploration on Windows malware, and it's quite complicated to emulate because of packers, APIs, and obfuscation. What I want to do is use Miasm to build a CFG directly from its execution trace dump (not from shellcode or a PE), then do symbolic execution on it to find a condition that triggers an undiscovered path. After generating a new trace, I will combine it with the original one to generate a new CFG, and then repeat the process.
My execution trace dump looks like the following, and I can generate more runtime information if needed:
The trace dump is not contiguous, since it contains several control-transfer instructions; thus, I can't treat it as shellcode directly. Also, because of the unpacking routine, some instructions will be in dynamically allocated memory.
Is there a way to read the execution trace and generate a CFG?