bsdphk / PyReveng3

Python based Code Reverse Engineering tool -- Take 3

PyReveng3 is a toolkit for reverse engineering and analysing binary programs, or for that matter any binary data, for computer-archaeological investigations.

Computers used to be pretty strange, and some of the fundamental assumptions modern reverse-engineering tools make, notably "memory is a linear array of bytes", make them useless for historic computers.

PyReveng3 approaches all such issues with as much generality as possible, to handle any weird computer architecture I have ever encountered.

Presently this generality extends to:

Another important idea has been to make it easy to add a new disassembler, without having to deal with a lot of binary arithmetic, by entering the instruction descriptions as they are typically found in manuals::

    PUSH    r2      |0 1 0 1 0| reg |
    PUSH    sr      |0 0 0|sr |1 1 0|
    POP     W,ea    |1 0 0 0 1 1 1 1|mod|0 0 0| rm  |
    POP     r2      |0 1 0 1 1| reg |
    POP     sr      |0 0 0|sr |1 1 1|
    XCHG    r,ea    |1 0 0 0 0 1 1|w|mod| reg | rm  |
    XCHG    W,a,r2  |1 0 0 1 0| reg |
    NOP     -       |1 0 0 1 0 0 0 0|
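
To illustrate why this notation removes the need for hand-written binary arithmetic, here is a minimal sketch of how such a line could be turned into a mask, a value and a set of named bit fields. It is not PyReveng3's own parser: the names ``InsSpec``, ``parse_spec`` and ``decode`` are hypothetical, and the rule "each bit occupies two character columns" is an assumption that happens to fit the lines shown above.

```python
from dataclasses import dataclass


@dataclass
class InsSpec:
    """Hypothetical container for one parsed description line."""
    mnemonic: str
    operands: list
    width: int    # total number of bits in the pattern
    mask: int     # 1 wherever the pattern fixes a bit
    value: int    # required value of the fixed bits
    fields: dict  # field name -> (shift from LSB, width in bits)


def parse_spec(line):
    """Parse e.g. 'PUSH    r2      |0 1 0 1 0| reg |' into an InsSpec."""
    head, _, bits = line.partition("|")
    words = head.split()
    mnemonic = words[0]
    operands = [] if words[1] == "-" else words[1].split(",")

    mask = value = width = 0
    pending = []  # (field name, offset from MSB, number of bits)
    for seg in bits.rstrip().rstrip("|").split("|"):
        # Assumption: "b b b" style layout, i.e. two columns per bit.
        nbits = (len(seg) + 1) // 2
        tokens = seg.split()
        if tokens and all(t in ("0", "1") for t in tokens):
            if len(tokens) != nbits:
                raise ValueError("badly spaced bit pattern: %r" % seg)
            for t in tokens:
                mask = (mask << 1) | 1
                value = (value << 1) | int(t)
        else:
            # A named field: its bits are "don't care" for matching.
            pending.append((seg.strip(), width, nbits))
            mask <<= nbits
            value <<= nbits
        width += nbits

    fields = {name: (width - off - n, n) for name, off, n in pending}
    return InsSpec(mnemonic, operands, width, mask, value, fields)


def decode(spec, word):
    """Return the field values if `word` matches `spec`, else None."""
    if word & spec.mask != spec.value:
        return None
    return {
        name: (word >> shift) & ((1 << n) - 1)
        for name, (shift, n) in spec.fields.items()
    }


push = parse_spec("PUSH    r2      |0 1 0 1 0| reg |")
xchg = parse_spec("XCHG    r,ea    |1 0 0 0 0 1 1|w|mod| reg | rm  |")
```

With these definitions, ``decode(push, 0x53)`` yields ``{"reg": 3}``, while ``decode(push, 0x90)`` yields ``None`` because the fixed bits do not match. A real disassembler also has to choose between overlapping patterns (``0x90`` fits both ``XCHG W,a,r2`` and ``NOP`` above), which this sketch does not attempt.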

It is important to stress here that disassemblers are not just for CPUs: they can also be used to analyze interpreted code instructions (like CHIP-8), graphical primitives and other "strange languages".

The fundamental strategy is to build data structures representing the analysis, available for further programmatic spelunking, rather than just a textual representation where the structure is flattened.
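
As a rough illustration of the difference, the sketch below keeps the result of an analysis as objects that can be queried, instead of as lines of text. The classes ``Leaf`` and ``Analysis`` and their attributes are hypothetical names invented for this example; they are not PyReveng3's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Hypothetical record for one analysed address range [lo, hi)."""
    lo: int
    hi: int
    kind: str                                     # e.g. "ins" or "data"
    text: str                                     # how it would be rendered
    flow_to: list = field(default_factory=list)   # addresses control may reach


class Analysis:
    """Hypothetical container that keeps leaves queryable."""

    def __init__(self):
        self.leaves = []

    def add(self, leaf):
        self.leaves.append(leaf)
        return leaf

    def find(self, adr):
        """Return the leaf covering `adr`, or None."""
        for leaf in self.leaves:
            if leaf.lo <= adr < leaf.hi:
                return leaf
        return None

    def referrers(self, adr):
        """Return start addresses of all leaves that flow to `adr`."""
        return sorted(leaf.lo for leaf in self.leaves if adr in leaf.flow_to)


# Made-up addresses and instructions, purely for demonstration.
analysis = Analysis()
analysis.add(Leaf(0x100, 0x103, "ins", "CALL 0x200", flow_to=[0x200, 0x103]))
analysis.add(Leaf(0x103, 0x106, "ins", "CALL 0x200", flow_to=[0x200, 0x106]))
analysis.add(Leaf(0x200, 0x201, "ins", "RET"))
```

A question such as "who calls 0x200?" then becomes ``analysis.referrers(0x200)`` rather than a text search through a listing.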

A good, but complex, example of this is HP8568B/example.py, where the original language was "Wheelgol" (http://www.hp9825.com/html/hybrid_microprocessor.html), with a calling convention quite different from modern languages.

Of course, dumping the textual representation in the shape of a listing is one of the most typical kinds of "further programmatic spelunking" one can do, but the analysis is not limited to that.

The listing.py module produces something akin to an assembler listing, supporting annotations in the form of block comments, line comments, labels and ranges, and full control over formatting of both addresses and data.
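
The following sketch shows what "control over formatting of both addresses and data" can look like in practice. It is a toy renderer written for this README, not the interface of listing.py: the function ``render_listing`` and all of its parameters are hypothetical, and ranges are left out to keep it short.

```python
def render_listing(data, base, labels=None, line_comments=None,
                   block_comments=None,
                   fmt_adr=lambda a: "%04x" % a,
                   fmt_data=lambda d: "%02x" % d,
                   per_line=4):
    """Render `data` (a sequence of ints) as a list of listing lines.

    labels:         address -> label name
    line_comments:  address -> comment appended to that line
    block_comments: address -> list of comment lines placed before it
    fmt_adr/fmt_data: caller-supplied formatters, so octal addresses or
                      wide words are just a matter of passing another function.
    """
    labels = labels or {}
    line_comments = line_comments or {}
    block_comments = block_comments or {}
    # Any annotated address must start a new output line.
    breaks = set(labels) | set(line_comments) | set(block_comments)

    lines = []
    adr = base
    end = base + len(data)
    while adr < end:
        for text in block_comments.get(adr, []):
            lines.append("; " + text)
        if adr in labels:
            lines.append(labels[adr] + ":")
        n = 1
        while n < per_line and adr + n < end and (adr + n) not in breaks:
            n += 1
        chunk = data[adr - base:adr - base + n]
        line = fmt_adr(adr) + "  " + " ".join(fmt_data(d) for d in chunk)
        if adr in line_comments:
            line += "  ; " + line_comments[adr]
        lines.append(line)
        adr += n
    return lines


# Made-up input, purely for demonstration.
listing = render_listing(
    bytes([0x50, 0x58, 0x90, 0x90, 0x1F]),
    base=0x100,
    labels={0x102: "pad"},
    line_comments={0x100: "save and restore"},
    block_comments={0x100: ["Entry point"]},
)
```

Passing, say, ``fmt_adr=lambda a: "%06o" % a`` switches the address column to octal without touching the renderer, which is the kind of flexibility older architectures tend to need.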

The project contains a number of examples which I have deemed sufficiently obsolete, obscure and outdated to be covered by the "fair use" doctrine. If you disagree, please let me know.

Should you happen to have access to the original source code for any of the examples, I would love to receive a copy, even if I cannot publish it.

Disassemblers and examples using them