NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Word sizes that are smaller or not multiples of 8 bits #100

Open jdb2 opened 5 years ago

jdb2 commented 5 years ago

Is your feature request related to a problem? Please describe.
I am familiar with the machine code and reverse engineering of many deeply embedded processors whose word sizes are not a multiple of 8 bits. To be a truly universal reverse engineering tool, Ghidra needs to support any word size, including 1-bit or 33-bit words. As of now, I can't write a SLEIGH spec for the processors whose machine code I want to reverse engineer.

Describe the solution you'd like
Extend the SLEIGH processor definition format to support word sizes that are smaller than 8 bits or not integer multiples of 8 bits.

Describe alternatives you've considered
I've considered "unpacking" the code of, say, 4-bit processors into a representation with one nibble per byte.
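The unpacking alternative above could be sketched roughly as follows. This is only an illustration, not anything Ghidra provides: the function names `unpack_nibbles` and `pack_nibbles` are hypothetical, and the nibble order within each byte (high nibble first by default) is an assumption that depends on how a given firmware dump was produced.

```python
def unpack_nibbles(packed: bytes, high_first: bool = True) -> bytes:
    """Expand each input byte into two output bytes, one 4-bit nibble each.

    high_first is an assumption about the dump format: when True, the
    high nibble of each byte is treated as the earlier 4-bit word.
    """
    out = bytearray()
    for b in packed:
        hi, lo = b >> 4, b & 0x0F
        out.extend((hi, lo) if high_first else (lo, hi))
    return bytes(out)


def pack_nibbles(unpacked: bytes, high_first: bool = True) -> bytes:
    """Inverse of unpack_nibbles: merge pairs of nibble-bytes back into bytes."""
    if len(unpacked) % 2 != 0:
        raise ValueError("need an even number of nibbles")
    if any(n > 0x0F for n in unpacked):
        raise ValueError("every input byte must hold a single 4-bit value")
    out = bytearray()
    for i in range(0, len(unpacked), 2):
        first, second = unpacked[i], unpacked[i + 1]
        hi, lo = (first, second) if high_first else (second, first)
        out.append((hi << 4) | lo)
    return bytes(out)
```

The trade-off is that the unpacked image is twice the size of the original and every address doubles, so any addresses taken from a datasheet have to be translated before they can be compared with the unpacked listing.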

Additional context
I'm mainly interested in reverse engineering old calculators or computers whose firmware source code has been lost, especially the peripherals of such systems. I've encountered word sizes ranging from 1 bit and 4 bits up to as many as 56 bits.

jdb2 commented 5 years ago

I know LLVM has something similar, for manipulating integer values at least. For example, one can specify a value as an "i4" integer, that is, a 4-bit nibble. See here.

fincs commented 5 years ago

+1. TeakLite II for instance has 40-bit registers.

ghost commented 5 years ago

This is typically done by rounding sizes to the nearest byte, modeling the processor language accordingly, and padding leading bits of instructions out to a byte boundary for import into Ghidra.
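As a rough sketch of the padding step, assuming a dump in which 12-bit instruction words are packed back to back, most significant bit first, each word could be widened to a 16-bit container before import. The function name `pad_words`, the packed input layout, and the little-endian output order are all assumptions made for illustration; a real dump may already be padded or use a different bit order.

```python
def pad_words(packed: bytes, word_bits: int = 12, byteorder: str = "little") -> bytes:
    """Widen tightly packed word_bits-wide words to whole bytes.

    Assumes the input is a bitstream of words stored MSB-first with no
    gaps. Each word is zero-padded in its leading (high) bits up to the
    next byte boundary. Trailing bits that do not form a full word are
    dropped.
    """
    container_bytes = (word_bits + 7) // 8   # e.g. 12 bits -> 2 bytes
    total_bits = len(packed) * 8
    n_words = total_bits // word_bits
    stream = int.from_bytes(packed, "big")   # whole dump as one big integer
    mask = (1 << word_bits) - 1

    out = bytearray()
    for i in range(n_words):
        # Distance of word i's lowest bit from the end of the stream.
        shift = total_bits - (i + 1) * word_bits
        word = (stream >> shift) & mask
        out += word.to_bytes(container_bytes, byteorder)
    return bytes(out)


# Two 12-bit words, 0xABC and 0x123, packed into three bytes.
padded = pad_words(bytes([0xAB, 0xC1, 0x23]))
```

With the padded image, the processor language can then be modeled with 16-bit instruction words whose top four bits are always zero.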

jdb2 commented 5 years ago

One significant example that I forgot to mention is the Microchip PIC12 family of devices. On the PIC12, instruction words are 12 bits wide and data words are 8 bits wide. In its current state, Ghidra would make describing processors like these very problematic, or at least very frustrating. That is a shame, because tiny, deeply embedded microcontrollers and DSPs are used in a huge number of devices, and they usually have non-standard word sizes that are often not multiples of 8 bits.

jdb2 commented 5 years ago

> This is typically done by rounding sizes to the nearest byte, modeling the processor language accordingly, and padding leading bits of instructions out to a byte boundary for import into Ghidra.

Thank you for the informative reply :) Could you go into more detail on how what you suggest can be implemented or share a link to an explanation of such an implementation?

RyanHope commented 3 years ago

M16C has 20-bit registers. How am I supposed to handle these in Ghidra?

GhidorahRex commented 3 years ago

@RyanHope We have a PIC12 processor spec (the PIC12 was mentioned by @jdb2 above). It might be worth checking out that implementation if you're interested in seeing what an off-sized register language looks like.