NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Wildcarded Pattern Generation Feature #5663

Open pinwhell opened 1 year ago

pinwhell commented 1 year ago

A lot of time would be saved when creating code signature patterns that actually work!

It would be very cool to have a feature that automatically generates such patterns, with the ability to wildcard the bytes that encode immediates (IMM) or memory displacements (MEMDISP), or even whole instructions, making the pattern robust and effective.

I have built my own tool using Capstone, but it would be even better to have something like this within Ghidra that works for every architecture.

I see this feature as fairly complex, and I see two ways to do it: a simple way and a complex way.

  1. The simple way does not aim for fine precision when wildcarding. For example:

Consider this instruction: 00 00 90 E5 => ldr r0, [r0, #0x0]

Now raise the memory displacement to a very large value:

FF 0F 90 E5 => ldr r0, [r0, #0xFFF]

By simple, I mean not computing a precise bitmask wildcard, but simply wildcarding the bytes that change. Bytes 0 and 1 clearly changed, so the simple approach should be expected to output

? ? 90 E5

for this instruction. As you can see, the simple way lacks fine precision, because:

  1. it may wildcard things we are not interested in wildcarding, such as the destination register "r0"

On the other hand, there could be precise, surgical wildcarding at the bit level. That gives more control over what gets wildcarded, for example only memory displacements, immediates, or even registers.
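
To make the difference between the two approaches concrete, here is a minimal Python sketch that compares the two encodings quoted above. The function names `bytewise_pattern` and `bit_mask` are made up for illustration and are not part of Ghidra or Capstone. It also assumes that diffing two sample encodings is enough to reveal the varying field, which only holds when the samples differ in every bit of that field (as 0x0 and 0xFFF do here).

```python
def bytewise_pattern(a: bytes, b: bytes) -> str:
    """Coarse approach: wildcard every byte that differs between two encodings."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return " ".join("?" if x != y else f"{x:02X}" for x, y in zip(a, b))


def bit_mask(a: bytes, b: bytes) -> bytes:
    """Fine approach: per-byte mask where a 1 bit means 'fixed' and a 0 bit means 'wildcard'."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return bytes(~(x ^ y) & 0xFF for x, y in zip(a, b))


low = bytes.fromhex("000090E5")   # ldr r0, [r0, #0x0]
high = bytes.fromhex("FF0F90E5")  # ldr r0, [r0, #0xFFF]

print(bytewise_pattern(low, high))            # ? ? 90 E5
print(bit_mask(low, high).hex(" ").upper())   # 00 F0 FF FF
```

The bit-level mask keeps the high nibble of byte 1 fixed, which is where the destination register lives in this encoding, while the byte-level pattern throws that nibble away together with the displacement bits.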

Thanks guys!

ghidracadabra commented 1 year ago

Have you seen the "Instruction Pattern Search" feature? From the Code Browser, Search -> For Instruction Patterns.

pinwhell commented 1 year ago

Yes, I am aware of it, but as far as I can see it only lets you search for such well-masked patterns; I didn't see a way to create one automatically. Is there already a feature for creating a properly masked pattern? Am I missing something?

ghidracadabra commented 1 year ago

You might take a look at YaraGhidraGUIScript.java, but I don't think that's what you're asking for.

If I understand correctly, you would like to have better control over the masking, to be able to do things like accept a set of registers or a range of addresses for a given operand, instead of either masking out an operand completely or fixing all of the bits. It's not a bad idea, and it might fit with some of our planned work, so I'll put the "Future" tag on this ticket.

pinwhell commented 1 year ago

More precisely, the concept I was describing in my original request involves an automated way of generating patterns, allowing for different levels of control:

Fine-grained Control: This would involve masking or wild-carding at the bit level. For instance, a pattern like "AA B? CC ?D" or "AA 00 | 10101010 10101010" could be created. This method provides more precision but can be complex.

Coarse-grained Control: This would entail masking or wild-carding at the byte level. For example, a pattern like "AA ?? CC ??" could be generated. This approach is simpler but provides less granularity.

Both of these wild-carding methods serve a purpose. Method 1 is highly precise and allows for intricate pattern specifications, while Method 2 is more straightforward but offers less detailed control.
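
As a rough illustration of how both granularities can share one representation, the following Python sketch parses a pattern string into per-byte (value, mask) pairs and matches it against a buffer. The helper names `parse_pattern`, `matches`, and `find_all` are hypothetical, and the sketch only handles the hex-nibble notation ("B?", "?D", "??", "?"); the binary notation after the "|" in the fine-grained example is not covered.

```python
def parse_pattern(text: str) -> list[tuple[int, int]]:
    """Turn 'AA B? CC ?D' into a list of (value, mask) pairs, one per byte."""
    pairs = []
    for token in text.split():
        if token == "?":          # a single '?' stands for a whole wildcard byte
            token = "??"
        if len(token) != 2:
            raise ValueError(f"bad token: {token!r}")
        value = mask = 0
        for ch in token:          # process high nibble, then low nibble
            value <<= 4
            mask <<= 4
            if ch != "?":
                value |= int(ch, 16)
                mask |= 0xF
        pairs.append((value, mask))
    return pairs


def matches(pattern: list[tuple[int, int]], data: bytes, offset: int = 0) -> bool:
    """True if the pattern matches data starting at offset."""
    if offset + len(pattern) > len(data):
        return False
    return all((data[offset + i] & m) == v for i, (v, m) in enumerate(pattern))


def find_all(pattern: list[tuple[int, int]], data: bytes) -> list[int]:
    """Offsets of every match (naive linear scan)."""
    return [i for i in range(len(data) - len(pattern) + 1) if matches(pattern, data, i)]


fine = parse_pattern("AA B? CC ?D")
coarse = parse_pattern("AA ?? CC ??")
print(matches(fine, bytes.fromhex("AAB7CC3D")))    # True
print(matches(fine, bytes.fromhex("AAC7CC3D")))    # False: high nibble of byte 1 is fixed to B
print(matches(coarse, bytes.fromhex("AAC7CC3D")))  # True
```

Byte-level wildcards are simply the special case where a byte's mask is 0x00, so a single matcher can serve both methods.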

To illustrate the kind of feature I'm proposing, I've prepared an example and a proof of concept from a tool I've developed:


Consider the following byte array:

9C 00 9F E5 00 10 A0 E3 00 00 9F E7 04 10 8D E5 00 00 90 E5 74 10 90 E5 00 00 51 E3

As you can see, treating this array as a robust pattern isn't practical, because offsets can change due to relocations, image updates, or other factors. The tool automatically recognizes each instruction and applies wildcarding to the applicable instruction bytes (in the listing below, each 1 in braces marks a wildcarded byte):

0x118e368: ldr    r0, [pc, #0x9c]   {1,1,0,0}
0x118e36c: mov    r1, #0            {1,1,1,0}
0x118e370: ldr    r0, [pc, r0]
0x118e374: str    r1, [sp, #4]      {1,1,0,0}
0x118e378: ldr    r0, [r0]
0x118e37c: ldr    r1, [r0, #0x74]   {1,1,0,0}
0x118e380: cmp    r1, #0            {1,1,0,0}

? ? 9F E5 ? ? ? E3 00 00 9F E7 ? ? 8D E5 00 00 90 E5 ? ? 90 E5 ? ? 51 E3

The result is a considerably more robust pattern.

The main point is that this pattern was generated automatically from that set of rules. This is the kind of feature I am proposing for Ghidra. It could perhaps be done at a more core level, on micro-instructions, and then translated to real instruction sets such as x86 or ARM.
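
The last step of the example above, turning per-instruction byte masks into a pattern string, can be sketched in a few lines of Python. The masks below are copied by hand from the listing (instructions shown without braces get no wildcards); this sketch does not derive them, which in a real tool would be the job of a disassembler. The name `apply_masks` and the fixed 4-byte instruction width are assumptions for this ARM example only.

```python
CODE = bytes.fromhex(
    "9C009FE5 0010A0E3 00009FE7 04108DE5 000090E5 741090E5 000051E3"
)

# One entry per instruction; 1 = wildcard this byte, None = keep all bytes.
MASKS = [
    (1, 1, 0, 0),  # ldr r0, [pc, #0x9c]
    (1, 1, 1, 0),  # mov r1, #0
    None,          # ldr r0, [pc, r0]
    (1, 1, 0, 0),  # str r1, [sp, #4]
    None,          # ldr r0, [r0]
    (1, 1, 0, 0),  # ldr r1, [r0, #0x74]
    (1, 1, 0, 0),  # cmp r1, #0
]


def apply_masks(code: bytes, masks, width: int = 4) -> str:
    """Emit '?' for masked bytes and two-digit hex for the rest."""
    if len(code) != width * len(masks):
        raise ValueError("code length does not match the number of masks")
    tokens = []
    for index, mask in enumerate(masks):
        chunk = code[index * width:(index + 1) * width]
        mask = mask or (0,) * width
        tokens.extend("?" if wild else f"{byte:02X}" for byte, wild in zip(chunk, mask))
    return " ".join(tokens)


print(apply_masks(CODE, MASKS))
# ? ? 9F E5 ? ? ? E3 00 00 9F E7 ? ? 8D E5 00 00 90 E5 ? ? 90 E5 ? ? 51 E3
```

The printed string matches the pattern quoted above, which confirms that the braces in the listing are byte-level wildcard flags in memory order.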