NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Missing NOP instructions in x86_64 #197

Closed nsadeveloper789 closed 5 years ago

nsadeveloper789 commented 5 years ago

I confirmed that 0f 1f 00 is correctly disassembled into a NOP (for x86).

Below is a list of NOP instructions that objdump decodes into NOPs but that Ghidra can't disassemble (the list may not be complete, and tries to only show one instruction per instruction class):

0f 18 20                nop/reserved BYTE PTR [eax]
0f 18 28                nop/reserved BYTE PTR [eax]
0f 18 30                nop/reserved BYTE PTR [eax]
0f 18 38                nop/reserved BYTE PTR [eax]
0f 19 00                nop    DWORD PTR [eax]
0f 19 c0                nop    eax
0f 1a c0                nop    eax
0f 1b c0                nop    eax
0f 1c 00                nop    DWORD PTR [eax]
0f 1c 08                nop    DWORD PTR [eax]
0f 1c 10                nop    DWORD PTR [eax]
0f 1c 18                nop    DWORD PTR [eax]
0f 1c 20                nop    DWORD PTR [eax]
0f 1c 28                nop    DWORD PTR [eax]
0f 1c 30                nop    DWORD PTR [eax]
0f 1c 38                nop    DWORD PTR [eax]
0f 1c c0                nop    eax
0f 1d 00                nop    DWORD PTR [eax]
0f 1d c0                nop    eax
0f 1e 00                nop    DWORD PTR [eax]
0f 1e c0                nop    eax
66 0f 1e c0             nop    ax
f2 0f 1c 00             repnz nop DWORD PTR [eax]
f2 0f 1e c0             nop    eax
f3 0f 1b c0             nop    eax
f3 0f 1c 00             repz nop DWORD PTR [eax]
f3 0f 1e c0             nop    eax
f3 0f 1e d0             nop    eax
f3 0f 1e d8             nop    eax
f3 0f 1e e0             nop    eax
f3 0f 1e e8             nop    eax
f3 0f 1e f0             nop    eax
f3 0f 1e f8             nop    eax
f3 0f 1e f9             nop    ecx
f3 0f 1e fc             nop    esp
f3 0f 1e fd             nop    ebp
f3 0f 1e fe             nop    esi
f3 0f 1e ff             nop    edi

I can share a test binary with these instructions (and the ones from https://github.com/NationalSecurityAgency/ghidra/issues/53#issuecomment-470682674) if that'd be helpful

Originally posted by @recvfrom in https://github.com/NationalSecurityAgency/ghidra/issues/22#issuecomment-472606492

emteere commented 5 years ago

Will add those in to the update.

Thanks for offering the binary. I've copied and pasted the others into a test binary, so I have those. Raw hex byte strings are the easiest thing to put into a test binary. It might be useful if copy/paste into the listing were extended to strip out the instruction text that follows the bytes.
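The stripping step suggested above can be sketched outside Ghidra. This is a minimal illustration, not Ghidra's paste behavior: the function name `strip_listing` is hypothetical, and it assumes objdump-style lines where the hex bytes are separated by single spaces and followed by a tab, at least two spaces, or the end of the line.

```python
import re

# Leading run of two-digit hex bytes separated by single spaces. The lookahead
# requires a tab, two or more whitespace characters, or end of line, so that
# ordinary prose such as "Be careful" is not mistaken for the byte 0xBE.
_LEADING_BYTES = re.compile(
    r"^\s*((?:[0-9a-fA-F]{2} )*[0-9a-fA-F]{2})(?=\t|\s{2,}|\s*$)"
)


def strip_listing(text):
    """Return the raw bytes from listing lines, dropping the mnemonic text."""
    out = bytearray()
    for line in text.splitlines():
        match = _LEADING_BYTES.match(line)
        if match:
            out += bytes.fromhex(match.group(1))
    return bytes(out)


if __name__ == "__main__":
    listing = (
        "0f 19 c0                nop    eax\n"
        "66 0f 1e c0             nop    ax\n"
    )
    # Prints the bytes of both lines as one continuous hex string.
    print(strip_listing(listing).hex())
```

The resulting bytes can be written to a file with `open(path, "wb")` and loaded as a raw binary.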

emteere commented 5 years ago

These NOP instructions are listed in the AMD manual, and are reserved in the Intel manual.

They can be added, but we'll have to monitor for collisions in the future.

recvfrom commented 5 years ago

@emteere I think the risk with not adding them is that malware could use them as an anti-Ghidra-RE mechanism. Granted, there might be lots of instructions that could be used for this, like hidden instructions found via https://github.com/xoreaxeaxeax/sandsifter (although at least those would hopefully give portability nightmares to the malware developers).

The test binary is attached (tar/gz'd). In addition to the instructions above, my script outputs interesting variants of the ModR/M, SIB, displacement, and immediate fields, which might be helpful for testing. To see what objdump decodes the instructions as, just run objdump -D -b binary -mi386 -Mintel insns.bin. Note that some of the instructions in the binary are actually invalid, and some are duplicates. Unfortunately, I haven't had time to improve the script since I last worked on it last summer.
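The generator script itself is not shown in this thread, so the following is only a sketch of how ModR/M variants like those in the list above could be enumerated. The helper names `modrm` and `reg_field_variants` are hypothetical. It covers only forms that need no SIB byte or displacement (for example mod=0, rm=0, which is `[eax]` in 32-bit mode, or mod=3 for register operands).

```python
def modrm(mod, reg, rm):
    """Pack the ModR/M fields (2, 3 and 3 bits) into a single byte."""
    if not (0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("ModR/M field out of range")
    return (mod << 6) | (reg << 3) | rm


def reg_field_variants(opcode, mod, rm):
    """Yield opcode + ModR/M byte for every value of the reg field."""
    for reg in range(8):
        yield bytes(opcode) + bytes([modrm(mod, reg, rm)])


def as_hex(data):
    """Format bytes as space-separated lowercase hex, like the list above."""
    return " ".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    # The eight memory-operand encodings of opcode 0f 1c with mod=0, rm=0.
    for encoding in reg_field_variants(b"\x0f\x1c", mod=0, rm=0):
        print(as_hex(encoding))
```

Encodings with rm=4 (SIB follows) or mod=1/mod=2 (displacement follows) would need extra bytes appended, which this sketch deliberately leaves out.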

insns.tar.gz

emteere commented 5 years ago

Agreed on the malware front; they'll be included. It is a whack-a-mole problem.

The hidden instructions could be added to a .sinc file and turned on for controlled use.

Thanks for the test bytes.

jimaf commented 5 years ago

I noticed that Ghidra also does not recognize the following NOP, which is generated by the Clang compiler:

0x66 0x90

ryanmkurtz commented 5 years ago

@jimaf, have you tried it in 9.0.1? It was released today.

saruman9 commented 5 years ago

> @jimaf, have you tried it in 9.0.1? It was released today.

Works for me.

jimaf commented 5 years ago

> @jimaf, have you tried it in 9.0.1? It was released today.

> Works for me.

Nope, it does not work for me. Maybe context matters. Here is a snippet of a disassembled function from a stripped binary that was compiled with Clang as 32-bit position-independent code.

At address 001cc89, Ghidra does not detect the NOPs that are used as padding after the function. The next function starts at address 001cc90, and Ghidra misses that one as well.

Before someone asks, the cross-reference to address 001cc90 is:

00a10855 c7 87 90 MOV dword ptr [DAT_0001cc90 + EDI],0x109

So it uses this address as a base and offsets it with EDI.
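To check whether a gap between two functions, like the one described above, consists only of NOP padding, something like the following sketch could be used on the raw bytes. The table holds the commonly recommended 1- to 9-byte x86 NOP encodings and is not exhaustive (compilers also emit other prefixed forms), and the function name `is_nop_padding` is hypothetical. A match only shows that the bytes could be padding; it does not prove that they are code rather than data.

```python
# Commonly recommended x86 NOP encodings, 1 to 9 bytes long. No entry is a
# prefix of another, so a simple left-to-right scan is unambiguous.
NOP_ENCODINGS = tuple(
    bytes.fromhex(h)
    for h in (
        "90",
        "6690",
        "0f1f00",
        "0f1f4000",
        "0f1f440000",
        "660f1f440000",
        "0f1f8000000000",
        "0f1f840000000000",
        "660f1f840000000000",
    )
)


def is_nop_padding(data):
    """Return True if data is non-empty and splits exactly into known NOPs."""
    pos = 0
    while pos < len(data):
        for nop in NOP_ENCODINGS:
            if data.startswith(nop, pos):
                pos += len(nop)
                break
        else:
            # No known NOP encoding starts at this offset.
            return False
    return len(data) > 0


if __name__ == "__main__":
    gap = bytes.fromhex("0f1f8000000000")  # one 7-byte NOP
    print(is_nop_padding(gap))
```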

tmplst1.txt

saruman9 commented 5 years ago

I could be wrong, but this issue is about "Ghidra can't disassemble NOP instructions", not about Ghidra failing to "recognize" or "detect" them. Can you disassemble the NOP instructions?

By the way, the "Condense Filler Bytes (Prototype)" analyzer may be useful for detecting filler bytes between functions, and the "Aggressive Instruction Finder (Prototype)" analyzer for finding NOP instructions. "FunctionBitPatternsExplorerPlugin" is also useful for exploring undefined functions.

jjcybersecurity commented 5 years ago

Aggressive Instruction Finder finds the missing function at 001cc90 (but ignores the NOPs). How do you tell the headless analyzer to run this prototype? Can't see that in the docs.

emteere commented 5 years ago

The Aggressive Instruction Finder "attempts", in a very simplistic way, to look at how other functions start in your code. There are many improvements that could be made to it.

Simply finding and marking NOPs is not always the best solution, since those NOPs could actually be data bytes. On many processors, the NOP instruction is some number of 0x00 bytes.

I guess the real issue is automatically disassembling the code. Ghidra is generally conservative about where it starts disassembling. In my mind, the NOPs are arbitrary bytes; they could just as well be invalid random bytes if the compiler had been written that way. They really aren't code, and disassemblers such as objdump that do find this code are cheating, in that they disassemble without figuring out why those bytes should be disassembled.

In your case, where the address is used as a base and offset by EDI in a data reference, the MOV access is an indication that the bytes at address DAT_0001cc90 could be data.

You can select an area and disassemble it. This isn't the suggested way to start unless you are sure there is no data in the area and it is all instructions. Ghidra will follow flow as far as it can from the first location, then start again at the next undefined location. If this works for you, it is cheap and easy. There are most likely some scripts that do this, but none comes to mind.

Not disassembling an area can help you understand the program and look for the way a location is actually reached or accessed. That said, automating as much as possible without making mistakes is a goal. It's all a matter of false positives versus false negatives, balanced against your motivation for doing the RE in the first place.

The Reference analyzer does have some heuristics that follow pure pointer references (no read/write) to see whether the target looks like code. It may be that the "looks like code" test needs to be relaxed, but that is a delicate balance, and relaxing it may cause bad disassembly in other areas.

ryanmkurtz commented 5 years ago

Closing this issue since the missing NOP instructions were added in 9.0.1.