Closed nsadeveloper789 closed 5 years ago
Will add those in to the update.
Thanks for the offer for the binary. I've copy pasted the others into a test binary so I have those. Raw hex string bytes are the easiest to put in a test binary. It might be useful if the copy/paste into the listing were augmented to strip out the instructions following the bytes.
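The idea of stripping the instruction text and keeping only the bytes can be sketched outside of Ghidra. The snippet below is a minimal illustration, not an existing Ghidra feature: the function name `bytes_from_listing` and the assumed listing layout (hex byte pairs at the start of each line, followed by the mnemonic) are hypothetical.

```python
import re

# Assumption: each listing line starts with space-separated two-digit hex
# bytes, followed by whitespace and the instruction text.
_HEX_PREFIX = re.compile(r"^\s*((?:[0-9a-fA-F]{2}\s)*[0-9a-fA-F]{2})(?=\s|$)")


def bytes_from_listing(text):
    """Collect the leading hex bytes of every line and drop everything else."""
    out = bytearray()
    for line in text.splitlines():
        match = _HEX_PREFIX.match(line)
        if match:
            # Normalize whitespace before handing the pairs to bytes.fromhex.
            out += bytes.fromhex(" ".join(match.group(1).split()))
    return bytes(out)


if __name__ == "__main__":
    listing = "0f 1f 00    nop DWORD PTR [eax]\n66 90    xchg ax,ax\n"
    # Write the raw bytes to a file that a disassembler can load as a flat binary.
    with open("insns.bin", "wb") as handle:
        handle.write(bytes_from_listing(listing))
```

A line whose first word happens to be two hex digits (for example a comment starting with "be") would be misread as a byte, so the input should contain listing lines only.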
These NOP instructions are listed in the AMD manual, and are reserved in the Intel manual.
They can be added, but we'll have to monitor for collisions in the future.
@emteere I think the risk with not adding them is that malware could use them as an anti-Ghidra-RE mechanism. Granted, there might be lots of instructions that could be used for this, like hidden instructions found via https://github.com/xoreaxeaxeax/sandsifter (although at least those would hopefully give portability nightmares to the malware developers).
The test binary is attached (tar/gz'd). In addition to the instructions above, my script outputs interesting variants of the ModR/M, SIB, Displacement, and Immediate fields, which might be helpful for testing. To see what objdump decodes the instructions as, just run objdump -D -b binary -mi386 -Mintel insns.bin. Note that some of the instructions included in the binary are actually invalid, and some are duplicates; unfortunately I haven't had time to improve the script since I worked on it last summer.
Agreed on the malware front; they'll be included. It is a whack-a-mole problem.
The hidden instructions could be added to a .sinc file and turned on for controlled use.
Thanks for the test bytes.
I noticed that Ghidra also does not recognize the following NOP, which is generated by the Clang compiler.
0x66 0x90
@jimaf, have you tried it in 9.0.1? It was released today.
Works for me.
Nope, does not work for me. Maybe context matters. Here is a snippet of a disassembled function from a stripped binary that was compiled as 32-bit position-independent code with Clang.
At address 001cc89 Ghidra does not detect the NOPs that are used as padding after the function. The next function starts at address 001cc90, and it misses that one.
Before someone asks, the crossreference to address 001cc90 is: 00a10855 c7 87 90 MOV dword ptr [DAT_0001cc90 + EDI],0x109
So it uses this address as a base, offsets it with EDI.
I could be wrong, but this issue is about "Ghidra can't disassemble NOP instructions", not "recognize" or "detect". Can you disassemble the NOP instructions manually?
By the way, the "Condense Filler Bytes (Prototype)" analyzer may be useful for detecting filler bytes between functions, and the "Aggressive Instruction Finder (Prototype)" analyzer for finding NOP instructions. The "FunctionBitPatternsExplorerPlugin" is also useful for exploring undefined functions.
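To make the notion of filler bytes between functions concrete, here is a minimal sketch that measures a run of x86 NOP padding at a given offset. It is not how Ghidra's analyzers work; the names `NOP_ENCODINGS` and `nop_run_length` are hypothetical, and the table only covers the multi-byte NOP forms recommended in the Intel manual.

```python
# Recommended x86 NOP encodings (1 to 9 bytes), checked longest first.
NOP_ENCODINGS = sorted(
    (
        bytes.fromhex(h)
        for h in (
            "90",
            "6690",
            "0f1f00",
            "0f1f4000",
            "0f1f440000",
            "660f1f440000",
            "0f1f8000000000",
            "0f1f840000000000",
            "660f1f840000000000",
        )
    ),
    key=len,
    reverse=True,
)


def nop_run_length(data, offset=0):
    """Return how many consecutive bytes from offset decode as known NOPs."""
    pos = offset
    while pos < len(data):
        for encoding in NOP_ENCODINGS:
            if data.startswith(encoding, pos):
                pos += len(encoding)
                break
        else:
            # No known NOP encoding starts here, so the run ends.
            break
    return pos - offset
```

For example, `nop_run_length(bytes.fromhex("66900f1f0055"))` covers the `66 90` and `0f 1f 00` forms and stops at `55`. A match only says the bytes look like padding; the same bytes could still be data.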
Aggressive Instruction Finder finds the missing function at 001cc90 (but ignores the NOPs). How do you tell the headless analyzer to run this prototype? Can't see that in the docs.
The Aggressive Instruction Finder "attempts", in a very simplistic way, to look at how other functions start in your code. There are many improvements that could be made to it.
Simply finding and marking NOPs is not always the best solution, since those NOPs could actually be data bytes. Many processors' NOP instructions are some number of 0x00s.
I guess the real issue is automatically disassembling the code. Ghidra is generally conservative about where it starts disassembling. In my mind, the NOPs are arbitrary bytes; they could just as well be invalid random bytes if the compiler had been written that way. They really aren't code, and disassemblers such as objdump that find the code are cheating, in that they disassemble without figuring out why they should disassemble.
In your case with the offset of the address from EDI as a data reference, the MOV access is an indication that the bytes at address DAT_0001cc90 could be data.
You can select an area and disassemble it. This isn't the suggested way to start, unless you are sure there is no data in the area and it is all instructions. Ghidra will follow flow as far as it can from the first location, then start again at the next undefined location. If this works for you, it is cheap and easy. There are most likely some scripts that do this, but one doesn't come to mind.
Not disassembling an area can help you understand the program and look for the way a location is actually disassembled or accessed. That said, automating as much as possible without making mistakes is a goal. It's all a matter of false positives and false negatives, balanced against your motivation for doing the RE in the first place.
The Reference analyzer does have some heuristics to follow pure pointer references (no read/write) to see if the target looks like code. It may be that the "looks like code" test needs to be relaxed, but that is a delicate balance that may cause bad code disassembly in other areas.
Closing this issue since the missing NOP instructions were added in 9.0.1.
I confirmed that
0f 1f 00
is correctly disassembled into a NOP (for x86).
Below is a list of NOP instructions that objdump decodes into NOPs but that Ghidra can't disassemble (the list may not be complete, and tries to only show one instruction per instruction class):
I can share a test binary with these instructions (and the ones from https://github.com/NationalSecurityAgency/ghidra/issues/53#issuecomment-470682674) if that'd be helpful
Originally posted by @recvfrom in https://github.com/NationalSecurityAgency/ghidra/issues/22#issuecomment-472606492