NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Ghidra cannot disassemble many bytes for ARM binaries in Thumb mode #657

Open valour01 opened 5 years ago

valour01 commented 5 years ago

Describe the bug
Ghidra does not disassemble many bytes in the test case binaries. For example, there should be code at address 0x8ac9c, but Ghidra leaves the bytes as data and does not disassemble them.

To Reproduce
Steps to reproduce the behavior:
1. Feed the file to Ghidra.
2. Jump to the address in the screenshot after Ghidra finishes the analysis.

Expected behavior
Assembly should be listed for these bytes rather than leaving them as data marked with question marks.

Screenshots
image

Attachments
test_bin.zip


saruman9 commented 5 years ago

IMHO this is not a bug. Ghidra can't know the current index into the array of function pointers (or the length of the array), because the index is an unknown parameter of the function (see Figure 1).


Figure 1. Decompile window of target function

As a result, Ghidra can't create functions.

In the figure you can see the message about an unrecoverable jumptable. You can manually create the references (or use a script) and the functions.
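
To illustrate the point about the function pointer array, here is a minimal Python sketch (not Ghidra API code) of what reading such a table looks like once its length is known. The function name `read_pointer_table`, the table contents, and the assumption of 32-bit little-endian entries whose low bit marks Thumb code are all illustrative and not taken from the test binary.

```python
import struct


def read_pointer_table(blob: bytes, offset: int, count: int):
    """Read `count` 32-bit little-endian pointers starting at `offset`.

    Returns a list of (entry_address, is_thumb) tuples. On ARM, a set low
    bit in a code pointer conventionally means the target is Thumb code,
    so the bit is cleared to get the real entry address.
    """
    entries = []
    for i in range(count):
        (raw,) = struct.unpack_from("<I", blob, offset + 4 * i)
        is_thumb = bool(raw & 1)
        entries.append((raw & ~1, is_thumb))
    return entries


# Hypothetical table with three entries (made-up addresses).
table = struct.pack("<3I", 0x1001, 0x1021, 0x2000)
for address, is_thumb in read_pointer_table(table, 0, 3):
    mode = "Thumb" if is_thumb else "ARM"
    print(f"candidate function at {address:#x} ({mode})")
```

The `count` argument is exactly the piece that is missing here: without a bound on the index, there is no safe place to stop reading, which is why the references have to be created by hand or by a script that is told the table size.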

valour01 commented 5 years ago

Thus, for Ghidra, correct disassembly relies on accurate function detection. It seems that Ghidra does not have very good support for resolving jump tables. Maybe this is where it can be enhanced.

saruman9 commented 5 years ago

You can also use the Aggressive Instruction Finder and ARM Aggressive Instruction Finder analyzers, plus the Function Start Search, Function Start Search After Code, and Function Start Search After Data analyzers, to find the needed functions (these analyzers partly solve the problem), but I do not recommend them for permanent use (they produce many false positives).
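
As a rough illustration of why pattern-based function start searches produce false positives, the sketch below scans raw bytes for 16-bit Thumb `PUSH {..., lr}` encodings (0xB5xx), a common function prologue. It is a simplified stand-in and not how Ghidra's analyzers are implemented; the function name and the sample bytes are made up for the example.

```python
import struct


def find_thumb_push_lr(blob: bytes, base: int):
    """Return addresses of halfwords that look like Thumb PUSH {..., lr}.

    The 16-bit Thumb PUSH encoding is 1011 010 M <8-bit register list>,
    where M=1 adds LR to the list, so any halfword 0xB5xx matches.
    """
    hits = []
    for off in range(0, len(blob) - 1, 2):
        (halfword,) = struct.unpack_from("<H", blob, off)
        if halfword & 0xFF00 == 0xB500:
            hits.append(base + off)
    return hits


code = bytes.fromhex("10b5" "7047")    # push {r4, lr}; bx lr
data = struct.pack("<I", 0x1234B512)   # a constant that is not code
print([hex(a) for a in find_thumb_push_lr(code + data, 0x8000)])
```

The second hit comes from the low half of the constant, not from an instruction. A pure byte pattern cannot tell the two apart, which matches the caution above about leaving these analyzers on permanently.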

To resolve jump tables or indirect jumps, Ghidra would need to use symbolic execution (concolic execution) or something similar. It would be nice if Ghidra used these techniques, but it is not a trivial task.

emteere commented 5 years ago

The switch statement is not recovering for two reasons:

If you modify the BL 0x88BA0 to be a branch instead of a call, the switch statement will recover. You may need to re-analyze the area. This issue is indicative of a larger problem, which is that the program uses BL instructions, which are normally used for calls, as long branches.

There are mechanisms that will simulate the switch call and recover all the branches, however the decompiler flow analysis is much better, and can recover why each case is taken.

If you really want to get the code out of this large auto-generated YACC routine, and other routines, then you'll need to do the analysis a bit more carefully.

If you don't fix these, then recovering the switch statement is the least of your issues with the binary. There are many more that need to be fixed. When encountering a binary that has these issues the default analysis will need to be modified so that certain things don't occur based on the incorrect creation of functions from these BL call-long jump instructions.

Right after importing the binary, go to the following addresses and disassemble them as Thumb. This is so the non-returning thunk to stack_check_fail can be found before a lot of flow damage has to be fixed, wasting time.
- 0x000118c8 - press F12 to disassemble as Thumb
- Create a function at 0x000118c8

Then analyze the binary, and:

- Turn off Discover Non-Returning
- Turn off Shared Return analysis
- Turn off Stack Analysis
- Turn off ARM constant propagation

Then analyze the code.

You'll re-analyze later when the code is fixed. Leaving these on will waste time and spray bad references all over because the code is malformed.

You can then run one of two scripts. The first, Fix_ARM_Call_JumpsScript, will go through all the calls and attempt to figure out the correct flow, changing BLs that are really long jumps to branches.

Unfortunately this doesn't work totally on your massive routine. Select the addresses from 0x7ff5c through 0x89150 with Select->Bytes...->ToAddress. I would also turn this into a Highlight so you can easily get the selection back, as selection is brittle. Clear all the functions within the selection, except the top one, with Edit->ClearWithOptions... Then run Override_ARM_Call_JumpsScript. This will force any BLs with destinations within the selected region to a Branch flow. You may need to do this repeatedly as more code is found, and you may need to analyze within the area so that the switches will recover. Normally this is an automated process, but the newly found code will have bad flowing BLs in it.
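
The idea of overriding BLs whose destinations fall inside the selected region can be sketched outside Ghidra. The Python below decodes the target of a 32-bit Thumb BL (encoding T1) and checks whether it lands inside an address range. It is not the code of Override_ARM_Call_JumpsScript: the function names are hypothetical, and the sample instruction bytes are synthetic encodings rather than bytes from the test binary. Only the range 0x7ff5c through 0x89150 and the destination 0x88BA0 come from the discussion above.

```python
import struct

REGION_START = 0x7FF5C  # selection bounds quoted in the steps above
REGION_END = 0x89150


def decode_thumb_bl_target(insn: bytes, address: int):
    """Return the destination of a Thumb BL (encoding T1), or None.

    Layout: first halfword  11110 S imm10
            second halfword 11 J1 1 J2 imm11
    The offset is SignExtend(S:I1:I2:imm10:imm11:0) with
    I1 = NOT(J1 XOR S) and I2 = NOT(J2 XOR S), relative to address + 4.
    """
    if len(insn) < 4:
        return None
    hw1, hw2 = struct.unpack_from("<HH", insn)
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        return None  # not a BL (BLX and B.W differ in the second halfword)
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        offset -= 1 << 25  # sign-extend the 25-bit immediate
    return (address + 4 + offset) & 0xFFFFFFFF


def is_long_jump_candidate(target, start=REGION_START, end=REGION_END):
    """True when a BL destination lies inside the selected region."""
    return target is not None and start <= target <= end


# Synthetic example: a BL at 0x80000 whose offset is 0 (target 0x80004).
target = decode_thumb_bl_target(bytes.fromhex("00f000f8"), 0x80000)
print(hex(target), is_long_jump_candidate(target))
```

A destination inside the region is only a hint that the BL is being used as a long branch. A genuine call to a routine located inside the same range would look identical, so this check is an assumption about the binary's layout rather than a proof.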

We could automate some of this, for instance clearing the existing functions in the override area. We've planned to come up with a better automated solution for this issue, but automating the repair of flow this bad can be tricky, and can cause more hidden damage if not done correctly.

Once you're sure you've cleaned up the above BL issues, you can re-run auto-analysis and turn on ARM Constant Propagation. I'd still be careful turning back on Shared Return and Non-Returning, because you may discover more code that has bad flowing BL's.

The FixupNoReturn script can be used to hand choose non-returning functions.

pwmoore commented 4 years ago

I was actually just going to open a new issue, but found this one searching, so it may be worth it to just append here. Let me know if you'd rather me open a new one.

I've noticed the same behavior on both x86_64 and AArch64 Mach-O files: large switch statements (like those generated by yacc/bison) fail to disassemble correctly, and as such, the decompilation is all messed up. I unfortunately cannot share the binary I'm working on, so I've been trying to create a test case that I can share.

Let me know if you'd like me to open a separate issue for this.

pabx06 commented 4 years ago

I have had many cases where BL instructions in Thumb mode are treated like calls to a non-returning function, which is wrong, and the following bytes are treated as data instead of being disassembled. IDA does a better job at this. And using the "function search after code" analyzer in many cases disassembles a lot of data. So I wonder what to do? Mark the data? Then disassemble?

TheGag96 commented 2 years ago

When I add overlays to the file / Memory Map, I encounter this. "Aggressive Instruction Finder" and "ARM Aggressive Instruction Finder" end up not disassembling most of the added binary. I have to manually tell Ghidra to disassemble the binary by hitting F12 and then F to add functions, repeating this a very large number of times...

shinespeciall commented 2 years ago

Anything new about this issue?