NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

RISCV uncompleted decompilation #4765

Open snx90 opened 1 year ago

snx90 commented 1 year ago

The decompiler output for some functions in RISCV ELFs is incomplete: it shows only return instructions and function calls.

Attached is a screenshot of one specific function, but I am getting similar decompiled code for other functions in the same ELF.

This has been tested with Ghidra 10.2.2 public release (Java version 19.0.1), running on macOS Big Sur 11.7 for Apple Silicon.

[Screenshot: Screen Shot 2022-11-21 at 11 34 17 PM]

ghidracadabra commented 1 year ago

It is possible that the signature and/or calling convention of some_func need to be adjusted. The decompiler applies various heuristics to determine these attributes when it needs to (basically, if they have not already been set by the user or set with high confidence by an analyzer).

From your screenshot, it appears that the decompiler has assigned a return type of void to some_func. If some_func is not actually void, then the decompiler might be eliminating code which contributes to the return value. You can right-click on some_func in the decompiler window to modify the return type, parameters, and calling convention.
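
To illustrate why a wrong void return type can make code disappear, here is a minimal sketch in plain Java. It is not Ghidra code and does not use the Ghidra API; it is a toy backward liveness pass over three made-up operations. The class `ReturnLiveness`, the `Op` type, and the example operations are all hypothetical. The only assumption borrowed from RISC-V is that `a0` holds the return value under the standard calling convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ReturnLiveness {
    /** One straight-line operation. dest is null when the op writes no register. */
    static final class Op {
        final String text;
        final String dest;
        final List<String> sources;
        final boolean sideEffect;

        Op(String text, String dest, List<String> sources, boolean sideEffect) {
            this.text = text;
            this.dest = dest;
            this.sources = sources;
            this.sideEffect = sideEffect;
        }
    }

    /** Walks the ops backwards and keeps an op only if it has side effects or its dest is live. */
    static List<String> keep(List<Op> ops, Set<String> liveOut) {
        Set<String> live = new HashSet<>(liveOut);
        List<String> kept = new ArrayList<>();
        for (int i = ops.size() - 1; i >= 0; i--) {
            Op op = ops.get(i);
            if (op.sideEffect || live.contains(op.dest)) {
                live.remove(op.dest);
                live.addAll(op.sources);
                kept.add(op.text);
            }
        }
        Collections.reverse(kept);
        return kept;
    }

    /** Hypothetical function body: compute a value, make a call, return the value in a0. */
    static List<Op> example() {
        return Arrays.asList(
            new Op("s1 = a0 + 4", "s1", Arrays.asList("a0"), false),
            new Op("call FUN_0400eff0", null, Collections.<String>emptyList(), true),
            new Op("a0 = s1 * 2", "a0", Arrays.asList("s1"), false));
    }
}
```

With an empty live-out set (the function is treated as void), `keep` retains only the call. With `a0` in the live-out set, all three operations survive. The real decompiler's analysis is far more involved, but the effect of the return type on what counts as dead code is similar in spirit.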

You might also have to set this information for FUN_0400eff0.

Another possibility is "custom storage", i.e., a function is using a calling convention that Ghidra doesn't know about. In this case you can set the parameter and return locations manually.

I see several warnings in the decompiler window that unreachable code has been eliminated. For debugging decompilation, it sometimes helps to turn this off. This can be done in the code browser via Edit -> Tool Options -> Decompiler -> Analysis -> Eliminate unreachable code.

mumbel commented 1 year ago

Are all the functions that look incomplete just CSR operations?

snx90 commented 1 year ago

Hi, thanks for your feedback.

The comments from @ghidracadabra seem to be quite accurate: the tool is having a hard time identifying the calling convention that needs to be used (to be honest, the code itself does not help; I have reviewed the assembly code of the problematic functions and it is hard to find who/how/when the input arguments are set by the caller). I am trying to fix this manually, following the advice of @ghidracadabra.

@mumbel from what I have seen, most (I cannot say all, as I have not checked all of them) of the problematic functions read/write to CSRs. Yet, I have the feeling that this is more of a coincidence than the actual cause of the decompilation issue.

ghidracadabra commented 1 year ago

@mumbel might be on to something - it's possible that the RISCV module should be modified to treat the CSR registers as global variables. If my original suggestions don't fix the issue let me know.

snx90 commented 1 year ago

Sure. For the moment I have tried disabling the "eliminate unreachable code" option, but it does not help. I have noticed that these problematic functions are no-return functions (they are executed as part of a boot chain) but the tool does not seem to be very happy about that. Actually, if I mark those functions as no-return, Ghidra removes the calls to them (and any related code). Is there any way to avoid that behavior for no-return functions?

ghidracadabra commented 1 year ago

The no-return attribute is intended for functions the compiler knows do not return, such as exit or abort. The bytes immediately after a call to a non-returning function should not be disassembled unless there is additional evidence that they are executable (such as being a jump target). It is common, though not guaranteed, to see disassembly errors after calls to functions that should be marked as non-returning but aren't.

If you mark a function as non-returning, Ghidra will attempt to undo any disassembly that wouldn't have occurred if it knew the function were non-returning. This can have cascading effects and cause significant changes to the program and might explain what you are seeing.
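
The effect described above can be sketched with a toy model in plain Java. This is not how Ghidra's disassembler is implemented and it does not use the Ghidra API; it only follows fallthrough flow through a made-up instruction list and stops after a return or after a call to a function in the non-returning set. The names `NoReturnFlow`, `Insn`, and `boot_next_stage` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class NoReturnFlow {
    /** Toy instruction: kind is "op", "call" or "ret"; target is the callee name for calls. */
    static final class Insn {
        final String kind;
        final String target;

        Insn(String kind, String target) {
            this.kind = kind;
            this.target = target;
        }
    }

    /** Follows fallthrough flow from start and returns the indexes that would be disassembled. */
    static List<Integer> disassemble(List<Insn> code, int start, Set<String> noReturn) {
        List<Integer> reached = new ArrayList<>();
        for (int i = start; i < code.size(); i++) {
            Insn insn = code.get(i);
            reached.add(i);
            if (insn.kind.equals("ret")) {
                break; // flow ends at a return
            }
            if (insn.kind.equals("call") && noReturn.contains(insn.target)) {
                break; // no fallthrough after a call that never returns
            }
        }
        return reached;
    }

    /** Hypothetical function: one op, a call into the next boot stage, then more code. */
    static List<Insn> example() {
        return Arrays.asList(
            new Insn("op", null),
            new Insn("call", "boot_next_stage"),
            new Insn("op", null),
            new Insn("ret", null));
    }
}
```

With an empty non-returning set, all four instructions are reached. Once `boot_next_stage` is in the set, flow stops at index 1 and the two instructions after the call are no longer part of the function, which is the kind of change that can then cascade to other code.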

If you are certain that these functions were treated by the compiler as non-returning, you can try importing the program again and marking them as non-returning manually before performing auto analysis. In general, it's best to recognize non-returning functions as early as possible.

A few other suggestions:

  1. Do you see any error bookmarks (click the purple checkmark icon in the code browser and enter "Error" in the filter)? These can indicate problems with non-returning functions or other issues that need to be addressed.
  2. Do you see large sections of the program that weren't disassembled or marked as data?
  3. The script CompareFunctionSizesScript.java will decompile all functions and present a table where you can compare the number of assembly instructions in a function to the size of the decompiled code. This might help you determine which functions have too much code being eliminated and whether they have anything in common.
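
As a rough illustration of how the table from suggestion 3 could be read, here is a minimal sketch in plain Java that flags functions whose decompiled size is small relative to their instruction count. It is not the logic of CompareFunctionSizesScript.java and does not use the Ghidra API. The class `SizeRatio`, the `Row` type, the threshold, and the numbers used in the usage note are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class SizeRatio {
    /** One table row: function name, instruction count, decompiled size (same unit for all rows). */
    static final class Row {
        final String name;
        final int instructions;
        final int decompiledSize;

        Row(String name, int instructions, int decompiledSize) {
            this.name = name;
            this.instructions = instructions;
            this.decompiledSize = decompiledSize;
        }
    }

    /** Returns the names of functions whose decompiledSize / instructions ratio is below minRatio. */
    static List<String> suspicious(List<Row> rows, double minRatio) {
        List<String> flagged = new ArrayList<>();
        for (Row row : rows) {
            if (row.instructions > 0
                    && (double) row.decompiledSize / row.instructions < minRatio) {
                flagged.add(row.name);
            }
        }
        return flagged;
    }
}
```

For example, with made-up rows `some_func` (120 instructions, size 6) and `FUN_0400eff0` (40 instructions, size 35) and a threshold of 0.25, only `some_func` is flagged. A sensible threshold depends on the binary, so it is worth sorting by ratio and inspecting the extremes rather than trusting a fixed cutoff.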

emteere commented 1 year ago

There were some changes to the RISCV language on 12/2/2022 that are not in 10.2.2. It is possible that they fix some of your issues.

With one of these changes, 0x00000000 bytes are no longer disassembled as an UNIMPL instruction with a fallthru. These bytes are used as padding between functions, and disassembling them could cause flow analysis issues, such as functions being marked non-returning when they do return, as well as other decompilation issues.

Could you give the changes a try and see whether they help with your binary?

emteere commented 1 year ago

10.2.3 fixed a mismatch between the spacebase and global tags, which sometimes caused the decompiler to produce no results if the GP isn't set.

I have changes to the RISCV analyzer that will locate GP settings, create a symbol, and place a bookmark at each setting location. Once it finds one, it can also start using it as the global GP.

Unfortunately this can lead to errors if there are multiple GP settings as in your code. I've found two.

If one or the other isn't found, you can end up using the wrong one, with no way of knowing where each GP value is valid. If you don't disassemble from the first address in the text section, which the analyzer doesn't know how to do safely, it won't find the first value. If you disassemble from the top and then run analysis, the second value may not have been found yet during that first disassembly. And once you've found more than one value, you can't assume either one.

It could be done as two passes. If a GP is found, the first pass wouldn't use it; the second pass would then apply it and re-analyze with the single GP. If more than one is found, it wouldn't apply any of them, but would report as part of analysis that there are multiple GPs.
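
The two-pass idea can be sketched as a small decision function in plain Java. This is only an illustration of the policy described above, not the analyzer's actual code, and it does not use the Ghidra API. The class `GpPolicy`, its method names, and the addresses in the usage note are hypothetical.

```java
import java.util.Collection;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

final class GpPolicy {
    /**
     * Given every GP value located during the first pass, returns the value
     * to apply on the second pass, or empty if none was found or the result
     * is ambiguous (more than one distinct value).
     */
    static Optional<Long> chooseGlobalGp(Collection<Long> foundInFirstPass) {
        Set<Long> distinct = new TreeSet<>(foundInFirstPass);
        if (distinct.size() == 1) {
            return Optional.of(distinct.iterator().next());
        }
        return Optional.empty();
    }

    /** Builds the message the analysis would report for the given first-pass results. */
    static String report(Collection<Long> foundInFirstPass) {
        Set<Long> distinct = new TreeSet<>(foundInFirstPass);
        if (distinct.isEmpty()) {
            return "no GP value found";
        }
        if (distinct.size() == 1) {
            return String.format("applying GP 0x%x", distinct.iterator().next());
        }
        return "found " + distinct.size() + " GP values; none applied";
    }
}
```

If the first pass finds the same value at several setting locations, for instance 0x80001800 twice, it counts as a single GP and is applied. Two different values produce an empty result and a message, leaving the choice to the user.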

I'm thinking of just finding the GP, and then having a setting that can be turned on/off to apply it as a second pass if only one GP value is found. It all comes down to automation, what you assume, and whether you have found all possible GP values. At the very least we can locate and mark up GP settings, which is what the code already does. It applies them too, but not in two phases. Any thoughts?