Default considering the external function call, which has no implementation in binary, as always return function may cause problem in data-flow analysis

Muqi-Zou commented 3 weeks ago

Describe the bug I noticed Ghidra default considers the external function call as an always return function when the implementation of this external function call is not in the decompiled binary. However, whether the external function call is a never-return function will affect the SSA construction of this decompiled function, which may cause a bug during this function's dataflow analysis.

For instance, I now have two binaries compiled from the same source code. One is compiled with -O2, and another is compiled with -O2 -fno-cprop-register. However, Ghidra gives me two different results of the function bufferevent_ssl_get_flags: As you can see, one extra line is added in O2 binary before function event_errx(). To understand why, let me first show their disassembled codes, O2 is on the left side: Besides the jumping address, the only two differences between these two assemble codes are: rdi+0x1c0 is replaced with rbx+0x1c0 and rdi+0x250 is replaced with rbx+0x250. Since the event_errx is an external call function without implementation, Ghidra considers it an always-return function. Hence, when Ghidra constructs the dominance frontiers and places the phi node of the rdi+0x250 in heritage.cc, two varnodes ( rdi+0x1c0 and mov edi, 0xdeaddead ) are linked. This causes the varnode representing mov edi, 0xdeaddead to be considered as a local variable rather than a temporary variable.

However, as the event_errx is a never-return function, the varnode representing mov edi, 0xdeaddead should not be placed as the phi node of rdi+0x250. The flag cprop-registers is used to eliminate redundant usage of registers. As you can see from the unoptimized binary, rbx is used instead of rdi, which indicates before optimization, varnode of mov edi, 0xdeaddead should not be related to varnode of rdi+0x250.

I confirm this guess by setting event_errx as a never-return function, and that extra line disappears.

I suggest Ghidra add a comment line within that extra line saying this line of code is added because Ghidra assumes event_errx is an always-return function. Otherwise, users cannot be aware of semantics errors caused by the above situation.

Muqi-Zou commented 3 weeks ago

I attached testing binaries and the diff patch about how I debug and confirm ( changes content in FlowInfo::setupCallSpecs) the problem. debug.zip

ghidra1 commented 3 weeks ago

Have you set event_errx as a non-returning function? If no, this can be done via the Edit Function action and can be done for both internal and external functions. It is not unexpected that you may need to do this explicitly since it is not be defined in the standard list of known non-returning functions (e.g., ElfFunctionsThatDoNotReturn) or analysis was unable to determine.

ghidra1 commented 3 weeks ago

Once you mark event_errx as non-returning this is what the above function looks like in the decompiler:

Muqi-Zou commented 3 weeks ago

Thanks! This option makes codes look good. One more question, does ghidra guess or plan to implement any algorithm to guess? For example, in this scenario, external function calls with names containing "error" and at the end of the function are more likely to be a non-return function than those calls with names containing "get_response" and at the beginning part of the function. After guessing, we could add a flag in both varnode structure and multiequl operation strcuture to indicate one node in this multiequal operation is affected by the possible non-return external function call, and print a comment to indicate this line of operation/variable is affected by the type of external function.

emteere commented 3 weeks ago

Ghidra does have algorithms that try to figure out what should be non-returning based on evidence around calls to functions that would indicate they don't return. Unfortunately there isn't always solid evidence. If there is only a few number of calls to the function in question then there will not be enough "evidence" to automatically change a called function to non-returning.

There might be other data flow indicators that aren't currently being checked, such as use of a value in a register that should have been trashed by a call without setting a new value, indicated by extra_out variables in the decompiler. Also another good indicator is the stack going bad on one of two paths.

There is a script you can run that can identify single non-returning functions that you an run FixupNoReturnFunctionsSript. This is especially useful if you have turned off the "Non-Returning Functions - Discovered" analyzer. It will suggest functions that might be non-returning and you can choose to make them non-returning after evaluating calls to the suspect functions.

The FixupNoReturnFunctionsScript could have those functions identified as you mention, functions with error in the name or other keywords. We definitely couldn't automatically assume that they don't return.

There are improvements in the works to the non-returning algorithm to make fewer mistakes that a function doesn't return. One of the checks the new algorithm does is check for indications that a function does return, such as no other path to the code below the call. This could also be additional evidence that a function with "error" in the names doesn't return. If there is a jump to the code after the call from another point, then you could make a guess that the "error" function doesn't return.

Muqi-Zou commented 3 weeks ago

I see. Thanks for the responses! They are really helpful!!! I will close this issue unless there are further questions about it.

NationalSecurityAgency / ghidra

Default considering the external function call, which has no implementation in binary, as always return function may cause problem in data-flow analysis #6618