NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
49.78k stars 5.72k forks source link

Decompiler failure: Incorrect calling convention, likely trash, it's complicated #5992

Open edmcman opened 8 months ago

edmcman commented 8 months ago

Describe the bug The decompiler overzealously applies dead code.

To Reproduce

  1. Open attached executable. Don't load the PDB.
  2. Decompile 0x411770
  3. Obtain
    
    /* WARNING: Function: __RTC_CheckEsp replaced with injection: __RTC_CheckEsp */

void * FUN_00411770(uint param_1)

{ int iVar1; undefined4 puVar2; undefined4 local_d0 [49]; void local_c;

puVar2 = local_d0; for (iVar1 = 0x33; iVar1 != 0; iVar1 += -1) { *puVar2 = 0xcccccccc; puVar2 = puVar2 + 1; } thunk_FUN_00411650(); if ((param_1 & 1) != 0) { operator_delete(local_c); } return local_c; }

4. Observe that `local_c` is not defined.
5. Observe in the below disassembly image that `local_c` at 0x4117a3 should probably be live.

**Expected behavior**
I expected the decompilation to contain `return param_1;`, or `local_c = param_1;`.

**Screenshots**
![image](https://github.com/NationalSecurityAgency/ghidra/assets/1017189/77d9bdde-33fd-4148-bfa0-3cc3544a74a9)

**Attachments**
(don't use the pdb, it's just there for reference)
[ooex0.zip](https://github.com/NationalSecurityAgency/ghidra/files/13477070/ooex0.zip)

**Environment (please complete the following information):**
 - OS: Ubuntu 22.04.3 LTS
 - Java Version: Zulu 17.0.6
 - Ghidra Version: 10.4
 - Ghidra Origin: Official GitHub Distro

**Additional context**
Some debugger debug messages:

~/g/g/G/F/D/s/d/cpp $ SLEIGHHOME=/home/ed/ghidra/ghidra_10.4_PUBLIC/ ./decomp_dbg 10:14:25 [decomp]> restore /home/ed/Dropbox/temp/2023-11-27 - 411770.xml Decoding ERROR: Unable to open xml document /home/ed/Dropbox/temp/2023-11-27 [decomp]> restore "/home/ed/Dropbox/temp/2023-11-27 - 411770.xml" Decoding ERROR: Unable to open xml document "/home/ed/Dropbox/temp/2023-11-27 [decomp]> restore /tmp/411770.xml /tmp/411770.xml successfully loaded: Intel/AMD 32-bit x86 [decomp]> load function 0x411770 Execution error: Unknown function name: 0x411770 [decomp]> load function FUN_00411770 Function FUN_00411770: 0x00411770 [decomp]> trace address 0x411793 OK (1 ranges) [decomp]> decompile Decompiling FUN_00411770 DEBUG 0: heritage 0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(free) + #0xfffffff8 0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8 0x00411793:2e: u0x00007a80(0x00411793:2e) = (ram,u0x00001d00(free)) 0x00411793:2e: u0x00007a80(0x00411793:2e) = (ram,u0x00001d00(0x00411793:2d)) 0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(free) 0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e)

DEBUG 1: deadcode 0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8 0x00411793:2d: * 0x00411793:2e: u0x00007a80(0x00411793:2e) = (ram,u0x00001d00(0x00411793:2d)) 0x00411793:2e: 0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e) 0x00411793:2f:

I am in OPACTION_DEBUG.

rule_debug.

after entries.

Scope FUN_00411770 param_1 : s0x00000004:4 uint : all flags = 8. after s.

Decompilation complete [decomp]> trace address 0x4117a3 OK (2 ranges) [decomp]> decompile Clearing old decompilation Decompiling FUN_00411770 DEBUG 0: start 0x00411793:2d: 0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(free) + #0xfffffff8 0x00411793:2e: 0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(free)) 0x00411793:2f: 0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(free) 0x004117a3:40: 0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(free) + #0xfffffff8 0x004117a3:41: * 0x004117a3:41: u0x00007a80(0x004117a3:41) = (ram,u0x00001d00(free)) 0x004117a3:42: ** 0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(free)

DEBUG 1: heritage 0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(free) + #0xfffffff8 0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8 0x00411793:2e: u0x00007a80(0x00411793:2e) = (ram,u0x00001d00(free)) 0x00411793:2e: u0x00007a80(0x00411793:2e) = (ram,u0x00001d00(0x00411793:2d)) 0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(free) 0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e) 0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(free) + #0xfffffff8 0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(0x00411771:3) + #0xfffffff8 0x004117a3:41: u0x00007a80(0x004117a3:41) = (ram,u0x00001d00(free)) 0x004117a3:41: u0x00007a80(0x004117a3:41) = (ram,u0x00001d00(0x004117a3:40)) 0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(free) 0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(0x004117a3:41)

DEBUG 2: deadcode 0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8 0x00411793:2d: * 0x00411793:2e: u0x00007a80(0x00411793:2e) = (ram,u0x00001d00(0x00411793:2d)) 0x00411793:2e: 0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e) 0x00411793:2f:

DEBUG 3: propagatecopy 0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(0x00411771:3) + #0xfffffff8 0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(0x00411770:1) + #0xfffffff8

DEBUG 4: addmultcollapse 0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(0x00411770:1) + #0xfffffff8 0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(i) + #0xfffffff4

DEBUG 5: earlyremoval 0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(0x004117a3:41) 0x004117a3:42: **

DEBUG 6: loadvarnode 0x004117a3:41: u0x00007a80(0x004117a3:41) = *(ram,u0x00001d00(0x004117a3:40)) 0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(free)

DEBUG 7: heritage 0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(free) 0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(0x00411796:d2)

DEBUG 8: deadcode 0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(i) + #0xfffffff4 0x004117a3:40: **

DEBUG 9: earlyremoval 0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(0x00411796:d2) 0x004117a3:41: **

I am in OPACTION_DEBUG.

rule_debug.

after entries.

Scope FUN_00411770 param_1 : s0x00000004:4 uint : all flags = 8. after s.

Decompilation complete

edmcman commented 8 months ago

I think perhaps this is the root cause:

DEBUG 28: likelytrash
0x00411796:d2: s0xfffffff4(0x00411796:d2) = ECX(i) [] i0x00411796:32:8(free)
   0x00411796:d2: s0xfffffff4(0x00411796:d2) = [create] i0x00411796:32:8(free)
edmcman commented 8 months ago

So likelytrash seems to ignore inputs not compatible with the calling convention. So I suppose the root problem is that we don't detect the right calling convention (thiscall).

dev747368 commented 8 months ago
[decomp]> restore /home/ed/Dropbox/temp/2023-11-27 - 411770.xml
Decoding ERROR: Unable to open xml document /home/ed/Dropbox/temp/2023-11-27
[decomp]> restore "/home/ed/Dropbox/temp/2023-11-27 - 411770.xml"
Decoding ERROR: Unable to open xml document "/home/ed/Dropbox/temp/2023-11-27
[decomp]> restore /tmp/411770.xml

Human: Please open file. Computer: no Human: Please open "file" Computer: no lots of swearing, cp file /tmp/file Human: Please open /tmp/file Computer: sure :smile:

edmcman commented 8 months ago

https://sourcegraph.com/github.com/NationalSecurityAgency/ghidra@c225fac124821d789b12e4c76f4a0f6eabe83544/-/blob/Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.hh?L818:7

/// Register locations called \b likely \b trash are read as a side-effect of some instruction /// the compiler was using. The canonical example in x86 code is the /// PUSH ECX /// which compilers use to create space on the stack without caring about what's in ECX. /// Even though the decompiler can see that the read ECX value is never getting used directly /// by the function, because the value is getting copied to the stack, the decompiler frequently /// can't tell if the value has been aliased across sub-function calls. By marking the ECX register /// as \b likely \ trash the decompiler will assume that, unless there is a direct read of the /// incoming ECX, none of subfunctions alias the stack location where ECX was stored. This /// allows the spurious references to the register to be removed.

So I think what happens if this:

So, it's kind of bad luck that ECX is likely trash for stdcall, but is also used for thiscall. This means that we'll have trouble detecting thiscall functions when the only real evidence of the thiscall function is in a callee. The end?

But something about this also doesn't sit right with me. We know that something is wrong (well, probably) because we see local_c read from without being defined. So maybe "unless there is a direct read of the incoming ECX" should be changed to "unless there is a direct read of the incoming ECX OR the stack location where it was stored"? This could have some false positives wrt. popping ecx in the function epilogue... but I suspect such pops would be optimized away anyway since they aren't (usually) returned from the function?

edmcman commented 8 months ago

Human: Please open file. Computer: no Human: Please open "file" Computer: no lots of swearing, cp file /tmp/file Human: Please open /tmp/file Computer: sure 😄

New issue: supporting filenames with spaces in decomp_dbg :rofl: