avast / retdec

RetDec is a retargetable machine-code decompiler based on LLVM.
https://retdec.com/
MIT License

Decompile of Binary Giving Strange Output #569

Open humanitiesclinic opened 5 years ago

humanitiesclinic commented 5 years ago

Attached: cmpdylib.zip. This executable comes from a Mac; it is the /usr/bin/cmpdylib command.

When I decompiled it, it seemed to be successful:

Users-MBP:bin user$ python3 retdec-decompiler.py /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib
##### Checking if file is a Mach-O Universal static library...

##### Checking if file is an archive...
RUN: /Users/user/Downloads/retdec/bin/retdec-ar-extractor /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib --arch-magic
Not an archive, going to the next step.

##### Gathering file information...
RUN: /Users/user/Downloads/retdec/bin/retdec-fileinfo -c /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.json --similarity /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib --no-hashes=all --crypto /Users/user/Downloads/retdec/bin/../share/retdec/support/generic/yara_patterns/signsrch/signsrch.yara --crypto /Users/user/Downloads/retdec/bin/../share/retdec/support/generic/yara_patterns/signsrch/signsrch.yarac --max-memory-half-ram
Input file               : /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib
File format              : Mach-O
File class               : 64-bit
File type                : Executable file
Architecture             : x86-64
Endianness               : Little endian
Entry point address      : 0x100000f73
Entry point offset       : 0xf73
Entry point section name : __text
Entry point section index: 0
Bytes on entry point     : 554889e58d47ff488d5608488d3d2900000031c989c6e800000000ff257c0000004c8d1d6d0000004153ff255d0000009068
Detected tool            : gc (compiler), 29 from 160 significant nibbles (18.125%)

##### Trying to unpack /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib into /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib-unpacked.tmp by using generic unpacker...
RUN: /Users/user/Downloads/retdec/bin/retdec-unpacker /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib -o /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib-unpacked.tmp --max-memory-half-ram
No matching plugins found for 'gc'
##### Unpacking by using generic unpacker: nothing to do
##### 'upx' not available: nothing to do

##### Decompiling /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib into /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.bc...
RUN: /Users/user/Downloads/retdec/bin/retdec-bin2llvmir -provider-init -decoder -verify -x87-fpu -main-detection -idioms-libgcc -inst-opt -cond-branch-opt -syscalls -stack -constants -param-return -local-vars -inst-opt -simple-types -generate-dsm -remove-asm-instrs -class-hierarchy -select-fncs -unreachable-funcs -inst-opt -x86-addr-spaces -value-protect -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -inst-opt -simple-types -stack-ptr-op-remove -idioms -global-to-local -dead-global-assign 
-instcombine -inst-opt -idioms -phi2seq -value-protect -disable-inlining -disable-simplify-libcalls -config-path /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.json -max-memory-half-ram -o /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.bc
Running phase: Initialization ( 0.02s )
Running phase: LLVM ( 0.03s )
Running phase: Providers initialization ( 0.03s )
Running phase: Input binary to LLVM IR decoding ( 0.08s )
Running phase: LLVM ( 0.08s )
Running phase: x87 fpu register analysis ( 0.08s )
Running phase: Main function identification optimization ( 0.08s )
Running phase: Libgcc idioms optimization ( 0.08s )
Running phase: LLVM instruction optimization ( 0.08s )
Running phase: Conditional branch optimization ( 0.08s )
Running phase: Syscalls optimization ( 0.08s )
Running phase: Stack optimization ( 0.08s )
Running phase: Constants optimization ( 0.08s )
Running phase: Function parameters and returns optimization ( 0.08s )
Running phase: Register localization optimization ( 0.08s )
Running phase: LLVM instruction optimization ( 0.08s )
Running phase: Simple types recovery optimization ( 0.08s )
Running phase: Disassembly generation ( 0.08s )
Running phase: Assembly mapping instruction removal ( 0.09s )
Running phase: C++ class hierarchy optimization ( 0.09s )
Running phase: Selected functions optimization ( 0.09s )
Running phase: Unreachable functions optimization ( 0.09s )
Running phase: LLVM instruction optimization ( 0.09s )
Running phase: x86 address spaces optimization ( 0.09s )
Running phase: Value protection optimization ( 0.09s )
Running phase: LLVM ( 0.09s )
Running phase: LLVM instruction optimization ( 0.10s )
Running phase: Simple types recovery optimization ( 0.10s )
Running phase: Stack pointer operations optimization ( 0.10s )
Running phase: Instruction idioms optimization ( 0.10s )
Running phase: Global to local optimization ( 0.10s )
Running phase: Dead global assign optimization ( 0.10s )
Running phase: LLVM ( 0.10s )
Running phase: LLVM instruction optimization ( 0.10s )
Running phase: Instruction idioms optimization ( 0.10s )
Running phase: Phi2Seq optimization ( 0.10s )
Running phase: Value protection optimization ( 0.10s )
Running phase: LLVM ( 0.10s )
Running phase: Bitcode Writer ( 0.10s )
Running phase: Assembly Writer ( 0.10s )
Running phase: Cleanup ( 0.10s )

##### Decompiling /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.bc into /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.c...
RUN: /Users/user/Downloads/retdec/bin/retdec-llvmir2hll -target-hll=c -var-renamer=readable -var-name-gen=fruit -var-name-gen-prefix= -call-info-obtainer=optim -arithm-expr-evaluator=c -validate-module -o /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.c /Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.bc -enable-debug -emit-debug-comments -config-path=/Users/user/Downloads/retdec_output/cmpdylib/cmpdylib.json -max-memory-half-ram
Running phase: initialization ( 0.04s )
 -> creating the used HLL writer [c] ( 0.04s )
 -> creating the used alias analysis [simple] ( 0.04s )
 -> creating the used call info obtainer [optim] ( 0.04s )
 -> creating the used evaluator of arithmetical expressions [c] ( 0.04s )
 -> creating the used variable names generator [fruit] ( 0.04s )
 -> creating the used variable renamer [readable] ( 0.04s )
 -> creating the used semantics [libc,gcc-general,win-api] ( 0.04s )
 -> loading the input config ( 0.04s )
Running phase: conversion of LLVM IR into BIR ( 0.04s )
 -> converting global variables ( 0.05s )
 -> converting function main ( 0.05s )
Running phase: removing functions prefixed with [__decompiler_undefined_function_] ( 0.05s )
Running phase: removing functions from standard libraries ( 0.05s )
Running phase: removing code that is not reachable in a CFG ( 0.05s )
Running phase: signed/unsigned types fixing ( 0.05s )
Running phase: converting LLVM intrinsic functions to standard functions ( 0.05s )
Running phase: obtaining debug information ( 0.05s )
Running phase: alias analysis [simple] ( 0.05s )
Running phase: optimizations [normal] ( 0.05s )
 -> running GotoStmtOptimizer ( 0.05s )
 -> running RemoveUselessCastsOptimizer ( 0.05s )
 -> running UnusedGlobalVarOptimizer ( 0.05s )
 -> running DeadLocalAssignOptimizer ( 0.05s )
 -> running SimpleCopyPropagationOptimizer ( 0.05s )
 -> running CopyPropagationOptimizer ( 0.05s )
 -> running AuxiliaryVariablesOptimizer ( 0.05s )
 -> running SimplifyArithmExprOptimizer ( 0.05s )
 -> running IfStructureOptimizer ( 0.05s )
 -> running LoopLastContinueOptimizer ( 0.05s )
 -> running PreWhileTrueLoopConvOptimizer ( 0.05s )
 -> running WhileTrueToForLoopOptimizer ( 0.05s )
 -> running WhileTrueToWhileCondOptimizer ( 0.05s )
 -> running IfBeforeLoopOptimizer ( 0.05s )
 -> running LLVMIntrinsicsOptimizer ( 0.05s )
 -> running VoidReturnOptimizer ( 0.05s )
 -> running BreakContinueReturnOptimizer ( 0.05s )
 -> running BitShiftOptimizer ( 0.05s )
 -> running DerefAddressOptimizer ( 0.05s )
 -> running EmptyArrayToStringOptimizer ( 0.05s )
 -> running BitOpToLogOpOptimizer ( 0.05s )
 -> running SimplifyArithmExprOptimizer ( 0.05s )
 -> running UnusedGlobalVarOptimizer ( 0.05s )
 -> running DeadLocalAssignOptimizer ( 0.05s )
 -> running SimpleCopyPropagationOptimizer ( 0.05s )
 -> running CopyPropagationOptimizer ( 0.05s )
 -> running SelfAssignOptimizer ( 0.05s )
 -> running VarDefForLoopOptimizer ( 0.05s )
 -> running VarDefStmtOptimizer ( 0.05s )
 -> running EmptyStmtOptimizer ( 0.05s )
 -> running GotoStmtOptimizer ( 0.05s )
 -> running SimplifyArithmExprOptimizer ( 0.05s )
 -> running DeadCodeOptimizer ( 0.05s )
 -> running DerefToArrayIndexOptimizer ( 0.05s )
 -> running IfToSwitchOptimizer ( 0.05s )
 -> running CCastOptimizer ( 0.05s )
 -> running CArrayArgOptimizer ( 0.05s )
Running phase: variable renaming [readable] ( 0.05s )
Running phase: converting constants to symbolic names ( 0.05s )
Running phase: module validation ( 0.05s )
 -> running BreakOutsideLoopValidator ( 0.05s )
 -> running NoGlobalVarDefValidator ( 0.05s )
 -> running ReturnValidator ( 0.05s )
Running phase: emission of the target code [c] ( 0.05s )
Running phase: finalization ( 0.05s )
Running phase: cleanup ( 0.06s )

##### Done!

However, when I opened the .c file, I saw that there was barely any code in the main function:

//
// This file was generated by the Retargetable Decompiler
// Website: https://retdec.com
// Copyright (c) 2019 Retargetable Decompiler <info@retdec.com>
//

#include <stdint.h>

// ------------------------ Functions -------------------------

// Address range: 0x100000f73 - 0x100000f8e
int main(int argc, char ** argv) {
    // 0x100000f73
    return argc - 1;
}

// --------------------- Meta-Information ---------------------

// Detected compiler/packer: gc
// Detected functions: 1
// Decompilation date: 2019-05-03 23:32:35

According to its man page (man cmpdylib), the executable is supposed to do much more than that:

NAME
       cmpdylib - compare two dynamic shared libraries for compatibility

SYNOPSIS
       cmpdylib oldLibrary newLibrary

DESCRIPTION
       cmpdylib  compares  two versions of a dynamic shared library to see if
       they are compatible with each other...

I am also attaching the decompilation output: cmpdylib_retdec_output.zip

Could you please check why the output does not match what the binary is expected to do?

silverbacknet commented 5 years ago

Unfortunately, decompilation of 64-bit binaries still does not work as well as decompilation of 32-bit ones. This sample might help further that work, though.

s3rvac commented 5 years ago

Thank you for the report. @PeterMatula, can you please take a look?

humanitiesclinic commented 5 years ago

Oh, I see. Is there any indication in the README or on the RetDec website that 64-bit binaries have problems, which I may have missed?

s3rvac commented 5 years ago

No, you haven't missed anything. It is just that support for decompilation of 64-bit binaries was added only recently (in the v3.3 release on March 18, i.e. 1.5 months ago), and we have been working on improving it since then. @PeterMatula will check the binary you have submitted and will let you know.

humanitiesclinic commented 5 years ago

OK, noted. Is there any update? @PeterMatula @s3rvac

humanitiesclinic commented 5 years ago

Hello @PeterMatula @s3rvac, has there been any progress made on this issue?

s3rvac commented 5 years ago

Hi @humanitiesclinic. You will have to wait until @PeterMatula takes a look at this issue.

xkubov commented 5 years ago

Hi, I tried to decompile the file that you have provided and I was able to reproduce your issue.

The disassembly shows that main calls a function stub that is itself located in main: the call at 0x100000f89 targets the very next instruction, 0x100000f8e. The stub then jumps to, and thereby invokes, the _xcselect_invoke_xcrun function.

Disassembly of input file:

0x100000f73:   push rbp 
0x100000f74:   mov rbp, rsp
0x100000f77:   lea eax, [rdi - 1]
0x100000f7a:   lea rdx, [rsi + 8]
0x100000f7e:   lea rdi, [rip + 0x29]
0x100000f85:   xor ecx, ecx 
0x100000f87:   mov esi, eax 
0x100000f89:   call 0x100000f8e
0x100000f8e:   jmp qword ptr [rip + 0x7c] <_xcselect_invoke_xcrun>
...
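The targets in the listing above can be recomputed by hand from the "Bytes on entry point" value printed by retdec-fileinfo. The C sketch below decodes the three RIP-relative operands: the address loaded into rdi, the target of the 5-byte call, and the pointer slot read by the indirect jmp. The helper names are hypothetical, and the array holds only the first 33 bytes of that value. On x86-64, a RIP-relative displacement is added to the address of the next instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_POINT 0x100000f73ULL /* "Entry point address" from retdec-fileinfo */

/* First 33 bytes of "Bytes on entry point" (offsets are relative to ENTRY_POINT). */
static const uint8_t entry_bytes[] = {
    0x55,                                     /* +0  push rbp                    */
    0x48, 0x89, 0xe5,                         /* +1  mov rbp, rsp                */
    0x8d, 0x47, 0xff,                         /* +4  lea eax, [rdi - 1]          */
    0x48, 0x8d, 0x56, 0x08,                   /* +7  lea rdx, [rsi + 8]          */
    0x48, 0x8d, 0x3d, 0x29, 0x00, 0x00, 0x00, /* +11 lea rdi, [rip + 0x29]       */
    0x31, 0xc9,                               /* +18 xor ecx, ecx                */
    0x89, 0xc6,                               /* +20 mov esi, eax                */
    0xe8, 0x00, 0x00, 0x00, 0x00,             /* +22 call rel32                  */
    0xff, 0x25, 0x7c, 0x00, 0x00, 0x00,       /* +27 jmp qword ptr [rip + 0x7c]  */
};

/* Read a signed 32-bit little-endian displacement. */
static int32_t read_le32(const uint8_t *p) {
    return (int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* RIP-relative operand: address of the NEXT instruction plus the displacement. */
static uint64_t rip_rel(size_t insn_off, size_t insn_len, size_t disp_off) {
    uint64_t next_insn = ENTRY_POINT + insn_off + insn_len;
    return next_insn + (uint64_t)(int64_t)read_le32(&entry_bytes[disp_off]);
}

/* lea rdi, [rip + 0x29]: address of the first argument (a string). */
static uint64_t string_addr(void) { return rip_rel(11, 7, 14); }

/* call rel32: where the call actually lands. */
static uint64_t call_target(void) { return rip_rel(22, 5, 23); }

/* jmp qword ptr [rip + 0x7c]: address of the pointer slot the stub jumps through. */
static uint64_t jmp_slot(void) { return rip_rel(27, 6, 29); }
```

For example, call_target() yields 0x100000f8e, confirming that the call lands on the jmp stub immediately after it, and jmp_slot() yields 0x100001010, the slot that the disassembler labels <_xcselect_invoke_xcrun>.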

This file seems to be heavily optimized; nonetheless, RetDec should be able to decompile it properly. The LLVM IR produced after the decoder pass shows that the function stub was successfully transformed into a function definition:

define i64 @function_100000f8e() {
dec_label_pc_100000f8e:

; 0x100000f8e
  store volatile i64 4294971278, i64* @_asm_program_counter
  %0 = call i64 @_xcselect_invoke_xcrun()
  store i64 %0, i64* @rax
  ret i64 undef
}
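The constant 4294971278 stored to @_asm_program_counter is simply the stub address 0x100000f8e written in decimal (0x100000000 + 0xf8e = 4294967296 + 3982). A tiny helper such as the hypothetical one below can be used to check decimal IR constants against hex addresses from the disassembly:

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if a decimal constant from the LLVM IR (e.g. the value stored
   to @_asm_program_counter) equals a hex address taken from the disassembly. */
static int ir_value_is_address(uint64_t ir_value, const char *hex_address) {
    /* strtoull with base 16 accepts an optional "0x" prefix. */
    return (uint64_t)strtoull(hex_address, NULL, 16) == ir_value;
}
```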

Although this function is detected, main does not call it in the later stages, so it is discarded during decompilation (it is not used anywhere else either). This is wrong, because the disassembly clearly shows that the stub is called from main.
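For comparison, a hand-written C approximation of what main actually does is sketched below. It assumes the System V AMD64 calling convention (integer arguments in rdi, rsi, rdx, rcx) and uses a hypothetical stand-in, fake_xcselect_invoke_xcrun, because the real prototype of _xcselect_invoke_xcrun is not given here; its four parameter types are only inferred from the registers set up before the call. The "cmpdylib" string is the one IDA and Ghidra show for the lea rdi operand.

```c
#include <stddef.h>

/* Hypothetical record of the arguments passed to the stand-in below. */
struct xcrun_call {
    const char *tool;
    int argc;
    char **argv;
    int flag;
};
static struct xcrun_call last_call;

/* Hypothetical stand-in for _xcselect_invoke_xcrun: it only records its
   arguments, since the real function and its signature are not shown here. */
static void fake_xcselect_invoke_xcrun(const char *tool, int argc, char **argv, int flag) {
    last_call.tool = tool;
    last_call.argc = argc;
    last_call.argv = argv;
    last_call.flag = flag;
}

/* Approximation of the disassembled main. */
static void cmpdylib_main(int argc, char **argv) {
    fake_xcselect_invoke_xcrun(
        "cmpdylib",  /* lea rdi, [rip + 0x29]                          */
        argc - 1,    /* lea eax, [rdi - 1]; mov esi, eax               */
        argv + 1,    /* lea rdx, [rsi + 8]: skip argv[0] (8-byte ptrs) */
        0);          /* xor ecx, ecx                                   */
}
```

Calling cmpdylib_main(3, argv) with argv = {"cmpdylib", "old", "new"} records tool "cmpdylib", argc 2, and argv + 1, which is the call that RetDec's output drops.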

Out of interest, I compared the result with other decompilers; the output of IDA and Ghidra for this particular file is shown below. IDA seems to struggle with the same problem as RetDec. Ghidra's output, however, seems to be the closest to the real content of the binary.

As the warning comments show, Ghidra obtained this result by treating the indirect jump in the main function as a call.

IDA

int __cdecl main(int argc, const char **argv, const char **envp)
                 push    rbp
                 mov     rbp, rsp
                 lea     eax, [rdi-1]
                 lea     rdx, [rsi+8]
                 lea     rdi, aCmpdylib  ; "cmpdylib"
                 xor     ecx, ecx
                 mov     esi, eax
                 call    $+5
                 endp ; sp-analysis failed

Ghidra


void entry(int iParm1,long lParm2)

{
                    /* WARNING: Could not recover jumptable at 0x000100000f8e. Too many branches */
                    /* WARNING: Treating indirect jump as call */
  (*(code *)_xcselect_invoke_xcrun)("cmpdylib",(ulong)(iParm1 - 1),lParm2 + 8,0);
  return;
}
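The idea behind Ghidra's workaround can be illustrated with a toy model: when a call targets an instruction that is an indirect jump through an import pointer slot, the call can be treated as a direct call to that import. The sketch below is not Ghidra's or RetDec's actual implementation; the types and function names are hypothetical, and the slot address 0x100001010 used in the example is computed from jmp qword ptr [rip + 0x7c] at 0x100000f8e (0x100000f8e + 6 + 0x7c).

```c
#include <stddef.h>
#include <stdint.h>

/* Toy instruction model (hypothetical, for illustration only). */
enum insn_kind { INSN_OTHER, INSN_CALL, INSN_JMP_MEM };

struct insn {
    uint64_t addr;
    enum insn_kind kind;
    uint64_t target; /* INSN_CALL: call target; INSN_JMP_MEM: pointer slot read */
};

struct import_slot {
    uint64_t slot;    /* address of the pointer the stub jumps through */
    const char *name; /* imported function bound to that slot */
};

static const struct insn *find_insn(const struct insn *code, size_t n, uint64_t addr) {
    for (size_t i = 0; i < n; i++)
        if (code[i].addr == addr)
            return &code[i];
    return NULL;
}

static const char *find_import(const struct import_slot *imps, size_t n, uint64_t slot) {
    for (size_t i = 0; i < n; i++)
        if (imps[i].slot == slot)
            return imps[i].name;
    return NULL;
}

/* If `call` targets an indirect jump through a known import slot, return the
   import's name so the call can be modelled as a direct call to that import;
   otherwise return NULL and leave the call unresolved. */
static const char *resolve_call_through_stub(const struct insn *code, size_t n_code,
                                             const struct import_slot *imps, size_t n_imps,
                                             const struct insn *call) {
    if (call->kind != INSN_CALL)
        return NULL;
    const struct insn *stub = find_insn(code, n_code, call->target);
    if (stub == NULL || stub->kind != INSN_JMP_MEM)
        return NULL;
    return find_import(imps, n_imps, stub->target);
}
```

With the call at 0x100000f89 (target 0x100000f8e), the jmp at 0x100000f8e (slot 0x100001010), and one slot entry named _xcselect_invoke_xcrun, resolve_call_through_stub returns "_xcselect_invoke_xcrun" for the call instead of treating it as a call into the middle of main.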