avast / retdec

RetDec is a retargetable machine-code decompiler based on LLVM.
https://retdec.com/
MIT License

Can't use C++ library #920

Open FredyR4zox opened 3 years ago

FredyR4zox commented 3 years ago

Hi, thank you for making this project available to the community.

I have a problem when using the new RetDec C++ library together with the LLVM framework. If I use them separately, they work just fine. The problem appears when the two are in the same program.

Libraries used:
RetDec: https://github.com/avast/retdec/commit/6ed327e30fd2bbd45767ff45eae7cfd63fdfc2f1
LLVM: https://github.com/llvm/llvm-project/releases/tag/llvmorg-11.0.1

I compiled both of them and installed them system-wide.

I copied the example program from the RetDec blog post (section "4. retdec library"): https://engineering.avast.io/retdec-v4-0-is-out/
I also copied the LLVM Fibonacci example program: https://github.com/llvm/llvm-project/tree/main/llvm/examples/Fibonacci
This is my final code:

lift_jit_pass.cpp

#include <llvm/ADT/APInt.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ExecutionEngine/GenericValue.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/IR/Argument.h>
#include <llvm/IR/BasicBlock.h>
#include <llvm/IR/Constants.h>
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/InstrTypes.h>
#include <llvm/IR/Instructions.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/Type.h>
#include <llvm/IR/Verifier.h>
#include <llvm/Support/Casting.h>
#include <llvm/Support/TargetSelect.h>
#include <llvm/Support/raw_ostream.h>
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <retdec/llvm/Support/raw_ostream.h>
#include <retdec/retdec/retdec.h>
#include <string>
#include <vector>

using namespace llvm;

// Function in the LLVM Fibonacci example
static Function *CreateFibFunction(Module *M, LLVMContext &Context) {
  // Create the fib function and insert it into module M. This function is said
  // to return an int and take an int parameter.
  FunctionType *FibFTy = FunctionType::get(Type::getInt32Ty(Context),
                                           {Type::getInt32Ty(Context)}, false);
  Function *FibF =
      Function::Create(FibFTy, Function::ExternalLinkage, "fib", M);

  // Add a basic block to the function.
  BasicBlock *BB = BasicBlock::Create(Context, "EntryBlock", FibF);

  // Get pointers to the constants.
  Value *One = ConstantInt::get(Type::getInt32Ty(Context), 1);
  Value *Two = ConstantInt::get(Type::getInt32Ty(Context), 2);

  // Get pointer to the integer argument of the fib function...
  Argument *ArgX = &*FibF->arg_begin(); // Get the arg.
  ArgX->setName("AnArg");            // Give it a nice symbolic name for fun.

  // Create the block taken when the condition is true (returns 1).
  BasicBlock *RetBB = BasicBlock::Create(Context, "return", FibF);
  // Create the block that performs the recursive calls.
  BasicBlock* RecurseBB = BasicBlock::Create(Context, "recurse", FibF);

  // Create the "if (arg <= 2) goto exitbb"
  Value *CondInst = new ICmpInst(*BB, ICmpInst::ICMP_SLE, ArgX, Two, "cond");
  BranchInst::Create(RetBB, RecurseBB, CondInst, BB);

  // Create: ret int 1
  ReturnInst::Create(Context, One, RetBB);

  // create fib(x-1)
  Value *Sub = BinaryOperator::CreateSub(ArgX, One, "arg", RecurseBB);
  CallInst *CallFibX1 = CallInst::Create(FibF, Sub, "fibx1", RecurseBB);
  CallFibX1->setTailCall();

  // create fib(x-2)
  Sub = BinaryOperator::CreateSub(ArgX, Two, "arg", RecurseBB);
  CallInst *CallFibX2 = CallInst::Create(FibF, Sub, "fibx2", RecurseBB);
  CallFibX2->setTailCall();

  // fib(x-1)+fib(x-2)
  Value *Sum = BinaryOperator::CreateAdd(CallFibX1, CallFibX2,
                                         "addresult", RecurseBB);

  // Create the return instruction and add it to the basic block
  ReturnInst::Create(Context, Sum, RecurseBB);

  return FibF;
}

int main(int argc, char *argv[]) {
  // retdec example program

  if (argc != 2) {
    llvm::errs() << "Expecting path to input\n";
    return 1;
  }
  std::string input = argv[1];

  retdec::common::FunctionSet fs;
  retdec::LlvmModuleContextPair llvm = retdec::disassemble(input, &fs);

  // Dump entire LLVM IR module.
  llvm::outs() << *llvm.module;

  // Dump functions, basic blocks, instructions.
  for (auto &f : fs) {
    llvm::outs() << f.getName() << " @ ";
    std::cout << f << "\n";
    for (auto &bb : f.basicBlocks) {
      llvm::outs() << "\t"
                   << "bb @ ";
      std::cout << bb << "\n";
      // These are not only text entries.
      // There is a full Capstone instruction.
      for (auto *i : bb.instructions) {
        llvm::outs() << "\t\t" << retdec::common::Address(i->address) << ": "
                     << i->mnemonic << " " << i->op_str << "\n";
      }
    }
  }

  // LLVM Fibonacci example

  // argv[1] is the input path here, so use a fixed argument instead of atol(argv[1]).
  int n = 24;

  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();
  LLVMContext Context;

  // Create some module to put our function into it.
  std::unique_ptr<Module> Owner(new Module("test", Context));
  Module *M = Owner.get();

  // We are about to create the "fib" function:
  Function *FibF = CreateFibFunction(M, Context);

  // Now we are going to create the JIT
  std::string errStr;
  ExecutionEngine *EE =
    EngineBuilder(std::move(Owner))
    .setErrorStr(&errStr)
    .create();

  if (!EE) {
    errs() << argv[0] << ": Failed to construct ExecutionEngine: " << errStr
           << "\n";
    return 1;
  }

  errs() << "verifying... ";
  if (verifyModule(*M)) {
    errs() << argv[0] << ": Error constructing function!\n";
    return 1;
  }

  errs() << "OK\n";
  errs() << "We just constructed this LLVM module:\n\n---------\n" << *M;
  errs() << "---------\nstarting fibonacci(" << n << ") with JIT...\n";

  // Call the Fibonacci function with argument n:
  std::vector<GenericValue> Args(1);
  Args[0].IntVal = APInt(32, n);
  GenericValue GV = EE->runFunction(FibF, Args);

  // Print the result of execution
  outs() << "Result: " << GV.IntVal << "\n";

  return 0;
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.13.4)
project(lift_jit_pass)

find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")

find_package(retdec REQUIRED 
   COMPONENTS 
      retdec
      llvm
)
message(STATUS "Found retdec ${retdec_PACKAGE_VERSION}")
message(STATUS "Using retdecConfig.cmake in: ${retdec_DIR}")

# Set your project compile flags.
# E.g. if using the C++ header files
# you will need to enable C++11 support
# for your compiler.

set(CMAKE_CXX_STANDARD 20)

include_directories(${LLVM_INCLUDE_DIRS})
add_definitions(${LLVM_DEFINITIONS})
message(STATUS "retdec_INCLUDE_DIRS ${retdec_INCLUDE_DIRS}")
message(STATUS "retdec_DEFINITIONS ${retdec_DEFINITIONS}")
message(STATUS "LLVM_INCLUDE_DIRS ${LLVM_INCLUDE_DIRS}")
message(STATUS "LLVM_DEFINITIONS ${LLVM_DEFINITIONS}")

add_compile_options(-no-pie)

# Now build our tools
add_executable(lift_jit_pass lift_jit_pass.cpp)

# Link against LLVM and retdec libraries
target_link_libraries(lift_jit_pass 
   retdec::retdec
   retdec::deps::llvm
)
target_link_libraries(lift_jit_pass LLVM)

Problem when running:

~/.../Thesis/build >>> rm -rf * && cmake ../ && make && ./lift_jit_pass a.out
zsh: sure you want to delete all 5 files in /home/REDACTED/Builds/Thesis/build [yn]? y
-- The C compiler identification is GNU 10.2.0
-- The CXX compiler identification is GNU 10.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found LLVM 11.0.1
-- Using LLVMConfig.cmake in: /usr/local/lib/cmake/llvm
-- Found OpenSSL: /usr/lib/libcrypto.so (found suitable version "1.1.1i", minimum required is "1.0.1")  
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found ZLIB: /usr/lib/libz.so (found version "1.2.11") 
-- Found retdec 
-- Using retdecConfig.cmake in: /usr/local/share/retdec/cmake
-- retdec_INCLUDE_DIRS 
-- retdec_DEFINITIONS 
-- LLVM_INCLUDE_DIRS /usr/local/include
-- LLVM_DEFINITIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-- Configuring done
-- Generating done
-- Build files have been written to: /home/REDACTED/Builds/Thesis/build
Scanning dependencies of target lift_jit_pass
[ 50%] Building CXX object CMakeFiles/lift_jit_pass.dir/lift_jit_pass.cpp.o
[100%] Linking CXX executable lift_jit_pass
[100%] Built target lift_jit_pass
: for the -W option: cl::alias must have argument name specified!
realloc(): invalid pointer
zsh: abort (core dumped)  ./lift_jit_pass a.out

Does anyone know why this is happening? Maybe the LLVM versions (LLVM 11 and the custom LLVM for RetDec) are in conflict.

Another question: I know that I can't compile the program back to binary form, which means that I can't JIT it using LLVM, right? I can only do manual analysis. Is this correct? I have read other issues on GitHub that state this (the recompile part).

PS (not related): My objective is to make a tool or framework to deobfuscate obfuscated binaries by lifting them and applying deobfuscation techniques to the result.

PeterMatula commented 3 years ago

The most likely problem is that you are combining RetDec and vanilla LLVM. At the moment, we use a modified LLVM 8.0.0 that comes with RetDec. If you use a different LLVM version, or a vanilla (unmodified) build of the correct version, it probably won't end well.

  1. Try to use only one LLVM: the one that comes with RetDec. Use a CMake setup similar to the example retdectool. As you can see, there are some ugly extra compiler options. Maybe you won't need them (try it without them at first), but if there is some problem, adding them might help. They have something to do with LLVM being huge and hard to link.

If that doesn't work either, let me know and I can try to reproduce it more thoroughly.

If you succeed, great. Now, do you really need vanilla LLVM, or can you make do with slightly modified LLVM 8? We can discuss this more when it works.

The ideal would be if RetDec could use vanilla LLVM, but we are not there yet.
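The thread does not establish the exact failure mechanism. One general hazard when two copies of a large library end up in one process is that both try to populate the same process-wide tables during static initialization, and the second registration clashes with the first. The toy below models that idea only, as a sketch under that assumption: `OptionRegistry` and `processRegistry` are hypothetical names, and none of this is LLVM or RetDec code.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a process-wide table, such as a registry of
// command-line options. This is a toy model, not LLVM code.
class OptionRegistry {
public:
    // Registers `name` for `owner`; returns false if the name is already taken.
    bool add(const std::string &name, const std::string &owner) {
        return owners_.emplace(name, owner).second;
    }

    // Returns the owner of `name`, or an empty string if it is unregistered.
    std::string ownerOf(const std::string &name) const {
        auto it = owners_.find(name);
        return it == owners_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> owners_;
};

// One registry per process: every library copy linked into the binary
// ends up talking to this same object.
inline OptionRegistry &processRegistry() {
    static OptionRegistry registry;
    return registry;
}
```

In this model, `processRegistry().add("W", "bundled LLVM")` succeeds and a later `processRegistry().add("W", "system LLVM")` returns false, which is the kind of clash that linking against a single LLVM avoids.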

PeterMatula commented 3 years ago

Another question: I know that I can't compile the program back to binary form, which means that I can't JIT it using LLVM, right? I can only do manual analysis. Is this correct? I have read other issues on GitHub that state this (the recompile part).

Well, you could try, but because we don't aim at this (missing complex semantics) and we use only static analysis (hard to get everything right), and there is a bunch of bugs and inaccuracies, I don't think you would get anything that could really fully run again.

However, you don't necessarily have to do only manual analysis. After you decode the binary, you can pipe LLVM IR to your LLVM pass at any point - i.e. at any point after retdec-decoder pass, you have a valid LLVM IR module to work with. You don't have to run all the analyses, only the ones that you want, and you can then run any custom analysis you write. It's just that I think the IR quality is not enough to recompile it. Analysis passes that automatically inspect the IR in order to make sense of it are feasible. Also, if you find some problems, we can make it better.
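As a rough illustration of what a small automatic analysis over decoded code can look like, here is a minimal sketch that counts instruction mnemonics per function. The `Insn`, `Block`, and `Func` structs are hypothetical stand-ins that only mirror the field names visible in the example program above (`basicBlocks`, `instructions`, `mnemonic`, `op_str`); they are not the real RetDec types, and a real analysis would more likely be written as an LLVM pass over the IR module.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for decoded code; the real RetDec types are richer.
struct Insn {
    std::string mnemonic;
    std::string op_str;
};

struct Block {
    std::vector<Insn> instructions;
};

struct Func {
    std::string name;
    std::vector<Block> basicBlocks;
};

// Counts how often each mnemonic occurs across all basic blocks of `f`.
std::map<std::string, std::size_t> mnemonicHistogram(const Func &f) {
    std::map<std::string, std::size_t> counts;
    for (const Block &bb : f.basicBlocks) {
        for (const Insn &i : bb.instructions) {
            ++counts[i.mnemonic];
        }
    }
    return counts;
}
```

A histogram like this could serve as a cheap first signal for a deobfuscation tool, for example to flag functions with an unusual share of jumps, before any heavier analysis is run.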

FredyR4zox commented 3 years ago

Try to use only one LLVM: the one that comes with RetDec. Use a CMake setup similar to the example retdectool. As you can see, there are some ugly extra compiler options. Maybe you won't need them (try it without them at first), but if there is some problem, adding them might help. They have something to do with LLVM being huge and hard to link.

I've tried using only the LLVM that ships with RetDec and it doesn't work. While executing, it gives back an error very similar to the one I reported in this issue. I haven't tried compiling with the retdectool flags yet; I will try that.

However, you don't necessarily have to do only manual analysis. After you decode the binary, you can pipe LLVM IR to your LLVM pass at any point - i.e. at any point after retdec-decoder pass, you have a valid LLVM IR module to work with. You don't have to run all the analyses, only the ones that you want, and you can then run any custom analysis you write.

Nice! That is what i was looking for.

It's just that I think the IR quality is not enough to recompile it. Analysis passes that automatically inspect the IR in order to make sense of it are feasible. Also, if you find some problems, we can make it better.

The IR quality is not enough to recompile it, but isn't all valid IR compilable? I want to analyze and transform the IR. Is this feasible?

PeterMatula commented 3 years ago

I've tried using only the LLVM that ships with RetDec and it doesn't work. While executing, it gives back an error very similar to the one I reported in this issue. I haven't tried compiling with the retdectool flags yet; I will try that.

I will try to use it outside of the RetDec repo and investigate what is needed for it to work.

PeterMatula commented 3 years ago

The IR quality is not enough to recompile it, but isn't all valid IR compilable? I want to analyze and transform the IR. Is this feasible?

It should be valid, therefore it should be compilable. But it may not (and probably won't) have exactly the same functionality as the original - some things are simplified, omitted, etc. As an entire program, this is unlikely to work. On a function level, it might be possible.