ahmedbougacha / dagger

Binary Translator to LLVM IR

Entrypoint in ELF-Files #9

Closed 0o52 closed 7 years ago

0o52 commented 7 years ago

For ELF files, using the entrypoint specified in the ELF header is useless. That code usually only performs runtime initialization and then calls __libc_start_main, which, as far as dagger can tell, never returns (it seems to jump to an address held in a register after the function executes). As a result, dagger's output is the same for every ELF file. It would be better to start from the main function when the file is not stripped. In all other cases, the user should specify the entrypoint manually until we find a reliable way for dagger to locate it automatically.
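A minimal sketch of the selection order proposed above, written in C++ to match the dagger sources: use the address of a "main" symbol when the binary still has one, otherwise fall back to an address supplied by the user, and report failure when neither exists. SymbolTable and chooseEntrypoint are hypothetical names for illustration only, not dagger's API; a real implementation would read the symbols from the object file.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical symbol table: symbol name -> address. In dagger this would
// come from the object file's symbol table (absent for stripped binaries).
using SymbolTable = std::map<std::string, uint64_t>;

// Proposed policy (sketch): prefer main() when the binary is not stripped,
// otherwise use the entrypoint the user passed in. Returns std::nullopt when
// neither is available, so the caller can report an error.
std::optional<uint64_t> chooseEntrypoint(const SymbolTable &Symbols,
                                         std::optional<uint64_t> UserEntry) {
  auto It = Symbols.find("main");
  if (It != Symbols.end())
    return It->second;
  return UserEntry; // may be std::nullopt
}
```

Whether an explicit user entrypoint should instead take precedence over main() is a design choice; this sketch follows the order described in the issue.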

0o52 commented 7 years ago

I am a bit confused by the GitHub interface (I am new here), but I think I updated the pull request to the structure you want. I will try to understand how the ELF init function works and how we can reliably get the address of main() with a static approach, and then update the gatherEntrypoints function.
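One static heuristic that could serve as a starting point, offered only as a hedged sketch and not as what dagger implements: on x86-64 Linux with glibc, _start typically passes the address of main() to __libc_start_main as its first argument, i.e. in %rdi under the System V AMD64 ABI. Scanning the bytes of _start for the instruction that loads %rdi can therefore recover main() in a stripped binary. findMainFromStart is a hypothetical helper, and the two encodings it matches (mov $imm32,%rdi and lea disp32(%rip),%rdi) are assumptions about typical crt1 code; other compilers, libcs, and architectures will need different patterns.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Heuristic sketch (assumes x86-64, glibc-style _start). Code holds the bytes
// of _start up to the call to __libc_start_main; StartAddr is the address of
// Code[0]. Returns the last value loaded into %rdi, or std::nullopt.
std::optional<uint64_t> findMainFromStart(const std::vector<uint8_t> &Code,
                                          uint64_t StartAddr) {
  // Reads a little-endian 32-bit immediate/displacement, sign-extended.
  auto Imm32 = [&](std::size_t Pos) {
    uint32_t V = uint32_t(Code[Pos]) | uint32_t(Code[Pos + 1]) << 8 |
                 uint32_t(Code[Pos + 2]) << 16 | uint32_t(Code[Pos + 3]) << 24;
    return int64_t(int32_t(V));
  };
  std::optional<uint64_t> Main;
  // Both patterns are 7 bytes long: 3 opcode/ModRM bytes + 4-byte operand.
  for (std::size_t I = 0; I + 7 <= Code.size(); ++I) {
    if (Code[I] != 0x48) // REX.W prefix
      continue;
    if (Code[I + 1] == 0xc7 && Code[I + 2] == 0xc7) {
      // mov $imm32, %rdi (48 c7 c7 imm32): common in non-PIE binaries.
      Main = uint64_t(Imm32(I + 3));
    } else if (Code[I + 1] == 0x8d && Code[I + 2] == 0x3d) {
      // lea disp32(%rip), %rdi (48 8d 3d disp32): common in PIE binaries.
      // The displacement is relative to the end of this 7-byte instruction.
      Main = StartAddr + I + 7 + uint64_t(Imm32(I + 3));
    }
  }
  return Main;
}
```

Because this is a raw byte scan rather than a real disassembly, it can produce false positives; in dagger the same idea would more naturally run on the decoded instructions of the entry function.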

ahmedbougacha commented 7 years ago

> I am a bit confused by the GitHub interface (I am new here), but I think I updated the pull request to the structure you want.

Yeah, this looks fine, thanks!

0o52 commented 7 years ago

With the changes above, the TranslationEntrypoint command-line option was broken. In today's new commit (87433fc), I moved this option from llvm-dec.cpp to MCObjectSymbolizer.cpp.

ahmedbougacha commented 7 years ago

> With the changes above, the TranslationEntrypoint command-line option was broken.

Huh, how?

I committed a slightly tweaked version of the previous patch in 8227d6ba9, with a small testcase added.

That version does include the fallback to the ELF header entrypoint: the symbolizer is also used by llvm-mccfg, which isn't affected by the crt code and should always be able to disassemble the binary. I still think it's unfortunate that this fallback will produce confusing dagger output. Maybe we could special-case this somehow in the DC/translation code?

ahmedbougacha commented 7 years ago

Oh, and btw, thanks for the patch, and sorry it slipped through the cracks! Let me know if this is sufficient for you.

0o52 commented 7 years ago
  if (MainEntrypoint.hasValue() == false) {
    // FIXME: Find the main entrypoint in a stripped ELF-File if possible.
    // The Entrypoint specified in the ELF-Header is not always useful, because
    // it calls __libc_start_main and does not return in a way we could detect
    // it. So the goal is to identify the start of the main()-function here.
    // FIXME: We only handle 64bit LE ELF.
    if (auto *EF = dyn_cast<ELF64LEObjectFile>(&OF))
      MainEntrypoint = EF->getELFFile()->getHeader()->e_entry;
    else
      report_fatal_error("Found stripped ELF file, could not find entrypoint.");
  }

With the code from your commit above, report_fatal_error("Found stripped ELF file, could not find entrypoint."); will never be executed, because every ELF file has an entrypoint. But the point is that this entrypoint is (in my opinion) useless for analysis with dagger, because the analysis of every ELF file without a "main" symbol will be the same (see above).

But if you remove

    if (auto *EF = dyn_cast<ELF64LEObjectFile>(&OF))
      MainEntrypoint = EF->getELFFile()->getHeader()->e_entry;

from your code, dagger hits report_fatal_error("Found stripped ELF file, could not find entrypoint."); on stripped binaries before it ever uses the custom entrypoint given by the user in llvm-dec.cpp.

ahmedbougacha commented 7 years ago

> With the code from your commit above, report_fatal_error("Found stripped ELF file, could not find entrypoint."); will never be executed, because every ELF file has an entrypoint.

That should still catch things like big-endian or 32-bit ELF (which are unsupported elsewhere too).
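To illustrate which files still reach report_fatal_error in that snippet, here is a self-contained sketch that reads e_entry directly from the header bytes and, like the dyn_cast<ELF64LEObjectFile> check, accepts only 64-bit little-endian ELF. readElf64LEEntry is a hypothetical helper that does not use LLVM's Object library; the offsets follow the ELF64 header layout, where e_entry sits at byte 24.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns e_entry for a 64-bit little-endian ELF header, std::nullopt for
// anything else (32-bit, big-endian, not ELF). std::nullopt corresponds to
// the branch that reaches report_fatal_error in the quoted code.
std::optional<uint64_t> readElf64LEEntry(const std::vector<uint8_t> &Bytes) {
  // e_ident (16 bytes) + e_type (2) + e_machine (2) + e_version (4)
  // places the 8-byte e_entry field at offset 24 in ELF64.
  constexpr std::size_t EntryOffset = 24;
  if (Bytes.size() < EntryOffset + 8)
    return std::nullopt;
  // Magic number: 0x7f 'E' 'L' 'F'.
  if (Bytes[0] != 0x7f || Bytes[1] != 'E' || Bytes[2] != 'L' ||
      Bytes[3] != 'F')
    return std::nullopt;
  constexpr uint8_t ElfClass64 = 2; // e_ident[EI_CLASS] == ELFCLASS64
  constexpr uint8_t ElfDataLSB = 1; // e_ident[EI_DATA] == ELFDATA2LSB
  if (Bytes[4] != ElfClass64 || Bytes[5] != ElfDataLSB)
    return std::nullopt;
  // Assemble the little-endian 64-bit value, most significant byte first.
  uint64_t Entry = 0;
  for (int I = 7; I >= 0; --I)
    Entry = (Entry << 8) | Bytes[EntryOffset + I];
  return Entry;
}
```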

> But the point is that this entrypoint is (in my opinion) useless for analysis with dagger, because the analysis of every ELF file without a "main" symbol will be the same (see above).

> But if you remove
>
>     if (auto *EF = dyn_cast<ELF64LEObjectFile>(&OF))
>       MainEntrypoint = EF->getELFFile()->getHeader()->e_entry;
>
> from your code, dagger hits report_fatal_error("Found stripped ELF file, could not find entrypoint."); on stripped binaries before it ever uses the custom entrypoint given by the user in llvm-dec.cpp.

Again, I completely agree. But the MC CFG / MCAnalysis bits are useful for more than DC translation. For instance, I'm working on merging the custom symbolization logic in llvm-objdump with MCObjectSymbolizer. I don't think we should limit these kinds of clients because DC is unable to cope with these functions.

Now, there's a point to be made for having the DC/translation bits complain about stripped files. I'll think about where best to put that, but I don't think MC* is the right place. Does that sound reasonable?

0o52 commented 7 years ago

I am afraid that putting this into DC is not a good idea, because you would have to add a lot of knowledge about specific functions for a specific OS (and a specific version of it) and a specific platform into DC. This would add unnecessary complexity and make DC harder to understand. Maybe it would be better to ignore this topic, or maybe we could add a simple note when dagger is run with the debug flag.
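A sketch of how the two positions might fit together, under stated assumptions: the MC-level code keeps its fallback to the ELF header entrypoint (so clients like llvm-mccfg are not limited) but records that it fell back, and the translation side only prints a note when a debug flag is set, without any OS- or libc-specific knowledge. EntrypointChoice, pickEntrypoint and noteIfFallback are hypothetical names, not part of dagger.

```cpp
#include <cstdint>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical result: the chosen address plus where it came from.
struct EntrypointChoice {
  uint64_t Address;
  bool FromHeaderFallback; // true when no "main" symbol was found
};

// MC-level, client-agnostic: prefer main(), else the ELF header e_entry.
EntrypointChoice pickEntrypoint(std::optional<uint64_t> MainSym,
                                uint64_t HeaderEntry) {
  if (MainSym)
    return {*MainSym, false};
  return {HeaderEntry, true};
}

// Translation-side policy: emit a note only in debug mode and only when the
// header fallback was used. Returns true if a note was written to OS.
bool noteIfFallback(const EntrypointChoice &C, bool Debug, std::ostream &OS) {
  if (!Debug || !C.FromHeaderFallback)
    return false;
  OS << "note: no main symbol found; translating from the ELF header "
        "entrypoint 0x"
     << std::hex << C.Address << std::dec
     << ", which usually only runs C runtime startup code\n";
  return true;
}
```

Keeping the check in a small client-side helper like this avoids encoding crt or libc details in either MC* or DC; the note only tells the user why the output may look the same for every stripped binary.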