das-labor / panopticon

A libre cross-platform disassembler.
https://panopticon.re
GNU General Public License v3.0

main not discovered for non-pie (exec) ELF binaries #255

Open m4b opened 7 years ago

m4b commented 7 years ago

It seems that for non-PIE ELF binaries linked against a standard C runtime, panopticon does not discover main. The reason is that main's address is written into the _start prologue as an immediate at static link time. For example, main is at 0x400586, and _start loads it into rdi (the first argument to __libc_start_main) here:

Dump of assembler code for function _start:
   0x0000000000400490 <+0>: xor    %ebp,%ebp
   0x0000000000400492 <+2>: mov    %rdx,%r9
   0x0000000000400495 <+5>: pop    %rsi
   0x0000000000400496 <+6>: mov    %rsp,%rdx
   0x0000000000400499 <+9>: and    $0xfffffffffffffff0,%rsp
   0x000000000040049d <+13>:    push   %rax
   0x000000000040049e <+14>:    push   %rsp
   0x000000000040049f <+15>:    mov    $0x400620,%r8
   0x00000000004004a6 <+22>:    mov    $0x4005b0,%rcx
   0x00000000004004ad <+29>:    mov    $0x400586,%rdi
   0x00000000004004b4 <+36>:    callq  0x400470 <__libc_start_main@plt>
   0x00000000004004b9 <+41>:    hlt    

However, panopticon only finds _start and the PLT stub for __libc_start_main. We could of course cheat and use the symbol table or debug information, but both can be stripped, so that only works when they are present.

The lack of discovery is somewhat expected, though: to guess that the immediate in mov $0x400586,%rdi is a function address rather than a regular constant, we would need to know that the function being called expects a function pointer in its rdi argument.
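One way to supply that missing knowledge is a hard-coded table recording, per callee, which argument positions are function pointers, combined with simple constant propagation over the caller. The Rust sketch below is a hypothetical toy model: the `Reg`/`Insn` types, `known_signatures`, `start_listing`, and `discover_entry_points` are made up for illustration and are not panopticon's IL or API. It assumes the System V AMD64 calling convention (the first six integer/pointer arguments go in rdi, rsi, rdx, rcx, r8, r9) and the glibc prototype `__libc_start_main(main, argc, argv, init, fini, rtld_fini, stack_end)`.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified x86-64 model; not panopticon's IL or API.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Reg { Rdi, Rsi, Rdx, Rcx, R8, R9, Rbp, Rsp }

enum Insn {
    MovImm(u64, Reg),   // mov $imm,%dst
    MovReg(Reg, Reg),   // mov %src,%dst
    Clobber(Reg),       // any other write to a register (xor, pop, and, push, ...)
    Call(&'static str), // direct call to a named target, e.g. a PLT stub
}

// System V AMD64 ABI: argument i is passed in ARG_REGS[i].
const ARG_REGS: [Reg; 6] = [Reg::Rdi, Reg::Rsi, Reg::Rdx, Reg::Rcx, Reg::R8, Reg::R9];

// Hypothetical hard-coded knowledge: argument positions that are function pointers.
fn known_signatures() -> HashMap<&'static str, &'static [usize]> {
    let mut sigs: HashMap<&'static str, &'static [usize]> = HashMap::new();
    // __libc_start_main(main, argc, argv, init, fini, rtld_fini, stack_end)
    sigs.insert("__libc_start_main", &[0, 3, 4, 5]);
    sigs
}

/// Tracks registers holding known constants and, at each call to a callee in
/// `sigs`, returns the constants found in its function-pointer arguments.
fn discover_entry_points(insns: &[Insn], sigs: &HashMap<&'static str, &'static [usize]>) -> Vec<u64> {
    let mut known: HashMap<Reg, u64> = HashMap::new();
    let mut found = Vec::new();
    for insn in insns {
        match insn {
            Insn::MovImm(imm, dst) => {
                known.insert(*dst, *imm);
            }
            Insn::MovReg(src, dst) => match known.get(src).copied() {
                Some(v) => {
                    known.insert(*dst, v);
                }
                None => {
                    known.remove(dst);
                }
            },
            Insn::Clobber(r) => {
                known.remove(r);
            }
            Insn::Call(target) => {
                if let Some(positions) = sigs.get(target) {
                    for &i in positions.iter() {
                        if let Some(&addr) = ARG_REGS.get(i).and_then(|r| known.get(r)) {
                            found.push(addr);
                        }
                    }
                }
                // Conservatively forget everything: argument registers are caller-saved.
                known.clear();
            }
        }
    }
    found
}

/// The _start dump above, transcribed by hand into the toy model.
fn start_listing() -> Vec<Insn> {
    vec![
        Insn::Clobber(Reg::Rbp),          // xor %ebp,%ebp
        Insn::MovReg(Reg::Rdx, Reg::R9),  // mov %rdx,%r9  (rtld_fini, not a constant)
        Insn::Clobber(Reg::Rsi),          // pop %rsi      (argc)
        Insn::MovReg(Reg::Rsp, Reg::Rdx), // mov %rsp,%rdx (argv)
        Insn::Clobber(Reg::Rsp),          // and $0xfffffffffffffff0,%rsp
        Insn::Clobber(Reg::Rsp),          // push %rax
        Insn::Clobber(Reg::Rsp),          // push %rsp
        Insn::MovImm(0x400620, Reg::R8),  // mov $0x400620,%r8
        Insn::MovImm(0x4005b0, Reg::Rcx), // mov $0x4005b0,%rcx
        Insn::MovImm(0x400586, Reg::Rdi), // mov $0x400586,%rdi
        Insn::Call("__libc_start_main"),  // callq __libc_start_main@plt
    ]
}

fn main() {
    for addr in discover_entry_points(&start_listing(), &known_signatures()) {
        println!("new function entry point: {:#x}", addr);
    }
}
```

On the transcribed _start this reports 0x400586 (main), plus 0x4005b0 and 0x400620 from rcx and r8, which the same prototype marks as the init and fini pointers; r9 is copied from rdx at run time, so it is not a constant and is skipped. The weak point is the table itself: every libc variant and every other callback-taking function needs its own entry.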

There are several possible approaches, I think:

  1. (fragile, quicker to implement) Hard-code pattern recognition for prologues that look like _start, plus a __libc_start_main-style pattern, so that we know the mov into rdi carries main's address.
  2. (cooler, slower to implement, but extensible) Start working on function parameter type inference: from a callq, find its arguments, analyze the function at the call target, and infer its parameter types. If any parameter is a function pointer (e.g., somewhere in the callee's body it is called or jumped to in a "function-y" manner), disassemble the address passed in that argument and add it to the call targets as usual.
  3. Something else entirely; for example, maybe the abstract interpretation approach can help here?
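The second approach could start from something like the following Rust sketch, again over a hypothetical toy instruction model rather than panopticon's IL (`Reg`, `Insn`, `infer_fn_ptr_params`, and `example_callee` are all invented for illustration). It walks a callee's straight-line body, tracks which registers still hold which incoming parameter (following register moves and forgetting caller-saved registers across calls), and reports the parameters that end up as an indirect call or jump target. It assumes the callee's code is available to the analysis (for `__libc_start_main@plt` that means analyzing libc itself) and ignores control flow and stack spills; a real version would be a data-flow analysis over the CFG.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical, heavily simplified x86-64 model; not panopticon's IL or API.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Reg { Rdi, Rsi, Rdx, Rcx, R8, R9, Rax, Rbx, R12 }

enum Insn {
    MovReg(Reg, Reg), // mov %src,%dst
    MovImm(u64, Reg), // mov $imm,%dst
    Call(u64),        // direct call
    CallReg(Reg),     // call *%reg
    JmpReg(Reg),      // jmp *%reg (e.g. a tail call)
    Ret,
}

// System V AMD64 ABI: parameter i arrives in ARG_REGS[i]; a call may clobber
// CALLER_SAVED (r10/r11 are not modeled here).
const ARG_REGS: [Reg; 6] = [Reg::Rdi, Reg::Rsi, Reg::Rdx, Reg::Rcx, Reg::R8, Reg::R9];
const CALLER_SAVED: [Reg; 7] = [Reg::Rdi, Reg::Rsi, Reg::Rdx, Reg::Rcx, Reg::R8, Reg::R9, Reg::Rax];

/// Returns the indices of parameters used as an indirect call or jump target
/// before being overwritten. Straight-line code only.
fn infer_fn_ptr_params(body: &[Insn]) -> BTreeSet<usize> {
    // origin[r] = i means register r currently holds the value of parameter i.
    let mut origin: HashMap<Reg, usize> =
        ARG_REGS.iter().enumerate().map(|(i, r)| (*r, i)).collect();
    let mut fn_ptrs = BTreeSet::new();
    for insn in body {
        match insn {
            Insn::MovReg(src, dst) => match origin.get(src).copied() {
                Some(i) => {
                    origin.insert(*dst, i);
                }
                None => {
                    origin.remove(dst);
                }
            },
            Insn::MovImm(_, dst) => {
                origin.remove(dst);
            }
            Insn::Call(_) => {
                for r in CALLER_SAVED.iter() {
                    origin.remove(r);
                }
            }
            Insn::CallReg(target) => {
                if let Some(&i) = origin.get(target) {
                    fn_ptrs.insert(i);
                }
                for r in CALLER_SAVED.iter() {
                    origin.remove(r);
                }
            }
            Insn::JmpReg(target) => {
                if let Some(&i) = origin.get(target) {
                    fn_ptrs.insert(i);
                }
                break; // control does not fall through an indirect jump
            }
            Insn::Ret => break,
        }
    }
    fn_ptrs
}

/// Hypothetical callee: keeps its first argument in callee-saved %rbx across an
/// unrelated call, then calls through it; the second argument is only data.
fn example_callee() -> Vec<Insn> {
    vec![
        Insn::MovReg(Reg::Rdi, Reg::Rbx), // mov %rdi,%rbx
        Insn::MovReg(Reg::Rsi, Reg::R12), // mov %rsi,%r12
        Insn::Call(0x1000),               // call 0x1000 (hypothetical address)
        Insn::MovReg(Reg::R12, Reg::Rdi), // mov %r12,%rdi
        Insn::MovImm(0, Reg::Rax),        // mov $0,%rax
        Insn::CallReg(Reg::Rbx),          // call *%rbx
        Insn::Ret,                        // ret
    ]
}

fn main() {
    println!("{:?}", infer_fn_ptr_params(&example_callee())); // {0}
    // A thunk that tail-jumps through its second argument.
    println!("{:?}", infer_fn_ptr_params(&[Insn::JmpReg(Reg::Rsi)])); // {1}
}
```

Once a parameter index is known to be a function pointer, any constant that callers pass in that position can be queued as a new function entry point, which is what would let the mov $0x400586,%rdi in _start lead to main.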