cmu-sei / pharos

Automated static analysis tools for binary programs

Calling conventions and being smarter about statically linked binaries #230

Open sei-eschwartz opened 2 years ago

I have recently been reviewing the reasoning rules for deciding which functions are methods. Many of these rules are of the form "Well, we passed this pointer as a thisptr to a known method, and we passed it to this candidate method, so the candidate method is probably a method". Why do we have these?

If we think about it, on 32-bit MSVC, leaving aside whole-program optimization (WPO), if we see an argument in ecx the function is either fastcall or thiscall. And the user probably isn't writing fastcall functions. Here's a short list of fastcall functions found in our test suite:

     15 _EH4_CallFilterFunc(x,
     15 _EH4_GlobalUnwind2(x,
     15 _EH4_LocalUnwind(x,
     15 _EH4_TransferToHandler(x,
     48 _RTC_AllocaHelper(x,
     48 _RTC_CheckStackVars2(x,
     48 _RTC_CheckStackVars(x,
     87 __security_check_cookie(x)

These are short and probably easily detectable. So excluding those, ecx argument usage means a method, right?

Nope. It turns out that even if you have WPO turned off, statically linking an executable will pull WPO-compiled functions into the binary, because libc was compiled with WPO turned on. So unfortunately you get weird stuff like nominally cdecl functions that receive their arguments in the eax and ecx registers.

In short, we should be smarter about dealing with statically linked binaries. If we have a dynamically linked binary, we should be more liberal with how we detect methods. If we have a statically linked binary, we can use the current conservative behaviors, or maybe just fingerprint all the standard functions and exclude them.
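A minimal sketch of that policy, in Python for brevity. The function name `ecx_implies_method` and its parameters are hypothetical and are not part of Pharos; the helper names come from the fastcall list above. It only illustrates the decision, not how ecx usage is actually detected.

```python
# Fastcall helpers from the list earlier in this issue.
KNOWN_FASTCALL_HELPERS = frozenset({
    "_EH4_CallFilterFunc",
    "_EH4_GlobalUnwind2",
    "_EH4_LocalUnwind",
    "_EH4_TransferToHandler",
    "_RTC_AllocaHelper",
    "_RTC_CheckStackVars2",
    "_RTC_CheckStackVars",
    "__security_check_cookie",
})


def ecx_implies_method(func_name, has_ecx_arg, statically_linked):
    """Hypothetical predicate: may an ecx argument alone mark a method?

    func_name         -- symbol name, if known (assumed input)
    has_ecx_arg       -- True if the function reads ecx as an argument
    statically_linked -- True if the runtime is linked into the binary
    """
    if not has_ecx_arg:
        return False
    # Known compiler-support fastcall functions are never methods.
    if func_name in KNOWN_FASTCALL_HELPERS:
        return False
    # Statically linked binaries contain WPO-compiled runtime code that
    # may use ecx for ordinary arguments, so stay conservative there and
    # fall back to the existing reasoning rules.
    return not statically_linked
```

For a dynamically linked binary, `ecx_implies_method("sub_401000", True, False)` returns `True`; the same function in a statically linked binary returns `False` and is left to the conservative rules.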

The first step is to detect whether a binary is statically or dynamically linked. To answer this, we can look at the executable's imports and see whether the C++ runtime is among them, and then export the result as part of the facts file.
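As a rough sketch of that check, assuming the list of imported DLL names has already been extracted from the executable: the function name `classify_linkage` and the pattern are illustrative only, and the pattern covers the common MSVC runtime DLL naming schemes (`msvcrt`, `msvcr*`, `msvcp*`, `vcruntime*`, `ucrtbase`, `api-ms-win-crt-*`) rather than an authoritative list. How the result would be written to the facts file is left open here.

```python
import re

# Common MSVC C/C++ runtime DLL names; an assumption, not exhaustive.
_RUNTIME_DLL = re.compile(
    r"(msvcrt|msvcr\d+d?|msvcp\d+d?(_\d+)?|vcruntime\d+d?(_\d+)?"
    r"|ucrtbased?|api-ms-win-crt-.*)\.dll",
    re.IGNORECASE,
)


def classify_linkage(imported_dlls):
    """Return "dynamic" if any imported DLL looks like the MSVC runtime,
    otherwise "static".

    imported_dlls -- iterable of DLL names from the import table
                     (assumed to be extracted elsewhere).
    """
    for dll in imported_dlls:
        if _RUNTIME_DLL.fullmatch(dll.strip()):
            return "dynamic"
    # No runtime import found: treat the runtime as statically linked.
    # A binary that uses no runtime at all also lands here, which errs
    # on the conservative side.
    return "static"
```

For example, `classify_linkage(["KERNEL32.dll", "MSVCR100.dll"])` gives `"dynamic"`, while an import list containing only `KERNEL32.dll` and `USER32.dll` gives `"static"`.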