cmu-sei / pharos

Automated static analysis tools for binary programs

Handle "logical this adjustments" #187

Open RolphWoggom opened 3 years ago

RolphWoggom commented 3 years ago

When analyzing the Simpsons: Hit & Run demo, JSON generation fails.

I've uploaded the executable, ApiDB, facts and results here; please let me know if you need any of the other files.

The error is ERROR: Duplicate key: '0x5f5d60':

root@74e326b16809:/workdir# ooanalyzer --help | grep RevID
RevID: 2becf22aa64577199a68741104f8e969554337df

root@74e326b16809:/workdir# partition --serialize=Simpsons.exe.ser --maximum-memory=128000 Simpsons.exe 
OPTI[INFO ]: Analyzing executable: Simpsons.exe
OPTI[INFO ]: ROSE stock partitioning took 207.678 seconds.
OPTI[INFO ]: Partitioned 467145 bytes, 152562 instructions, 37265 basic blocks, 57 data blocks and 2992 functions.
OPTI[INFO ]: Function partitioning took 1197.05 seconds.
OPTI[INFO ]: Writing serialized data to "Simpsons.exe.ser".
OPTI[INFO ]: Writing serialized data took 103.497 seconds.
OPTI[INFO ]: Partitioned 2040290 bytes, 608293 instructions, 148467 basic blocks, 14348 data blocks and 18426 functions.

root@74e326b16809:/workdir# ooanalyzer --serialize=Simpsons.exe.ser --maximum-memory 128000 --prolog-facts=Simpsons.exe.facts --threads=4 --per-function-timeout=6000 --apidb simpsons-api.json Simpsons.exe
OPTI[INFO ]: Analyzing executable: Simpsons.exe
OPTI[INFO ]: OOAnalyzer version 1.0.
OPTI[INFO ]: Reading serialized data from "Simpsons.exe.ser".
OPTI[INFO ]: Reading serialized data took 52.1446 seconds.
OPTI[INFO ]: Partitioned 2040290 bytes, 608293 instructions, 148467 basic blocks, 14348 data blocks and 18426 functions.
OOAN[WARN ]: Instruction 50139E: jmp       [5013B0+eax*4] had incomplete successors.
[INFO ]: Unable to find this-pointer for function at 0x005B2C70
[INFO ]: Unable to find this-pointer for function at 0x0050C7B0
[INFO ]: Unable to find this-pointer for function at 0x0043B9E0
OOAN[ERROR]: No new() methods were found.  Heap objects may not be detected.
OOAN[ERROR]: No delete() methods were found.  Object analysis may be impaired.
OPTI[WARN ]: OOAnalyzer did not perform C++ class analysis.
OPTI[INFO ]: OOAnalyzer analysis complete.

root@74e326b16809:/workdir# awk -F\( '{print $1}' Simpsons.exe.facts | sort | uniq -c
      1 % Object fact exporting complete.
      1 % Prolog facts autogenerated by OOAnalyzer.
  24827 callParameter
  42404 callReturn
  50728 callTarget
  39794 callingConvention
      1 fileInfo
   5555 funcOffset
  13463 funcParameter
   9868 funcReturn
  14243 initialMemory
  29314 methodMemberAccess
   6946 noCallsAfter
   6962 noCallsBefore
     44 possibleVBTableWrite
   1889 possibleVFTableWrite
  12112 possibleVirtualFunctionCall
   1478 rTTIBaseClassDescriptor
   1224 rTTIClassHierarchyDescriptor
   1596 rTTICompleteObjectLocator
   1320 rTTITypeDescriptor
   1512 returnsSelf
    198 thisPtrAllocation
   1464 thisPtrOffset
  17962 thisPtrUsage
    117 thunk
    139 uninitializedReads

root@74e326b16809:/workdir# ooprolog --facts Simpsons.exe.facts --results Simpsons.exe.results --json Simpsons.exe.json --log-level=6 >Simpsons.exe.log
ERROR: Duplicate key: '0x5f5d60'
ERROR: In:
ERROR:   [20] with_output_to(<stream>(0x563577a27940),exportJSON)
ERROR:   [19] setup_call_catcher_cleanup(user:open('Simpsons.exe.json',write,<stream>(0x563577a27940)),user:with_output_to(<stream>(0x563577a27940),exportJSON),_460,user:close(<stream>(0x563577a27940))) at /usr/local/lib/swipl/boot/init.pl:619
ERROR:   [16] catch(user:exportJSONTo('Simpsons.exe.json'),error(duplicate_key('0x5f5d60'),context(...,_558)),user:(...,...)) at /usr/local/lib/swipl/boot/init.pl:537
ERROR:   [15] catch_with_backtrace('<garbage_collected>','<garbage_collected>','<garbage_collected>') at /usr/local/lib/swipl/boot/init.pl:587
ERROR: 
ERROR: Note: some frames are missing due to last-call optimization.
ERROR: Re-run your program in debug mode (:- debug.) to get more detail.
sei-eschwartz commented 3 years ago
[ '0x5f5d60':vftable{ ea:'0x5f5d60',
       entries:entries{ '0':vftentry{ demangled_name:'',
                      ea:'0x548460',
                      import:false,
                      name:"virt_meth_0x548460",
                      offset:0,
                      type:meth
                    }
              },
       length:1,
       vftptr:'0x4'
     },
  '0x5f5d60':vftable{ ea:'0x5f5d60',
       entries:entries{ '0':vftentry{ demangled_name:'',
                      ea:'0x548460',
                      import:false,
                      name:"virt_meth_0x548460",
                      offset:0,
                      type:meth
                    }
              },
       length:1,
       vftptr:'0x0'
     }
]
sei-eschwartz commented 3 years ago

The vftable in question is installed by 0x548110 and 0x548200, at different offsets. I think the outstanding question at this point is: do these belong to the same class?
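The export failure can be reproduced in miniature. Judging from the dump above, the JSON exporter appears to key each vftable by its address alone, so one address recorded at two offsets (vftptr 0x0 and 0x4) yields two entries with the same key. A minimal C++17 sketch of a pre-export check, assuming installations have already been extracted as (vftable, offset) pairs; `VftInstall` and `findMultiOffsetVftables` are hypothetical names, not part of Pharos.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// One observed vftable installation: which vftable address a method wrote,
// and at which offset inside the object it wrote it (e.g. 0x0 or 0x4).
struct VftInstall {
    std::uint32_t vftable;
    std::uint32_t offset;
};

// Groups installations by vftable address and returns only the addresses
// seen at more than one offset. An exporter that keys vftables by address
// alone would emit each of these more than once.
std::map<std::uint32_t, std::set<std::uint32_t>>
findMultiOffsetVftables(const std::vector<VftInstall>& installs) {
    std::map<std::uint32_t, std::set<std::uint32_t>> offsetsByVftable;
    for (const auto& install : installs) {
        offsetsByVftable[install.vftable].insert(install.offset);
    }
    std::map<std::uint32_t, std::set<std::uint32_t>> conflicts;
    for (const auto& [vftable, offsets] : offsetsByVftable) {
        if (offsets.size() > 1) {
            conflicts.emplace(vftable, offsets);
        }
    }
    return conflicts;
}
```

Feeding it (0x5f5d60, 0x0) and (0x5f5d60, 0x4) reports 0x5f5d60 with both offsets. Whether such a hit is a real conflict or a logical this adjustment is exactly the open question above; the check only surfaces the candidates.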

RolphWoggom commented 3 years ago

From comparing this to the PS2 version with symbols, it seems that:

sei-eschwartz commented 3 years ago

So does it seem like 0x5f5d60 is legitimately installed at two different offsets in radSoundHalListener?

Can you get an object layout for radSoundHalListener from the PS2 version?

sei-eschwartz commented 3 years ago

Here is the class hierarchy courtesy of RTTI:

.data:006433EC ; public struct radSoundHalListener /* mdisp:0 */ :
.data:006433EC ;   public struct IRadSoundHalListener /* mdisp:0 */ :
.data:006433EC ;     public struct IRefCount /* mdisp:0 */,
.data:006433EC ;   public struct radSoundObject /* mdisp:4 */ :
.data:006433EC ;     public class radRefCount /* mdisp:4 */ :
.data:006433EC ;       public class radObject /* mdisp:4 */ :
.data:006433EC ;         public class radBaseObject /* mdisp:4 */
.data:006433EC ; struct radSoundHalListener `RTTI Type Descriptor'

@sei-ccohen and I are a bit confused: because of the negative offset accessed in 0x548110, we believed there was a virtual base involved. But according to the above, radSoundHalListener has no virtual base.

The key to understanding what is going on is understanding why 0x548110 thinks it is ok to reference the object at a negative offset. Most likely whatever is causing the offset difference is also causing us to get confused about the vftable being installed in two different offsets.

sei-eschwartz commented 3 years ago

We are still experimenting, but we were able to generate a negative offset in a virtual function. https://www.godbolt.org/z/feoKPMajx

Basically, this happens when a virtual function is only accessed from the second (or later?) base class, and the function accesses a member in the first base class. This needs a bit more thought, but it means that the object pointer for a virtual function in a derived class may not always be pointing at the start of the derived class!
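A hypothetical reduction of the pattern just described (not the code behind the godbolt link): a virtual function introduced by the second base, overridden in the derived class, that reads a member of the first base. Measured from the second-base subobject, which is where a `Base2*` points, that member sits at a negative offset. `Base1`, `Base2`, `Derived` and `offsetOfAFromBase2` are illustrative names; the exact offset is ABI-dependent, and only its sign is expected to hold on mainstream compilers.

```cpp
#include <cstdint>

// First base: its member `a` lives at the front of the derived object.
struct Base1 {
    virtual ~Base1() = default;
    int a = 1;
};

// Second base: the class that first introduces the virtual function `get`.
struct Base2 {
    virtual ~Base2() = default;
    virtual int get() const = 0;
    int b = 2;
};

// Derived overrides Base2::get, and the override reads Base1's member.
struct Derived : Base1, Base2 {
    int get() const override { return a; }
};

// Signed byte distance from the Base2 subobject to Base1::a. If the compiled
// override receives a Base2-relative this pointer (the MSVC behavior this
// issue is about), this is the displacement its member fetch would use.
std::intptr_t offsetOfAFromBase2(const Derived& d) {
    auto base2 = reinterpret_cast<std::intptr_t>(static_cast<const Base2*>(&d));
    auto member = reinterpret_cast<std::intptr_t>(&d.a);
    return member - base2;
}
```

On 32-bit MSVC this would typically be -4 and on x86-64 Itanium-ABI compilers -8, but the sketch relies only on the value being negative.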

RolphWoggom commented 3 years ago

> This needs a bit more thought, but it means that the object pointer for a virtual function in a derived class may not always be pointing at the start of the derived class!

Seems like this is happening here:

Sounds like the problem encountered here was identified?

sei-eschwartz commented 3 years ago

Yes, I think that is what is happening there. We are thinking about the best way to fix it.

By the way, in the meantime, you should be able to manually remove the problematic vftables from the .results file to get the export to work.

RolphWoggom commented 3 years ago

Great news! And thanks for the tip. I was able to export by removing these two facts: finalInheritance(0x5f5d60, 0x5f5c04, 0x4, 0x5f5d60, false). and finalInheritance(0x5f7a80, 0x5f5bf4, 0x4, 0x5f7a80, false). The second one doesn't use a negative offset directly, but it is preceded by one (near 0x567614).
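The manual workaround can be scripted. A minimal C++17 sketch, assuming the .results file holds one Prolog fact per line in the form quoted above; `dropFinalInheritance` is a hypothetical helper, not an OOAnalyzer option. It drops any finalInheritance fact in which one of the listed addresses appears as an argument, which is broader than removing the two exact lines, so diff the output before using it for the export.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns `results` with every finalInheritance(...) fact removed whose
// argument list contains one of `addresses`; all other lines are kept as-is.
std::string dropFinalInheritance(const std::string& results,
                                 const std::set<std::string>& addresses) {
    const std::string prefix = "finalInheritance(";
    std::istringstream in(results);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        bool drop = false;
        if (line.rfind(prefix, 0) == 0) {  // line starts with the prefix
            // Arguments sit between the opening '(' and the last ')'.
            std::string::size_type close = line.rfind(')');
            std::string args = line.substr(prefix.size(), close - prefix.size());
            std::istringstream argStream(args);
            std::string arg;
            while (std::getline(argStream, arg, ',')) {
                // Trim the spaces that follow each comma.
                arg.erase(0, arg.find_first_not_of(' '));
                arg.erase(arg.find_last_not_of(' ') + 1);
                if (addresses.count(arg) != 0) {
                    drop = true;
                    break;
                }
            }
        }
        if (!drop) {
            out << line << '\n';
        }
    }
    return out.str();
}
```

Matching on whole arguments rather than substrings keeps an address such as 0x5f5d60 from accidentally matching a longer one that merely contains it.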

> Here is the class hierarchy courtesy of RTTI:

What tool was used to create this? Is that an IDA thing?

sei-eschwartz commented 3 years ago

Glad to hear you were able to get the JSON export to work.

Yes, the class hierarchy is an IDA feature I recently discovered by accident. It adds the hierarchy as a comment above the RTTI Type Descriptors.

I haven't used it, but I think https://github.com/astrelsky/Ghidra-Cpp-Class-Analyzer is a similar capability for Ghidra.

sei-eschwartz commented 3 years ago

I recently found that Jan Gray talked about this:

Consider next S::rvf(), which overrides R::rvf(). Most implementations note that S::rvf() must have a hidden this parameter of type S*. Since R’s rvf vftable slot may be used when this call occurs:

((R*)ps)->rvf(); // (*((R*)ps)->R::vfptr[1])((R*)ps)

Most implementations add another thunk to convert the R* passed to rvf into an S*. Some also add an additional vftable entry to the end of S’s vftable to provide a way to call ps->rvf() without first converting to an R*. MSC++ avoids this by intentionally compiling S::rvf() so as to expect a this pointer which addresses not the S object but rather the R embedded instance within the S. (We call this “giving overrides the same expected address point as in the class that first introduced this virtual function”.) This is all done transparently, by applying a “logical this adjustment” to all member fetches, conversions from this, and so on, that occur within the member function. (Just as with multiple inheritance member access, this adjustment is constant-folded into other member displacement address arithmetic.)
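To make the quote concrete, here is a hand-lowered model of the convention it describes, using simplified versions of Gray's P, R and S (member names and values are assumptions for illustration). `loweredRvf` stands in for what the compiled S::rvf does when it receives the R subobject as its this pointer: recovering S is a constant subtraction, so members of P end up at negative displacements and members of S itself at positive ones. `loweredRvf` and `displacementFromR` are hypothetical helpers, not compiler output.

```cpp
#include <cstdint>

struct P {
    virtual ~P() = default;
    int p1 = 10;
};

struct R {
    virtual ~R() = default;
    virtual int rvf() const { return r1; }
    int r1 = 20;
};

// S::rvf overrides R::rvf, which R first introduced.
struct S : P, R {
    int s1 = 30;
    int rvf() const override { return p1 + s1; }
};

// Model of the lowered override: `self` addresses the R embedded in S.
// The downcast subtracts a compile-time constant (the offset of R in S);
// a compiler can fold it into every member displacement below.
int loweredRvf(const R* self) {
    const S* whole = static_cast<const S*>(self);
    return whole->p1 + whole->s1;
}

// Displacement of a member of `s` measured from its R subobject,
// i.e. the offset a "logical this" member fetch would encode.
std::intptr_t displacementFromR(const S& s, const int* member) {
    return reinterpret_cast<std::intptr_t>(member) -
           reinterpret_cast<std::intptr_t>(static_cast<const R*>(&s));
}
```

For an analyzer, the practical consequence is that a method's this pointer may be the R-relative one, so a negative displacement such as the one for `p1` does not by itself imply a virtual base.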

sei-eschwartz commented 1 week ago

One thing I noticed recently is that there are actually stubs for logical this adjustments that we should be able to detect.

Here's an example for experimentation: https://godbolt.org/z/qs55rzaj6
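The shape of these stubs is not spelled out in the text here. As a starting point for experimentation, the sketch assumes the common MSVC x86 form of a this-adjusting thunk: a constant adjustment to ecx (the thiscall this register) followed by a jump to the real method. `matchThisAdjustStub` and `ThisAdjustStub` are hypothetical names operating on Intel-syntax text, not a Pharos API; a real detector would work on decoded instructions instead.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// A matched this-adjusting stub: the signed change applied to ecx (the
// thiscall `this` register) and the jump target that receives the new this.
struct ThisAdjustStub {
    long delta;
    std::string target;
};

// Recognizes the assumed two-instruction shape, in Intel-syntax text:
//   sub ecx, N      (or: add ecx, N)
//   jmp TARGET
// Returns std::nullopt for anything else.
std::optional<ThisAdjustStub> matchThisAdjustStub(const std::vector<std::string>& insns) {
    if (insns.size() != 2) {
        return std::nullopt;
    }
    std::istringstream first(insns[0]);
    std::string mnemonic, reg, imm;
    first >> mnemonic >> reg >> imm;
    if ((mnemonic != "sub" && mnemonic != "add") || reg != "ecx,") {
        return std::nullopt;
    }
    long value = 0;
    try {
        std::size_t used = 0;
        value = std::stol(imm, &used, 0);  // base 0 accepts "4", "0x4", "-4"
        if (used != imm.size()) {
            return std::nullopt;  // trailing junk such as "4h"
        }
    } catch (const std::exception&) {
        return std::nullopt;  // empty or non-numeric operand
    }
    std::istringstream second(insns[1]);
    std::string op, target;
    second >> op >> target;
    if (op != "jmp" || target.empty()) {
        return std::nullopt;
    }
    return ThisAdjustStub{mnemonic == "sub" ? -value : value, target};
}
```

A match such as delta -4 into a given method would suggest that the method expects a this pointer 4 bytes further into the object than the stub's caller supplied, which is the kind of evidence that could explain one vftable appearing at two offsets.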
