cmu-sei / pharos

Automated static analysis tools for binary programs

Alleged invalid RTTI data and other inheritance problems #129

Open 0xBEEEF opened 4 years ago

0xBEEEF commented 4 years ago

Here is an example that uses a lot of inheritance (perhaps not something you would see in practice, but theoretically possible). This small example causes several problems during analysis and further processing.

root@d7df564db58b:/usr/local/bin# ooanalyzer --json /dir/app.json  /dir/ConsoleApplication1.exe 
OPTI[INFO ]: Analyzing executable: /dir/ConsoleApplication1.exe
OPTI[INFO ]: OOAnalyzer version 1.0.
OPTI[INFO ]: ROSE stock partitioning took 3.94899 seconds.
OPTI[INFO ]: Partitioned 4663 bytes, 1496 instructions, 457 basic blocks, 0 data blocks and 121 functions.
OPTI[INFO ]: Pharos function partitioning took 4.66681 seconds.
OPTI[INFO ]: Partitioned 6149 bytes, 1768 instructions, 577 basic blocks, 19 data blocks and 166 functions.
APID[WARN ]: API database could not find function terminate in api-ms-win-crt-runtime-l1-1-0
OOAN[WARN ]: No stack delta information for: api-ms-win-crt-runtime-l1-1-0.dll:terminate
OOAN[ERROR]: No delete() methods were found.  Object analysis may be impaired.
OPTI[INFO ]: Function analysis complete, analyzed 115 functions in 3.21252 seconds.
OOAN[WARN ]: Missing this-pointer usage for new() call at 4014B8: call      40181B
OOAN[WARN ]: Unable to find parameter for new() call at 0x004014E4
OOAN[WARN ]: Missing this-pointer usage for new() call at 40165A: call      40181B
OOAN[WARN ]: Missing this-pointer usage for new() call at 40167C: call      40181B
OOAN[WARN ]: Missing this-pointer usage for new() call at 401764: call      40181B
OOAN[WARN ]: Missing this-pointer usage for new() call at 401782: call      40181B
PLOG[ERROR]: RTTI was invalid.
OOAN[ERROR]: Method address 0x00401170 was not a function or import.
OPTI[INFO ]: OOAnalyzer analysis complete, found: 8 classes, 16 methods, 0 virtual calls, and 86 usage instructions.
OPTI[INFO ]: Successfully exported to JSON file '/dir/app.json'.
OPTI[INFO ]: OOAnalyzer analysis complete.

The problems start with the analysis itself. The RTTI data is reported as invalid, although I can import the same example into Ghidra without any errors.

Beyond that, only a fraction of the analysis works. For example, the VBTables are not recognized, and of the VFTables either none of the subordinate ones are recognized or only the very first one.

I admit that the example is a bit far-fetched, but in theory it should be possible to import the whole thing without serious errors.

I did not specify the new() and delete() methods for this analysis, so the related warnings and errors can be ignored.

To reproduce the case, simply compile the following example in release mode and then analyze it.

One more note on the result: the number of classes is correct, but nothing beyond that really is. The best way to check this is to compile the example yourself in Visual Studio 2019 and run the whole process. You should get exactly the same result.

sei-eschwartz commented 4 years ago

Thanks. This type of example is really helpful for us.

0xBEEEF commented 4 years ago

For completeness, here are all the classes contained in the program and the layouts that Visual C++ generated for them.

  class Base    size(4):
    +---
   0    | data_
    +---
  class Der1    size(12):
    +---
   0    | {vfptr}
   4    | {vbptr}
    +---
    +--- (virtual base Base)
   8    | data_
    +---
  Der1::$vftable@:
    | &Der1_meta
    |  0
   0    | &Der1::TestFunctionA
   1    | &Der1::TestFunctionB
  Der1::$vbtable@:
   0    | -4
   1    | 4 (Der1d(Der1+4)Base)
  Der1::TestFunctionA this adjustor: 0
  Der1::TestFunctionB this adjustor: 0
  vbi:     class  offset o.vbptr  o.vbte fVtorDisp
              Base       8       4       4 0
  class Der2    size(12):
    +---
   0    | {vfptr}
   4    | {vbptr}
    +---
    +--- (virtual base Base)
   8    | data_
    +---
  Der2::$vftable@:
    | &Der2_meta
    |  0
   0    | &Der2::TestFunctionD
  Der2::$vbtable@:
   0    | -4
   1    | 4 (Der2d(Der2+4)Base)
  Der2::TestFunctionD this adjustor: 0
  vbi:     class  offset o.vbptr  o.vbte fVtorDisp
              Base       8       4       4 0
  class Join    size(20):
    +---
   0    | +--- (base class Der1)
   0    | | {vfptr}
   4    | | {vbptr}
    | +---
   8    | +--- (base class Der2)
   8    | | {vfptr}
  12    | | {vbptr}
    | +---
    +---
    +--- (virtual base Base)
  16    | data_
    +---
  Join::$vftable@Der1@:
    | &Join_meta
    |  0
   0    | &Join::TestFunctionA
   1    | &Join::TestFunctionB
  Join::$vftable@Der2@:
    | -8
   0    | &Join::TestFunctionD
  Join::$vbtable@Der1@:
   0    | -4
   1    | 12 (Joind(Der1+4)Base)
  Join::$vbtable@Der2@:
   0    | -4
   1    | 4 (Joind(Der2+4)Base)
  Join::TestFunctionA this adjustor: 0
  Join::TestFunctionB this adjustor: 0
  Join::TestFunctionD this adjustor: 8
  vbi:     class  offset o.vbptr  o.vbte fVtorDisp
              Base      16       4       4 0
  class Base2   size(4):
    +---
   0    | data_
    +---
  class The1    size(36):
    +---
   0    | {vfptr}
   4    | {vbptr}
   8    | ?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@ data2_
    +---
    +--- (virtual base Base2)
  32    | data_
    +---
  The1::$vftable@:
    | &The1_meta
    |  0
   0    | &The1::NewTestFunctionA
   1    | &The1::NewTestFunctionB
  The1::$vbtable@:
   0    | -4
   1    | 28 (The1d(The1+4)Base2)
  The1::NewTestFunctionA this adjustor: 0
  The1::NewTestFunctionB this adjustor: 0
  vbi:     class  offset o.vbptr  o.vbte fVtorDisp
             Base2      32       4       4 0
  class The2    size(36):
    +---
   0    | {vfptr}
   8    | {vbptr}
  16    | data2_
        | <alignment member> (size=4)
  24    | data3_
    +---
    +--- (virtual base Base2)
  32    | data_
    +---
  The2::$vftable@:
    | &The2_meta
    |  0
   0    | &The2::NewTestFunctionD
  The2::$vbtable@:
   0    | -8
   1    | 24 (The2d(The2+8)Base2)
  The2::NewTestFunctionD this adjustor: 0
  vbi:     class  offset o.vbptr  o.vbte fVtorDisp
             Base2      32       8       4 0
  class Join2   size(100):
    +---
   0    | +--- (base class The1)
   0    | | {vfptr}
   4    | | {vbptr}
   8    | | ?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@ data2_
    | +---
  32    | +--- (base class The2)
  32    | | {vfptr}
  40    | | {vbptr}
  48    | | data2_
        | | <alignment member> (size=4)
  56    | | data3_
    | +---
  64    | myData
  68    | ?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@ string
  92    | floatingPoint
    +---
    +--- (virtual base Base2)
  96    | data_
    +---
  Join2::$vftable@The1@:
    | &Join2_meta
    |  0
   0    | &Join2::NewTestFunctionA
   1    | &Join2::NewTestFunctionB
  Join2::$vftable@The2@:
    | -32
   0    | &Join2::NewTestFunctionD
  Join2::$vbtable@The1@:
   0    | -4
   1    | 92 (Join2d(The1+4)Base2)
  Join2::$vbtable@The2@:
   0    | -8
   1    | 56 (Join2d(The2+8)Base2)
  Join2::NewTestFunctionA this adjustor: 0
  Join2::NewTestFunctionB this adjustor: 0
  Join2::NewTestFunctionD this adjustor: 32
  vbi:     class  offset o.vbptr  o.vbte fVtorDisp
             Base2      96       4       4 0
  class SuperJoin   size(124):
    +---
   0    | +--- (base class Join2)
   0    | | +--- (base class The1)
   0    | | | {vfptr}
   4    | | | {vbptr}
   8    | | | ?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@ data2_
    | | +---
  32    | | +--- (base class The2)
  32    | | | {vfptr}
  40    | | | {vbptr}
  48    | | | data2_
        | | | <alignment member> (size=4)
  56    | | | data3_
    | | +---
  64    | | myData
  68    | | ?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@ string
  92    | | floatingPoint
    | +---
    +---
    +--- (virtual base Base)
  96    | data_
    +---
  100   | (vtordisp for vbase Join)
    +--- (virtual base Join)
  104   | +--- (base class Der1)
  104   | | {vfptr}
  108   | | {vbptr}
    | +---
  112   | +--- (base class Der2)
  112   | | {vfptr}
  116   | | {vbptr}
    | +---
    +---
    +--- (virtual base Base2)
  120   | data_
    +---
  SuperJoin::$vftable@The1@:
    | &SuperJoin_meta
    |  0
   0    | &SuperJoin::NewTestFunctionA
   1    | &SuperJoin::NewTestFunctionB
  SuperJoin::$vftable@The2@:
    | -32
   0    | &SuperJoin::NewTestFunctionD
  SuperJoin::$vbtable@The1@:
   0    | -4
   1    | 116 (SuperJoind(The1+4)Base2)
   2    | 92 (SuperJoind(SuperJoin+4)Base)
   3    | 100 (SuperJoind(SuperJoin+4)Join)
  SuperJoin::$vbtable@The2@:
   0    | -8
   1    | 80 (SuperJoind(The2+8)Base2)
  SuperJoin::$vftable@Der1@:
    | -104
   0    | &(vtordisp) SuperJoin::TestFunctionA
   1    | &(vtordisp) SuperJoin::TestFunctionB
  SuperJoin::$vftable@Der2@:
    | -112
   0    | &(vtordisp) SuperJoin::TestFunctionD
  SuperJoin::$vbtable@Der1@:
   0    | -4
   1    | -12 (SuperJoind(Der1+4)Base)
  SuperJoin::$vbtable@Der2@:
   0    | -4
   1    | -20 (SuperJoind(Der2+4)Base)
  SuperJoin::NewTestFunctionA this adjustor: 0
  SuperJoin::NewTestFunctionB this adjustor: 0
  SuperJoin::NewTestFunctionD this adjustor: 32
  SuperJoin::TestFunctionA this adjustor: 104
  SuperJoin::TestFunctionB this adjustor: 104
  SuperJoin::TestFunctionD this adjustor: 112
  vbi:     class  offset o.vbptr  o.vbte fVtorDisp
              Base      96       4       8 0
              Join     104       4      12 1
             Base2     120       4       4 0
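
The vbtable values in the dump above follow a simple pattern: entry 0 is the displacement from the vbptr back to the start of the subobject that holds it, and each later entry is the displacement from the vbptr to a virtual base. The following is a minimal Python sketch of that arithmetic, using only offsets copied from the SuperJoin and Join layouts. The helper name `vbtable` is hypothetical and is not part of Pharos or of the compiler output.

```python
def vbtable(subobject_offset, vbptr_offset, vbase_offsets):
    """Recompute vbtable entries from object-layout offsets.

    Entry 0 is the displacement from the vbptr back to the start of the
    subobject that contains it; the remaining entries are displacements
    from the vbptr to each virtual base, in table order.
    """
    entries = [subobject_offset - vbptr_offset]
    entries += [base - vbptr_offset for base in vbase_offsets]
    return entries


# Offsets copied from the SuperJoin layout: Base2 at 120, Base at 96, Join at 104.
superjoin_the1 = vbtable(0, 4, [120, 96, 104])   # vbptr of The1 at offset 4
superjoin_the2 = vbtable(32, 40, [120])          # vbptr of The2 at offset 40
superjoin_der1 = vbtable(104, 108, [96])         # vbptr of Der1 at offset 108
superjoin_der2 = vbtable(112, 116, [96])         # vbptr of Der2 at offset 116

# Offsets copied from the Join layout: Base at 16.
join_der1 = vbtable(0, 4, [16])
join_der2 = vbtable(8, 12, [16])

print(superjoin_the1, superjoin_the2, superjoin_der1, superjoin_der2)
```

The negative entries for Der1 and Der2 inside SuperJoin arise because the virtual base Base (offset 96) is laid out before the virtual base Join (offset 104), so the displacement from those vbptrs points backwards.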

cfcohen commented 4 years ago

Thanks, your report has uncovered a number of bugs, and we've corrected some of them (public commit hopefully later today). You can find the cause of the "invalidity" with a command like this:

ooanalyzer --prolog-facts=/tmp/c.facts --prolog-results=/tmp/c.results --verbose=2 ConsoleApplication1.exe

which in this case reports (near the end of the execution):

RTTI Information is invalid because CompleteObjectLocator Offset2 = 0xc

The ooprolog.pl command should have emitted a warning too, but did not due to an unrelated bug. This is a case of us being intentionally over-constrained. I wanted to see examples of previously unseen flags in the CompleteObjectLocator RTTI data structures so that I could determine what the flags meant. The meaning of many of the flag fields is unknown as far as I can tell. :-( Here's the rule that triggered the "invalid" message:

https://github.com/cmu-sei/pharos/blob/master/share/prolog/oorules/rtti.pl#L215

Which obviously needs a clause that reads:

Offset2 \= 0xc,

(and probably one for 0x8 as well). So we know that 0xc is a valid value now, but we still don't know the meaning. I'll try to look at that some more soon, and I'll post here if I figure it out, but if you're able to understand the significance of 0xc in this field, that would be helpful. In general virtual inheritance and multiple inheritance did not receive as much testing as ordinary inheritance and other features. Our goal was to get the tool working for as many "common" cases as possible, and it's only recently that we've become much more serious about testing all of these unusual cases like you are doing. Thanks for your help with that!

0xBEEEF commented 4 years ago

Many thanks for the detailed answer! What I have already noticed about multiple inheritance is that sometimes only the very first VFTable is recognized as such. The other VFTables are also installed in the constructor, but they are always displayed as normal members (mbr_xyz). Ghidra shows them in the listing as pointers to the affected VFTables. I'm still considering whether I should open a separate issue for this, or whether it's enough to add it to one of the existing open issues.

cfcohen commented 4 years ago

So it turns out that the Offset2 field is (surprise!) an OFFSET. I'm not sure how I ended up with code that attempts to validate this field against an expected value. Probably because the rule was written a long time ago. The most current analysis of this field seems to be from here:

https://github.com/cmu-sei/pharos/blob/master/share/prolog/oorules/facts.pl#L71

Where I've found that the field is apparently called the "constructor displacement offset", but I still haven't documented what that really means. SuperJoin is the only class that has a non-zero value in the CompleteObjectLocator fact:

rTTICompleteObjectLocator(0x403240, 0x403524, 0x405064, 0x403490, 0x70, 0xc).
rTTICompleteObjectLocator(0x4031d4, 0x403834, 0x405064, 0x403490, 0x68, 0x4).
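
To make the two facts easier to read, here is a small Python sketch that parses them and prints the last two fields in decimal. It assumes, based only on the discussion in this thread, that the fifth field is the object offset and the sixth is the Offset2 / "constructor displacement offset" field; the function name `parse_locators` is hypothetical.

```python
import re

FACTS = """
rTTICompleteObjectLocator(0x403240, 0x403524, 0x405064, 0x403490, 0x70, 0xc).
rTTICompleteObjectLocator(0x4031d4, 0x403834, 0x405064, 0x403490, 0x68, 0x4).
"""

FACT_RE = re.compile(r"rTTICompleteObjectLocator\(([^)]*)\)\.")


def parse_locators(text):
    """Return one tuple of integers per rTTICompleteObjectLocator fact."""
    return [
        tuple(int(field.strip(), 16) for field in match.group(1).split(","))
        for match in FACT_RE.finditer(text)
    ]


locators = parse_locators(FACTS)

# Assumed field positions: index 4 = offset, index 5 = Offset2 (CD offset).
nonzero_cd = [(hex(loc[0]), loc[4], loc[5]) for loc in locators if loc[5] != 0]

for pointer, offset, cd_offset in nonzero_cd:
    print(f"{pointer}: offset={offset} cd_offset={cd_offset}")
```

In decimal the offset fields are 112 and 104, which match the -112 and -104 values shown for SuperJoin::$vftable@Der2@ and SuperJoin::$vftable@Der1@ in the compiler layout.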

Also regarding the BaseTable detection, these facts represent the tables (and the corresponding data from the compiler):

possibleVBTableWrite(0x4011e6, 0x4011a0, 0x4, 0x40321c).
initialMemory(0x40321c, -0x4).
initialMemory(0x403220, 0x74).
initialMemory(0x403224, 0x5c).
initialMemory(0x403228, 0x64).

  SuperJoin::$vbtable@The1@:
   0    | -4
   1    | 116 (SuperJoind(The1+4)Base2)
   2    | 92 (SuperJoind(SuperJoin+4)Base)
   3    | 100 (SuperJoind(SuperJoin+4)Join)

possibleVBTableWrite(0x4011ed, 0x4011a0, 0x28, 0x403208).
initialMemory(0x403208, -0x8).
initialMemory(0x40320c, 0x50).

  SuperJoin::$vbtable@The2@:
   0    | -8
   1    | 80 (SuperJoind(The2+8)Base2)

possibleVBTableWrite(0x4011f4, 0x4011a0, 0x6c, 0x4031e0).
initialMemory(0x4031e0, -0x4).
initialMemory(0x4031e4, -0xc).

  SuperJoin::$vbtable@Der1@:
   0    | -4
   1    | -12 (SuperJoind(Der1+4)Base)

possibleVBTableWrite(0x4011fb, 0x4011a0, 0x74, 0x403270).
initialMemory(0x403270, -0x4).
initialMemory(0x403274, -0x14).

  SuperJoin::$vbtable@Der2@:
   0    | -4
   1    | -20 (SuperJoind(Der2+4)Base)

So we've generated the facts required to detect these virtual base tables, but for some reason they weren't accepted in the final results. I'll investigate why now.
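
The correspondence between the facts and the compiler output can be checked mechanically. The sketch below is a rough illustration in Python, not part of Pharos: it parses the facts quoted above and rebuilds each table by reading consecutive 4-byte entries. The argument order of possibleVBTableWrite (instruction, function, object offset, table address) is inferred from the values in this thread, and walking memory until no initialMemory fact exists only works because this excerpt lists nothing but table entries.

```python
import re

FACTS = """
possibleVBTableWrite(0x4011e6, 0x4011a0, 0x4, 0x40321c).
initialMemory(0x40321c, -0x4). initialMemory(0x403220, 0x74).
initialMemory(0x403224, 0x5c). initialMemory(0x403228, 0x64).
possibleVBTableWrite(0x4011ed, 0x4011a0, 0x28, 0x403208).
initialMemory(0x403208, -0x8). initialMemory(0x40320c, 0x50).
possibleVBTableWrite(0x4011f4, 0x4011a0, 0x6c, 0x4031e0).
initialMemory(0x4031e0, -0x4). initialMemory(0x4031e4, -0xc).
possibleVBTableWrite(0x4011fb, 0x4011a0, 0x74, 0x403270).
initialMemory(0x403270, -0x4). initialMemory(0x403274, -0x14).
"""


def parse(text):
    """Split the fact text into VBTable writes and a memory map."""
    writes, memory = [], {}
    for name, args in re.findall(r"(\w+)\(([^)]*)\)\.", text):
        values = [int(arg.strip(), 16) for arg in args.split(",")]
        if name == "possibleVBTableWrite":
            _insn, _func, object_offset, table_addr = values
            writes.append((object_offset, table_addr))
        elif name == "initialMemory":
            memory[values[0]] = values[1]
    return writes, memory


def read_table(memory, addr):
    """Collect consecutive 4-byte entries starting at addr."""
    entries = []
    while addr in memory:
        entries.append(memory[addr])
        addr += 4
    return entries


writes, memory = parse(FACTS)
tables = {offset: read_table(memory, addr) for offset, addr in writes}

for offset, entries in tables.items():
    print(f"vbptr at object offset {offset}: {entries}")
```

The keys 4, 40, 108 and 116 are the vbptr offsets of The1, The2, Der1 and Der2 in the SuperJoin layout, and the entries are the same numbers the compiler printed for the four SuperJoin vbtables.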

Perhaps the 0x4 and 0xc have some relation to the matching values in the SuperJoin::$vbtable@Der1@ at 0x4031e0?

cfcohen commented 4 years ago

So I've determined that the rules here are too restrictive for your example:

https://github.com/cmu-sei/pharos/blob/master/share/prolog/oorules/initial.pl#L99

Specifically, the first rule that reasons about additional entries in the table (after offset zero) failed for your test case. The problem seems to be that the FuncOffset clause on line 132 was not true. Perhaps this is because your constructors were inlined? This seems likely given the large number of VFTable and VBTable installations in the constructor at 0x4011a0, which I expect is SuperJoin.

possibleVBTableWrite(0x4011e6, 0x4011a0, 0x4, 0x40321c).
possibleVBTableWrite(0x4011ed, 0x4011a0, 0x28, 0x403208).
possibleVBTableWrite(0x4011f4, 0x4011a0, 0x6c, 0x4031e0).
possibleVBTableWrite(0x4011fb, 0x4011a0, 0x74, 0x403270).
possibleVFTableWrite(0x401202, 0x4011a0, 0x68, 0x4031f8).
possibleVFTableWrite(0x401209, 0x4011a0, 0x70, 0x403204).
possibleVFTableWrite(0x40122f, 0x4011a0, 0, 0x403230).
possibleVFTableWrite(0x4012ab, 0x4011a0, 0x20, 0x40323c).
possibleVFTableWrite(0x4012b7, 0x4011a0, 0, 0x403214).
possibleVFTableWrite(0x4012bd, 0x4011a0, 0x20, 0x403264).
possibleVFTableWrite(0x40133b, 0x4011a0, 0, 0x4031ec).
possibleVFTableWrite(0x401341, 0x4011a0, 0x20, 0x40325c).
possibleVFTableWrite(0x401352, 0x4011a0, 0x4, 0x4031d8).
possibleVFTableWrite(0x401360, 0x4011a0, 0xc, 0x403244).
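
One way to look at these writes is to group them by object offset and see which offsets are written more than once. The Python sketch below does only that; it is an illustration, not a Pharos rule. It assumes the same inferred argument order as above (instruction, function, object offset, table address) and uses instruction address as a stand-in for execution order, which is only an approximation.

```python
import re

FACTS = """
possibleVFTableWrite(0x401202, 0x4011a0, 0x68, 0x4031f8).
possibleVFTableWrite(0x401209, 0x4011a0, 0x70, 0x403204).
possibleVFTableWrite(0x40122f, 0x4011a0, 0, 0x403230).
possibleVFTableWrite(0x4012ab, 0x4011a0, 0x20, 0x40323c).
possibleVFTableWrite(0x4012b7, 0x4011a0, 0, 0x403214).
possibleVFTableWrite(0x4012bd, 0x4011a0, 0x20, 0x403264).
possibleVFTableWrite(0x40133b, 0x4011a0, 0, 0x4031ec).
possibleVFTableWrite(0x401341, 0x4011a0, 0x20, 0x40325c).
possibleVFTableWrite(0x401352, 0x4011a0, 0x4, 0x4031d8).
possibleVFTableWrite(0x401360, 0x4011a0, 0xc, 0x403244).
"""


def vftable_writes(text):
    """Return (instruction, object offset, table) tuples sorted by instruction address."""
    writes = []
    for args in re.findall(r"possibleVFTableWrite\(([^)]*)\)\.", text):
        insn, _func, offset, table = (int(a.strip(), 16) for a in args.split(","))
        writes.append((insn, offset, table))
    return sorted(writes)


def group_by_offset(writes):
    """Map each object offset to the tables written there, in address order."""
    grouped = {}
    for _insn, offset, table in writes:
        grouped.setdefault(offset, []).append(table)
    return grouped


grouped = group_by_offset(vftable_writes(FACTS))

# Offsets that receive more than one table pointer within the same function.
overwritten = {off: tabs for off, tabs in grouped.items() if len(tabs) > 1}

for off, tabs in overwritten.items():
    print(hex(off), [hex(t) for t in tabs])
```

Offsets 0 and 0x20 each receive three different table addresses within the single function at 0x4011a0. That pattern would be consistent with inlined base-class constructors installing their own tables before the most derived class overwrites them, although the facts alone do not prove that reading.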

This is also an interesting case for perhaps considering the significance of the ordering of these writes...

It's also worth noting that even when we detect the VBTables correctly, it's not obvious how else it would improve the results. We don't really use the VBTables for much. I suppose we can prove certain inheritance relationships, but we've mostly obtained those directly from the RTTI information. We might also detect some object sizes better, but simply knowing that there were writes into the object did most of that work (it doesn't really matter that they were VBTables). So it's unclear if failing to detect the VBTables is really related to whatever the real problems are...

0xBEEEF commented 4 years ago

Wow, you've been really busy! These are really impressive facts about the program I used.

I think it's great that this information matches the information generated by the compiler.

Regarding your point about how much the VBTables actually add: I think they do matter. If you look at the other example of virtual inheritance in issue #130, there are accesses to abstract variables within the function H::access. There the VBTables are used correctly and the values of the objects are set correctly. In that case this information would be very important for understanding the overall context.

For this reason I still think that you should pass on these VBTable structures as far as possible. In my opinion this would help a great deal in further analysis. Maybe you could also pass on the this-pointer offsets within the structure that you mentioned above, e.g. as a comment within the structure.

sei-eschwartz commented 3 years ago

Was this RTTI problem ever fixed?

0xBEEEF commented 3 years ago

The originally reported problem with the unrecognized RTTI data is fixed, but to be honest I haven't followed up on the VBTables issue yet, so I can no longer give an exact status on that.