cmu-sei / pharos

Automated static analysis tools for binary programs
Other
1.53k stars 186 forks source link

object arrays #30

Open Trass3r opened 5 years ago

Trass3r commented 5 years ago

msvc transforms something like

struct Foo
{
    Foo();
    ~Foo();

    int a[20] = {};
};
return new Foo[50];

into

t = (uint32_t*)operator new(4004);
if (!t)
    return 0;
*t = 50;
a = t + 1;
eh_vector_constructor(a, 80, 50, Foo::Foo, guard_check_icall_nop);
return a;

https://www.geoffchappell.com/studies/msvc/language/compgen/_j.htm

Looks like pharos can't handle that case yet.

This pattern can also arise inside of a constructor if you have an array of objects as part of another object: https://godbolt.org/z/Ml61jT

cfcohen commented 5 years ago

Yes. Arrays of objects were known to be a topic that requires improvement. Are you able to determine exactly what we don't handle? I assume we found almost nothing in this particular case. I would think that our code should by default treat it like a single instance of an object.

It's quite likely that at least part of the problem is that "Foo()" is a pretty thin class. Try adding some features to the class that will make OOAnalyzer's job easier, like a virtual method and an invocation of that method then see if we do better. That should help separate what parts of the problem are related to object arrays versus other problems in OOAnalyzer.

We might be missing the name for the new[] operator here:

https://github.com/cmu-sei/pharos/blob/master/libpharos/ooanalyzer.cpp#L294

Or the hash here:

https://github.com/cmu-sei/pharos/blob/master/libpharos/ooanalyzer.cpp#L259

I remember being very conflicted about including those when I didn't know what the impact would be.

I will look at this case some more if I have time, but hopefully I've given you enough to investigate some more if it's blocking you from accomplishing something.

Trass3r commented 5 years ago

It didn't recognize anything, just some exception and typeinfo classes. msvc forwards new[] to the new operator (unlike gcc). So it could either call that forwarding new[] or it's inlined and calls new directly. But yes it doesn't know the VS2019 new operator hash.

Giving it the new[] address manually results in this error: Missing return value from new() call at 0x0040104F

Adding a virtual method does help it to find the ctor, dtor and virtual method. But it does not understand the usage in main of course.

Trass3r commented 5 years ago

N.B.: eh_vector_constructor could also be inlined. https://godbolt.org/z/XsZnQc

sei-ccohen commented 4 years ago

Another great report. I really like they way you've used godbolt to demonstrate the issue by the way. We'll definitely want to look at this, but it's not something we're currently working on.

edmcman commented 3 years ago

Just to update this, I did add a bunch of hashes for new[]. But I suspect it will still have the Missing return value from new() call at 0x0040104F problem you encountered when specifying the address manually. At some point @sei-ccohen will probably need to look into that.

Trass3r commented 3 years ago

The problem is that the number passed to new is not the size of an individual object but the sum of all objects in the array + the number of objects (which can be dynamic of course). If this is not taken into account the tool may think it's one large object.

sei-eschwartz commented 3 years ago

Yes, that is a good point. We should probably ignore the size for the [] variants because there's no easy way to know how many objects there are (right?)

Trass3r commented 3 years ago

I've checked the 2 previously posted godbolt links again. The number of objects (size_t) is saved in front of the objects for msvc and gcc but only if virtual functions are present.

The rest of the code takes care of calling all the constructors which depends on the compiler optimizations. Without any msvc inserts a nice function call:

eh_vector_constructor(memory, fooSize, numFoos, Foo::Foo, guard_check_icall_nop);

but with opts this loop will be inlined and potentially also the ctor if visible.

edmcman commented 3 years ago

Thanks, I clearly had not looked at your examples in enough detail. At least for virtual classes we should have enough information to do something meaningful. I may work on this some over the holidays.

Trass3r commented 3 years ago

The pattern is also used for global/stack arrays, like Foo f[50]; instead of new Foo[50].