GrammaTech / gtirb

Intermediate Representation for Binary analysis and transformation
https://grammatech.github.io/gtirb/
Other
305 stars 36 forks source link

Does the CFG from gtirb take account of C++ virtual functions and exception handling handlers? #57

Closed trcrsired closed 1 year ago

trcrsired commented 2 years ago

BTW, what should i do if i want to trace a constant that can reach for the CFG?

For example movq $3,%rax movq %rax,(%rsp) movq %rax,%rbx How can I trace all the code that $3 could reach for example?

tjohnson-gt commented 2 years ago

Technically, the CFG in gtirb is capable of modeling C++ virtual function calls - as indirect calls w/ an appropriate set of edges to all the possible targets. However, if you're using ddisasm, it does not currently attempt to find those edges, so a gtirb file produced by ddisasm will likely have a CFG where a C++ virtual function call is modeled as a CodeBlock w/ a single indirect call edge coming out of it that targets a ProxyBlock. (And similarly for other kinds of indirect function calls.) That edge indicates that we know an indirect call occurs there, but we don't know what the possible targets are for that call.

If you felt inspired, you could write your own analysis to figure out what the correct edges should be and then replace the edge to the ProxyBlock with those edges. There is substantial academic literature out there on doing this. Unfortunately, there's no silver bullet as each compiler can choose to do this differently (though there tend to be common patterns), and you can quite easily run into problems with pointer indirection throwing a wrench in the works.

The gtirb CFG includes no overt support for modeling C++ exception control flow in the abstract. What you'll get is a CFG that models the raw operations exceptional control flow is implemented with. That will vary depending on the platform the software is targeted at.

Regarding your question about reachability of numeric constants, I think this partly depends on what your end-goal is. Do you just want to do one particular trace for a specific constant? Find all such traces at once? Have some backing data structure that allows for efficient random queries for different constants? The range of possible solutions include: performing a kind of partial evaluation by walking forward on the CFG one step at a time and watching what happens with the constant; computing a taint analysis where you treat each constant as a different taint source and math operations as taint scrubbers; building a data dependence graph and the performing forward slices from constants, truncating the slice when math operations are performed. Note that again, pointer dereferences will give you a hard time if you care about following constants through memory and the fact that pointers and pointer arithmetic can create aliases to that memory. Pointer analysis is a very thorny problem.