BinaryAnalysisPlatform / bap

Binary Analysis Platform
MIT License
2.07k stars 273 forks source link

fixes and improves label handling in the IR theory #1333

Closed ivg closed 3 years ago

ivg commented 3 years ago

For the context, since the Ghidra PR the semantics of the [blk] operation in Core Theory was refined and corresponding theories were updated. Before #1326 all theories were just ignoring the labels passed to the [blk] operator. The reason for that was that the [blk] operation serves two purposes. It enables concatenation of data and control flow effects. And it attaches a label to that sequence so that it could be referenced elsewhere. Before #1326 it was only used for the first purpose, i.e., to merge data and control effects into the single effect. But for Ghidra we needed an ability to create labels (as ghidra is relying on Branch/Cbranch) instructions everywhere, even to express the intra-instruction logic, not real control flow.

Now, the theories have to take into account the label passed to the blk operation, when they produce their denotations, unless the label is Label.null. If the label is Label.null then the operation is denotes just a sequence of data and control flow effects. Moreover, denotations are allowed to coalesce several blocks together. But if the label is non-null then the denotation has to preserve it.

Before this PR the BIR theory wasn't fully respecting the passed labels and was sometimes optimizing them away, for example, when the label was attached to an empty denotation. This PR takes care of keeping the passed labels and at the same time preserving the minimal form of the generated IR. Of course assuming that lifters are using the blk operation correctly, i.e., that they are not passing non-null labels to blocks that they do not plan to invoke later.

In other words, if you have a lifter that uses the blk operation you need to update it and pass Theory.Label.null there instead of the fresh label. This will have the same semantics as it had before. Passing non-null label now has a different semantics.