draperlaboratory / cbat_tools

Program analysis tools developed at Draper on the CBAT project.
MIT License
102 stars 14 forks source link

Short circuit when functions are syntactically equal #259

Open DieracDelta opened 4 years ago

DieracDelta commented 4 years ago

This PR adds in a check for syntactic equality (enabled with --syntax-equality ). Example usage may be seen in run_wp_*.sh scripts in the syntax_equality test folder. This PR closes #211, and is a first step towards the ideas in #254. Specifically, equality.ml exports an exist_isomorphism function. This function checks if there's a mapping between the two sub.t functions that preserves syntax structure, use of virtual variables (virtual variables may go by different names but must be used in all the same places), and control flow.

I've documented the code in equality.ml and equality.mli. I'm also including the results I posted to the slack for the run on grammatech. On average, only running the syntax check takes 8 seconds on my laptop. I found 65 syntactically equal functions on grammatech. When running CBAT, I got:

So overall, the results that CBAT finishes within the timeout of 1000s turn out to match with this syntax check. It's not clear to me how to test the UNKNOWNs further.

This first plot is function (on the x axis) vs # syntactically idental blocks / total blocks in the sub. These are encouraging numbers performance improvements in #254 . image

This second plot is time in seconds on the x axis vs # statements (def_t, jmp_t) on the y axis. The green dots are the run for syntactically equal functions (of which there are 65); the red dots are master branch cbat without any special flags. @philzook58 mentioned that this is big savings if there are a lot of long functions that are syntactically identical in the binary.
image

philzook58 commented 4 years ago

We may want equality.ml in bap_wp not plugin.

DieracDelta commented 4 years ago

Thanks for the suggestions! I've made corresponding changes.

I agree--returning the mapping allows for more flexibility. I think the currying suggestion is clever--it makes exposing functions with aliased types easier to include in the mli file. Specifically, this lets me also expose compare_blocks_syntax. This function returns the syntactically equal (up to TID) blocks. Seems useful for #254 .

This mapper interface is awesome. I'm wondering if there's an easy way to use it to rewrite sub_exp in a cleaner fashion that doesn't involve fragile pattern matching. The issue is we need to carry around the information from both subs at the same time, and return something different if the subs don't match. This creates differing type signatures for the declared functions.

Also, I'm making liberal use of >>= and >>|. I could use let syntax instead if that's clearer. Seems like the let syntax is used in bildb, but the inline operators are used in wp. I'm not sure what's preferred here.

I'm testing the changes by running them on grammatech. I'll post again when I've got results. EDIT: has the same behavior as before (marks 65 of the functions as syntactically equal).

codyroux commented 4 years ago

There's no real norm between using monadic lets and >>=s, though Ivan prefers the latter, and his opinion has some weight.

Apparently, using the ppx versions of let bindings is deprecated, instead using the (newish) feature that allows defining new let-like keywords.

I'd suggest just using the combinators for now.