Open DieracDelta opened 4 years ago
We may want equality.ml in bap_wp
not plugin
.
Thanks for the suggestions! I've made corresponding changes.
I agree--returning the mapping allows for more flexibility. I think the currying suggestion is clever--it makes exposing functions with aliased types easier to include in the mli
file. Specifically, this lets me also expose compare_blocks_syntax
. This function returns the syntactically equal (up to TID) blocks. Seems useful for #254 .
This mapper
interface is awesome. I'm wondering if there's an easy way to use it to rewrite sub_exp
in a cleaner fashion that doesn't involve fragile pattern matching. The issue is we need to carry around the information from both subs at the same time, and return something different if the subs don't match. This creates differing type signatures for the declared functions.
Also, I'm making liberal use of >>=
and >>|
. I could use let
syntax instead if that's clearer. Seems like the let
syntax is used in bildb, but the inline operators are used in wp
. I'm not sure what's preferred here.
I'm testing the changes by running them on grammatech. I'll post again when I've got results. EDIT: has the same behavior as before (marks 65 of the functions as syntactically equal).
There's no real norm between using monadic lets and >>=
s, though Ivan prefers the latter, and his opinion has some weight.
Apparently, using the ppx versions of let bindings is deprecated, instead using the (newish) feature that allows defining new let
-like keywords.
I'd suggest just using the combinators for now.
This PR adds in a check for syntactic equality (enabled with
--syntax-equality
). Example usage may be seen inrun_wp_*.sh
scripts in thesyntax_equality
test folder. This PR closes #211, and is a first step towards the ideas in #254. Specifically,equality.ml
exports anexist_isomorphism
function. This function checks if there's a mapping between the twosub.t
functions that preserves syntax structure, use of virtual variables (virtual variables may go by different names but must be used in all the same places), and control flow.I've documented the code in
equality.ml
andequality.mli
. I'm also including the results I posted to the slack for the run on grammatech. On average, only running the syntax check takes 8 seconds on my laptop. I found 65 syntactically equal functions on grammatech. When running CBAT, I got:So overall, the results that CBAT finishes within the timeout of 1000s turn out to match with this syntax check. It's not clear to me how to test the UNKNOWNs further.
This first plot is function (on the x axis) vs # syntactically idental blocks / total blocks in the sub. These are encouraging numbers performance improvements in #254 .
This second plot is time in seconds on the x axis vs # statements (def_t, jmp_t) on the y axis. The green dots are the run for syntactically equal functions (of which there are 65); the red dots are master branch cbat without any special flags. @philzook58 mentioned that this is big savings if there are a lot of long functions that are syntactically identical in the binary.