draperlaboratory / cbat_tools

Program analysis tools developed at Draper on the CBAT project.
MIT License
102 stars 14 forks source link

Partition the comparative analysis into logical blocks #322

Open codyroux opened 3 years ago

codyroux commented 3 years ago

To improve the performance/scalability of the comparative analysis, we want to break pairs of functions into pairs of sub-graphs of the CFGs which should have equivalent behaviors.

Following some insightful suggestions by Tristan Ravitch, we would like to proceed as follows:

Recursively descend into both CFGs from the starting point, and at each branch, create a new subroutine up to that point, and check equivalence of those subroutines. Then build a correspondence between branch conditions, so that we know how to match up the subsequent pairs of blocks.

For example, looking at a pair of functions

fun_orig()
    ...
    do_something1
    ...
    if(cond1){
        branch_1_1
      } else {
        branch_1_2
    }
    finish1

fun_mod()
    ...
    do_something2
    ...
    if(cond2){
        branch_2_1
      } else {
        branch_2_2
    }
    finish2

Then the algorithm might end up doing the following:

  1. Compare do_something1 and do_something2 for full program equivalence (registers and memory)
  2. Check that (after executing do_something1) cond1 => cond2 or alternately cond1 => !cond2
  3. In the first case, assume equal inputs, and check that branch_1_1; finish1 and branch_2_1;finish2 are fully equivalent; in the second case branch_1_1; finish1 and branch_2_2; finish2 etc.

Conclude that the two programs are fully equivalent.


Issues:

  1. This requires full memory equivalence, which we have noted can be very costly. Again, the suggestion here was to heuristically gather symbolic or concrete locations we believe are the writes, and check equivalence for only these locations, along with checking that writes are only indeed made at these locations.

  2. This does not easily allow for properties other than full equivalence. More thought is needed here, but perhaps we can handle equivalence of some initial segment, and simplify the comparative analysis by restricting to the non-equivalent sub-sections of the CFG.